Creating a Lambda CronJob, from a Custom-Built Python Docker Container

So I wanted to make a cronjob in lambda…

I found myself needing to make a cronjob using AWS Lambda. It was fairly complex: it used Python and JS (both!) inside of one container and needed to be usable anywhere, both on my local machine and inside Lambda. I knew there were a few different methods to achieve this, but since I hadn’t really planned it out, I started hacking away at a Python script for days, just running it locally. Eventually I got to the point where I had something that ‘worked’: an entrypoint.sh that would run my Python script. The script created an output file which was then altered using a JS library inside of the shell before being copied to S3.
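
Roughly, the last step of that pipeline had the shape sketched below (the JS tool, file names, and bucket here are placeholders, not the real project):

import subprocess

import boto3

def transform_and_upload(local_path):
    # run the JS step from the shell, the same idea as calling it from entrypoint.sh
    subprocess.run(["node", "transform.js", local_path], check=True)

    # copy the result to S3; credentials come from the environment
    # (locally via your profile, in Lambda via the attached IAM role)
    s3 = boto3.client("s3")
    s3.upload_file(local_path, "my-example-bucket", "reports/output.json")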

I created a GitHub workflow to build the container image, push it to ECR, and then point the Lambda function at the new image. Fingers crossed, I hit the ‘Test’ button on the Lambda. It seemed to run, and I was getting some log output (thanks to running it unbuffered, via ‘python -u main.py’), but the invocation would never actually show up as successful. I could get the whole thing to run and do what it needed to in S3 via the IAM role attached to the Lambda, but I didn’t know how Lambda worked under the hood to determine whether the function ran OK or not. I wasn’t going to go log digging in CloudWatch every day to make sure things were running, so I needed to fix this. After quite a bit of messing around I discovered a few items:

  1. Lambda expects your code to run in a loop, requesting the next trigger event over and over. In the case of a cron, this is just the EventBridge rule triggering it to run once, every so often.
  2. When your code requests a trigger event and there is none left because it has already been processed (as with a one-time run from a cron), Lambda itself will kill the running process off and mark the invocation as ‘ok’.
  3. You need to actually get the trigger request ‘id’ from Lambda in your code and then post a response for it at the end of the program so Lambda knows you are done with that request. Then you attempt to get the next request, which will be empty.
  4. If your code exits anywhere before it requests that empty trigger event, the invocation will show up as an error. This is convenient, as I had a number of places where the code would exit out on an exception.
  5. When you are testing your Lambda and it is failing to follow these rules but still running, you will get log output. Once it works within Lambda’s expectations, though, you won’t see any ‘print’ statements or other stdout/stderr output unless you run Python unbuffered (python -u myfile.py); see the snippet after this list for a couple of alternatives.
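
For that last point, python -u isn’t the only way to keep output flowing to the logs; setting ENV PYTHONUNBUFFERED=1 in the Dockerfile has the same effect, and from inside the script you have a couple of options too:

import sys

# flush each print explicitly as you go
print("starting AWS lambda run", flush=True)

# or switch stdout to line buffering once, near the top of the script (Python 3.7+)
sys.stdout.reconfigure(line_buffering=True)
print("this and every later print gets flushed per line")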

So let's start with something silly:

import time

def main():
    print("starting AWS lambda run")
    time.sleep(60)
    print("ending AWS lambda run")

if __name__ == "__main__":
    main()


So the starting point here is a python script that does nothing but print a few lines and wait. Obviously you can do all sorts of stuff in between the print statements, and run it locally until it ‘works’ for you before moving it from your development machine to lambda.

Now let's add in the bare minimum here: what we need to get this to run in Lambda. We will also add an argument so we can still run things ‘locally’…

import argparse
import time

parser = argparse.ArgumentParser(description="lambda_cron")
parser.add_argument("--local_run", action="store_true", help="local run, not in lambda")
args = parser.parse_args()
LOCAL_RUN = args.local_run

def get_lambda_request_id():
    # here we need logic to get the request ID from lambda somehow
    request_id = None  # placeholder until we wire up the runtime API below
    return request_id

def return_to_lambda(request_id):
    # here we need to return something to the lambda
    return

def main():
    if not LOCAL_RUN:
        print("starting AWS lambda run")
        request_id = get_lambda_request_id()

    time.sleep(60)
    # Do all your stuff here, calling other functions and whatnot

    if not LOCAL_RUN:
        print("ending AWS lambda run")
        return_to_lambda(request_id)
        get_lambda_request_id()


if __name__ == "__main__":
    main()

So what we have added here is a way to specify a local_run. If we don’t specify it, the process goes this way:

  • Get the request ID from Lambda for the currently triggered run
  • Do ‘stuff’. If something goes sideways, print an error and exit in your code (not shown here, but see the sketch after this list)
  • Get to the ‘end’ of your script. If you got this far then everything should have run OK, which means you return that to Lambda for the current request ID and then try to get another request ID. Lambda will then see you requesting a new trigger when none exists, and properly terminate the process.
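
The ‘exit on failure’ part is nothing fancy; a rough sketch of the pattern (do_the_work here is a stand-in for whatever your cron actually does):

import sys

def do_the_work():
    # stand-in for the real job: raise on anything that goes sideways
    raise RuntimeError("something went sideways")

try:
    do_the_work()
except Exception as err:
    print(f"run failed: {err}", flush=True)
    # exiting before we post a response / request the next event
    # makes lambda mark the invocation as an error
    sys.exit(1)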

Let's go ahead and flesh out the remaining code. Lambda maintains a runtime API endpoint inside of its environment, which your running container is able to hit. It stores the URL for this in an environment variable, and we can use that to make requests for what we need, both from Lambda and back to it. requests, of course, is a pip module, so you will need to make sure it is also built into your container image.

import argparse
import json
import os
import time

import requests

AWS_LAMBDA_RUNTIME_API = os.getenv('AWS_LAMBDA_RUNTIME_API')

parser = argparse.ArgumentParser(description="lambda_cron")
parser.add_argument("--local_run", action="store_true", help="local run, not in lambda")
args = parser.parse_args()
LOCAL_RUN = args.local_run

def get_lambda_request_id():
    request_id = None
    try:
        r = requests.get(f'http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next')
        request_id = r.headers["Lambda-Runtime-Aws-Request-Id"]
        print(f"Lambda request ID is {request_id}")
    except Exception as err:
        print(str(err))
    return request_id

def return_to_lambda(request_id):
    response = {"result": "OK"}
    try:
        r = requests.post(f'http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/{request_id}/response', data=json.dumps(response))
        print(str(r))
        print(r.text)
    except Exception as err:
        print(str(err))
    return

def main():
    if not LOCAL_RUN:
        print("starting AWS lambda run")
        request_id = get_lambda_request_id()

    time.sleep(60)
    # Do all your stuff here, calling other functions and whatnot

    if not LOCAL_RUN:
        print("ending AWS lambda run")
        return_to_lambda(request_id)
        get_lambda_request_id()


if __name__ == "__main__":
    main()

So you can see here that we now get the runtime API address from a preset environment variable ‘AWS_LAMBDA_RUNTIME_API’ that AWS adds to the container automatically. We then use requests to grab the request ID for the current invocation. You can also see that there is space here to return a non-fixed response in the return_to_lambda function, if we wanted to do so. In our case, since this is a cron, we assume that if we got this far we must be ‘ok’, so we return something generic. You could return basically anything you wanted there, though.
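
As an aside, if you would rather not build requests into the image just for these two calls, the same thing should be doable with urllib from the standard library. A rough sketch against the same endpoints:

import json
import os
import urllib.request

AWS_LAMBDA_RUNTIME_API = os.getenv('AWS_LAMBDA_RUNTIME_API')
BASE = f"http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation"

def get_lambda_request_id():
    # blocks until lambda hands us the next invocation
    with urllib.request.urlopen(f"{BASE}/next") as resp:
        return resp.headers.get("Lambda-Runtime-Aws-Request-Id")

def return_to_lambda(request_id):
    req = urllib.request.Request(
        f"{BASE}/{request_id}/response",
        data=json.dumps({"result": "OK"}).encode(),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode())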

So with all of the above in place, when you run your container and the code gets all the way to the end of its run, it returns its response to AWS Lambda correctly, Lambda terminates the process the right way, and you get a good run on the monitoring tab. One thing to keep in mind here, though, is that with this setup you need to exit/fail out of the program if anything doesn’t work right. Otherwise you end up with a ‘good’ Lambda run on a cron rule that still didn’t actually do its job. Depending on your process, you may want to have a function near the end of the run that re-checks everything worked OK, or just make sure that any failure earlier in the code causes the script to exit prematurely so Lambda ‘fails’ it. Either way, with the unbuffered option and some print statements or logging module output, you should get logs in CloudWatch (assuming the Lambda has the correct rights to do so, of course).
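
One more option if early exits feel too blunt: the runtime API also has an error endpoint for the current request ID, so you can report a failure explicitly instead. A sketch along the same lines as return_to_lambda above:

import json
import os

import requests

AWS_LAMBDA_RUNTIME_API = os.getenv('AWS_LAMBDA_RUNTIME_API')

def report_error_to_lambda(request_id, err):
    # tells lambda this invocation failed, so it shows up as an error
    # on the monitoring tab just like an early exit would
    body = {"errorMessage": str(err), "errorType": type(err).__name__}
    r = requests.post(
        f'http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/{request_id}/error',
        data=json.dumps(body),
    )
    print(r.status_code, r.text)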