Search code examples
pythonamazon-web-servicesaws-lambda

Implications of querying database before lambda_handler


Here is the pseudo-code for what current lambda function looks like;

import pandas
import pymysql


def get_db_data(con_):
    query = "SELECT * FROM mytable"
    data = pandas.read_sql(query, con_)
    return data


def lambda_handler(event, context):
    con = pymysql.connect()
    data = get_db_data(con)
    """
    do other things with event
    """
    con.close()

I am debating if I can do this instead:

import pandas
import pymysql

con = pymysql.connect()


def get_db_data(con_):
    query = "SELECT * FROM mytable"
    data = pandas.read_sql(query, con_)
    return data


data = get_db_data(con)


def lambda_handler(event, context):
    """
    do other things with event
    """
    con.close()

But I am not sure if it is a good practice. What implications would the second option have on run-time and cost? Is it against the recommended way?


Solution

  • When working with a database connection in a Lambda function, it is best to follow AWS best practices and use INIT code (which is where you are almost heading) to load expensive resources.

    Take advantage of execution environment reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your function can reuse these resources. This saves cost by reducing function run time.

    Lambda can run from either a COLD or WARM start. On COLD start, the code outside the lambda handler is executed. When a Lambda is run from a WARM start, the resources loading during COLD start will be available. By including resources like database connection opening in the COLD start, subsequent WARM starts will not have to re-execute the same expensive operation. Getting to reuse the WARM start requires that calls to the specific Lambda be within a short period of time. This can greatly reduce the execution time on your Lambda functions and this reduce costs!

    Based on where you were going, I would say to rewrite it as such:

    import pandas
    import pymysql
    
    con = pymysql.connect()
     
    def get_db_data(con_):
        query = "SELECT * FROM mytable"
        data = pandas.read_sql(query, con_)
        return data
        
    def lambda_handler(event, context):
        data = get_db_data(con)
        """
        do other things with event
        """
        con.close()
    

    This concept is also explained well in the AWS Lambda docs here.

    Also, consider what the duration of the connection is in its settings (client and server). If the connection closes before the Lambda's instance goes away - you might want to include an isOpen check prior to using the connection and providing logic necessary to reopen it.