Search code examples
amazon-web-servicesboto3amazon-rds

RDS-data query to pandas dataframe


I have a serverless Aurora DB on AWS RDS (with data api enabled) that I would like to query using the database resource and secret ARNs. The code to do this is shown below.

rds_data = boto3.client('rds-data', region_name='us-west-1')

response = rds_data.execute_statement(
            resourceArn='DATABASE_ARN',
            secretArn='DATABASE_SECRET',
            database='db',
            sql=query)

rds_data.close()

The response contains the returned records, but the column names are not shown and the format is interesting (see below).

'records': [[{'stringValue': 'name'}, {'stringValue': '5'}, {'stringValue': '3'}, {'stringValue': '1'}, {'stringValue': '2'}, {'stringValue': '4'}]]

I want to query the aurora database and then create a pandas dataframe. Is there a better way to do this? I looked at just using psycopg2 and creating a database connection, but I believe you need a user name and password. I would like to not go that route and instead authenticate using IAM.


Solution

  • The answer was actually pretty simple. There is a parameter called "formatRecordsAs" in the "execute_statement" function. If you set this to json, you can get back better records.

    response = rds_data.execute_statement(
                resourceArn='DATABASE_ARN',
                secretArn='DATABASE_SECRET',
                database='db',
                formatRecordsAs='JSON',
                sql=query)
    

    This still gives you back a string, so then you just need to convert that string representation to a list of dictionaries.

    list_of_dicts = ast.literal_eval(response['formattedRecords'])
    

    That can then be changed to a pandas dataframe

    df = pd.DataFrame(list_of_dicts)