I'm using Pyathena to run basic queries:
from pyathena import connect as pyathena_connect #to distinguish from other connect methods
import pandas as pd
class AthenaDataConnection():
def __init__(self, S3_STAGING_DIR, SEP=';', REGION='us-east-1', ACCESS_KEY=None, S_KEY=None):
self.S3_STAGING_DIR = S3_STAGING_DIR
self.REGION = REGION
self.SEP = SEP
if ACCESS_KEY and S_KEY:
self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR, region_name=self.REGION,
aws_access_key_id=ACCESS_KEY, aws_secret_access_key=S_KEY)
else:
self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR, region_name=self.REGION)
def get_athena_data(self, sql_dict):
print(f"Athena connection established; starting to query data using pd-sql integration")
sql_results = {}
for filename, sql in sql_dict.items():
try:
load_data = pd.read_sql(sql,self.athena_conn)
print(f"{filename} data fetched from Athena but not saved (returned in dict only).")
sql_results[filename] = load_data
except:
print(f"Reading {filename} failed")
return sql_results
athena = AthenaDataConnection('s3://athena-staging/',ACCESS_KEY=ACCESS_KEY, S_KEY=S_KEY)
sql_dict = {'foobar':"select * from foo.bar where foo='bar'"}
df_dict = athena.get_athena_data(sql_dict)
df = df_dict.get('foobar')
#assume this is the end of the script; i.e., I did NOT save the query results myself
So when the query executes, a file then appears in the staging folder, e.g.:
s3://athena-staging/abc123_45678_91011.csv
I'd like my code to be able to capture that filename and save it for other purposes. But how? I can't find anything in the Pyathena docs. UPDATE - I've just learned that the filename is the Query ID + .csv! So I am now looking for a way to get the Athena query ID.
OK, once I learned that the filename isn't random, but rather is Athena's query ID, I was able to do a better search and find a solution. Using the object I've already created above:
cursor = athena.athena_conn.cursor()
cursor.execute(sql)
cursor.query_id
returns the query_id, which is the filename without .csv at the end. Now I can get the file as needed.