I need to get total number of output rows returned by Athena.
status = 'RUNNING'
while status in ['QUEUED', 'RUNNING']:
response_get_query_details = athena.get_query_execution(
QueryExecutionId=query_execution_id
)
status = (
response_get_query_details.get("QueryExecution", {})
.get("Status", {})
.get("State", "NA")
)
if status in ("FAILED", "CANCELLED", "NA"):
raise Exception(f"Athena Query Failed: {failure_reason}")
elif status == 'SUCCEEDED':
query_stats = response_get_query_details['QueryExecution']['Statistics']
total_rows = query_stats['OutputRows'] <<--- `outputRows` is not available
return total_rows
There's only DataScannedInBytes
in the statistics.
{'EngineExecutionTimeInMillis': 9799, 'DataScannedInBytes': 1090182, 'TotalExecutionTimeInMillis': 9991, 'QueryQueueTimeInMillis': 164, 'QueryPlanningTimeInMillis': 8860, 'ServiceProcessingTimeInMillis': 28}
Is there a way to calculate total number of rows from this?
The get_query_runtime_statistics()
API call returns the number of rows returned by the query:
{
'QueryRuntimeStatistics': {
'Timeline': {
'QueryQueueTimeInMillis': 123,
'QueryPlanningTimeInMillis': 123,
'EngineExecutionTimeInMillis': 123,
'ServiceProcessingTimeInMillis': 123,
'TotalExecutionTimeInMillis': 123
},
'Rows': {
'InputRows': 123,
'InputBytes': 123,
'OutputBytes': 123,
'OutputRows': 123 <-- Here!
},
...
}
}