Search code examples
pythonamazon-web-servicesboto3amazon-athena

python athena calculate total number of output rows


I need to get total number of output rows returned by Athena.

    status = 'RUNNING'
    while status in ['QUEUED', 'RUNNING']:
        response_get_query_details = athena.get_query_execution(
            QueryExecutionId=query_execution_id
        )

        status = (
            response_get_query_details.get("QueryExecution", {})
            .get("Status", {})
            .get("State", "NA")
        )

        if status in ("FAILED", "CANCELLED", "NA"):
            raise Exception(f"Athena Query Failed: {failure_reason}")
        elif status == 'SUCCEEDED':
            query_stats = response_get_query_details['QueryExecution']['Statistics']
            total_rows = query_stats['OutputRows'] <<--- `outputRows` is not available
            return total_rows

There's only DataScannedInBytes in the statistics.

{'EngineExecutionTimeInMillis': 9799, 'DataScannedInBytes': 1090182, 'TotalExecutionTimeInMillis': 9991, 'QueryQueueTimeInMillis': 164, 'QueryPlanningTimeInMillis': 8860, 'ServiceProcessingTimeInMillis': 28}

Is there a way to calculate total number of rows from this?


Solution

  • The get_query_runtime_statistics() API call returns the number of rows returned by the query:

    {
        'QueryRuntimeStatistics': {
            'Timeline': {
                'QueryQueueTimeInMillis': 123,
                'QueryPlanningTimeInMillis': 123,
                'EngineExecutionTimeInMillis': 123,
                'ServiceProcessingTimeInMillis': 123,
                'TotalExecutionTimeInMillis': 123
            },
            'Rows': {
                'InputRows': 123,
                'InputBytes': 123,
                'OutputBytes': 123,
                'OutputRows': 123   <-- Here!
            },
            ...
        }
    }