Search code examples
pythonpython-3.xcsvetldata-manipulation

How to format query results as CSV?


My goal: Automate the operation of executing a query and output the results into a csv.

I have been successful in obtaining the query results using Python (this is my first project ever in Python). I am trying to format these results as a csv but am completely lost. It's basically just creating 2 massive rows with all the data not parsed out. The .txt and .csv results are attached (I obtained these by simply calling the query and entering "file name > results.txt" or "file name > results.csv".

txt results: {'data': {'get_result': {'job_id': None, 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', '__typename': 'get_result_response'}}} {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', 'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c', 'error': None, 'runtime': 157, 'generated_at': '2022-04-07T20:14:36.693419+00:00', 'columns': ['project_name', 'leaderboard_date', 'volume_30day', 'transactions_30day', 'floor_price', 'median_price', 'unique_holders', 'rank', 'custom_sort_order'], '__typename': 'query_results'}], 'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA', 'floor_price': 0.375, 'leaderboard_date': '2022-04-07', 'median_price': 343.4, 'project_name': 'Terraforms by Mathcastles', 'rank': 1, 'transactions_30day': 2774, 'unique_holders': 2179, 'volume_30day': 744611.6252}, '__typename': 'get_result_template'}, {'data': {'custom_sort_order': 'AB', 'floor_price': 4.69471, 'leaderboard_date': '2022-04-07', 'median_price': 6.5, 'project_name': 'Meebits', 'rank': 2, 'transactions_30day': 4153, 'unique_holders': 6200, 'volume_30day': 163520.7377371168}, '__typename': 'get_result_template'}, etc. (repeats for 100s of rows)..


Solution

  • Your results text string actually contains two dictionaries separated by a space character.
    Here's a formatted version of what's in each of them:

    dict1 = {'data': {'get_result': {'job_id': None,
                                     'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                     '__typename': 'get_result_response'}}}
    
    dict2 = {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                         'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c',
                                         'error': None,
                                         'runtime': 157,
                                         'generated_at': '2022-04-07T20:14:36.693419+00:00',
                                         'columns': ['project_name',
                                                     'leaderboard_date',
                                                     'volume_30day',
                                                     'transactions_30day',
                                                     'floor_price',
                                                     'median_price',
                                                     'unique_holders',
                                                     'rank',
                                                     'custom_sort_order'],
                                         '__typename': 'query_results'}],
                      'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA',
                                                            'floor_price': 0.375,
                                                            'leaderboard_date': '2022-04-07',
                                                            'median_price': 343.4,
                                                            'project_name': 'Terraforms by Mathcastles',
                                                            'rank': 1,
                                                            'transactions_30day': 2774,
                                                            'unique_holders': 2179,
                                                            'volume_30day': 744611.6252},
                                                   '__typename': 'get_result_template'},
                                                  {'data': {'custom_sort_order': 'AB',
                                                            'floor_price': 4.69471,
                                                            'leaderboard_date': '2022-04-07',
                                                            'median_price': 6.5,
                                                            'project_name': 'Meebits',
                                                            'rank': 2,
                                                            'transactions_30day': 4153,
                                                            'unique_holders': 6200,
                                                            'volume_30day': 163520.7377371168},
                                                   '__typename': 'get_result_template'},
                                                 ]}}
    

    (BTW I formatting them using the pprint module. This is often a good first step when dealing with these kinds of problems — so you know what you're dealing with.)

    Ignoring the first one completely and all but the repetitive data in the second — which is what I assume is all you really want — you could create a CSV file from the nested dictionary values in the dict2['data']['get_result_by_result_id'] list. Here's how that could be done using the csv.DictWriter class:

    import csv
    from pprint import pprint  # If needed.
    
    
    output_filepath = 'query_results.csv'
    
    # Determine CSV fieldnames based on keys of first dictionary.
    fieldnames = dict2['data']['get_result_by_result_id'][0]['data'].keys()
    
    with open(output_filepath, 'w', newline='') as outp:
        writer = csv.DictWriter(outp, delimiter=',', fieldnames=fieldnames)
        writer.writeheader()  # Optional.
    
        for result in dict2['data']['get_result_by_result_id']:
    #        pprint(result['data'], sort_dicts=False)
            writer.writerow(result['data'])
    
    print('fini')
    

    Using the test data, here's the contents of the 'query_results.csv' file it created:

    custom_sort_order,floor_price,leaderboard_date,median_price,project_name,rank,transactions_30day,unique_holders,volume_30day
    AA,0.375,2022-04-07,343.4,Terraforms by Mathcastles,1,2774,2179,744611.6252
    AB,4.69471,2022-04-07,6.5,Meebits,2,4153,6200,163520.7377371168