python json csv google-trends

How to convert specific CSV format to JSON using Python


I have downloaded a CSV file from Google Trends which presents data in this format:

Top cities for golden globes
City,golden globes
New York (United States),100
Los Angeles (United States),91
Toronto (Canada),69

Top regions for golden globes
Region,golden globes
United States,100
Canada,91
Ireland,72
Australia,72

There are 3-4 of these groups, separated by blank lines. The first line of each group contains the text I want to use as a key; it is followed by a column header and then rows of name/value pairs that I need stored in a dictionary under that key. Does anyone have advice on Python tools I could use to make this happen? I'm not having much luck with Python's csv library.

My desired output from the above CSV would look like this:

{
"Top cities for golden globes" :
   {
      "New York (United States)" : 100,
      "Los Angeles (United States)" : 91,
      "Toronto (Canada)" : 69
   },
"Top regions for golden globes" :
   {
      "United States" : 100,
      "Canada" : 91,
      "Ireland" : 72,
      "Australia" : 72
   }
}

Solution

  • Your input format is so predictable that I would parse it by hand, without a CSV library.

    import json
    from collections import defaultdict
    
    result = defaultdict(dict)  # maps each group title to its {name: value} dict
    current_key = ""            # title of the group currently being read
    ignore_next = False         # flag to skip the column-header line
    
    with open("yourfile.csv") as fh:
        for line in fh:
            line = line.strip()  # throw away the newline
            if line == "":  # a blank line ends the current group
                current_key = ""
                continue
            if current_key == "":  # no group open, so this line is the group title
                current_key = line
                ignore_next = True
                continue
            if ignore_next:  # skip the "City,golden globes" style header line
                ignore_next = False
                continue
            # split on the last comma so names that contain commas stay intact
            name, value = line.rsplit(",", 1)
            result[current_key][name] = int(value)  # store the score as a number
    
    # pretty-print data
    print(json.dumps(result, sort_keys=True, indent=4))