Search code examples
jsoncsvparsingdata-conversionformat-conversion

How to convert a dynamic JSON like file to a CSV file


I have a file which looks exactly as below.

{"eventid" : "12345" ,"name":"test1","age":"18"}
{"eventid" : "12346" ,"age":"65"}
{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}

Think of the above file as event.json

The number of data objects may vary per line. I would like the following csv output. and it would be output.csv

eventid,name,age,gender
12345,test1,18
12346,,65
12336,test3,22,Male

Could someone kindly help me? I could accept the answer from an any scripting language (Javascript, Python and etc.).


Solution

  • This code will collect all the headers dynamically and write the file to CSV.

    Read comments in code for details:

    import json
    
    # Load data from file
    data = '''{"eventid" : "12345" ,"name":"test1","age":"18"}
    {"eventid" : "12346" ,"age":"65"}
    {"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}'''
    
    # Store records for later use
    records = [];
    
    # Keep track of headers in a set
    headers = set([]);
    
    for line in data.split("\n"):
        line = line.strip();
    
        # Parse each line as JSON
        parsedJson = json.loads(line)
    
        records.append(parsedJson)
    
        # Make sure all found headers are kept in the headers set
        for header in parsedJson.keys():
            headers.add(header)
    
    # You only know what headers were there once you have read all the JSON once.
    
    #Now we have all the information we need, like what all possible headers are.
    
    outfile = open('output_json_to_csv.csv','w')
    
    # write headers to the file in order
    outfile.write(",".join(sorted(headers)) + '\n')
    
    for record in records:
        # write each record based on available fields
        curLine = []
        # For each header in alphabetical order
        for header in sorted(headers):
            # If that record has the field
            if record.has_key(header):
                # Then write that value to the line
                curLine.append(record[header])
            else:
                # Otherwise put an empty value as a placeholder
                curLine.append('')
        # Write the line to file
        outfile.write(",".join(curLine) + '\n')
    
    outfile.close()