I have a file which looks exactly as below.
{"eventid" : "12345" ,"name":"test1","age":"18"}
{"eventid" : "12346" ,"age":"65"}
{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}
Think of the above file as event.json
The number of data objects may vary per line. I would like the following csv output. and it would be output.csv
eventid,name,age,gender
12345,test1,18
12346,,65
12336,test3,22,Male
Could someone kindly help me? I could accept the answer from an any scripting language (Javascript, Python and etc.).
This code will collect all the headers dynamically and write the file to CSV.
Read comments in code for details:
import json
# Load data from file
data = '''{"eventid" : "12345" ,"name":"test1","age":"18"}
{"eventid" : "12346" ,"age":"65"}
{"eventid" : "12336" ,"name":"test3","age":"22","gender":"Male"}'''
# Store records for later use
records = [];
# Keep track of headers in a set
headers = set([]);
for line in data.split("\n"):
line = line.strip();
# Parse each line as JSON
parsedJson = json.loads(line)
records.append(parsedJson)
# Make sure all found headers are kept in the headers set
for header in parsedJson.keys():
headers.add(header)
# You only know what headers were there once you have read all the JSON once.
#Now we have all the information we need, like what all possible headers are.
outfile = open('output_json_to_csv.csv','w')
# write headers to the file in order
outfile.write(",".join(sorted(headers)) + '\n')
for record in records:
# write each record based on available fields
curLine = []
# For each header in alphabetical order
for header in sorted(headers):
# If that record has the field
if record.has_key(header):
# Then write that value to the line
curLine.append(record[header])
else:
# Otherwise put an empty value as a placeholder
curLine.append('')
# Write the line to file
outfile.write(",".join(curLine) + '\n')
outfile.close()