Search code examples
pythonjsoncsvtype-conversionjsonlines

Python: Converting JsonL to Json to CSV


currently working with jsonl files and I intend to convert it into CSV format to run it through a program. However, I realize that it would be better to convert it from json directly to CSV instead, and I wrote a code below to convert json to csv. However, I am unsure on how I can convert my current jsonl files into the desired json format before I can run this code. If anyone has any solutions for me, do let me know! Thanks so much for the read and appreciate all the help that I can get.

(FYI I tried to convert the jsonl file directly using the json to csv converter below and I get an error message below:)

Converting to CSV: XXX.jsonl
ERROR: Extra data

This is the conversion code, I hope it helps!

from json.decoder import JSONDecodeError
import pandas as pd
import sys
from flatten_json import flatten
import json

def main():
    if len(sys.argv) not in [1, 2]:
        sys.exit("Usage: python JsonCon.py [FILENAME.json] \n exitted")

    filename = sys.argv[1]
    print(f"Converting to CSV: {filename}")
    convertFile(filename)

def convertFile(filename):
    try:
        with open(filename) as file:
            jsString = json.load(file)
            dic_flat = [flatten(d) for d in jsString]
            df = pd.DataFrame(dic_flat)
            df.to_csv(f'{filename[:-5]}.csv')
    except JSONDecodeError as e:
        print(f'ERROR: {e.msg}')

if __name__ == "__main__":
    main()

Solution

  • import json
    import csv
    import io
    
    # get the JSON objects from JSONL
    jsonl_data = """{"a": 1, "b": 123}\n{"a": 2, "b": 234}\n{"a": 3, "b": 345}\n"""
    json_lines = tuple(json_line
                       for json_line in jsonl_data.splitlines()
                       if json_line.strip())
    jsons_objs = tuple(json.loads(json_line)
                       for json_line in json_lines)
    
    # write them into a CSV file
    fake_file = io.StringIO()
    writer = csv.writer(fake_file)
    writer.writerow(["a", "b"])
    writer.writerows((value for key, value in sorted(json_obj.items()))
                     for json_obj in jsons_objs)
    print(fake_file.getvalue())