python, json, jsonlines

How to parse jsonlines file using pandas


I am new to Python and am trying to parse data from a file that contains millions of lines. I first tried opening it in Excel, but that fails. How can I parse the information efficiently and export it to an Excel file so that it is easier for other people to read?

I tried using this code, which someone else provided, but have had no luck so far:

import re
import pandas as pd

def clean_data(filename):
    with open(filename, "r") as inputfile:
        for row in inputfile:
            if re.match(r"\[", row) is None:
                yield row

with open(clean_file,  'w') as outputfile:
    for row in clean_data(filename):
        outputfile.write(row)

Running it fails with this error:

NameError: name 'clean_file' is not defined

Solution

  • It looks like clean_file is used as a variable but never defined, which is probably a side effect of copying and pasting the code.

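    Did you mean to write to a file called "clean_file"? If so, you need to wrap the name in quotes: with open("clean_file", 'w'). Alternatively, assign the output path to clean_file before the with statement. Note that filename is not defined in the posted snippet either, so it needs a value as well.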

    If you want to work with JSON, I suggest looking into the json package, which has lots of tools for loading and parsing JSON. Otherwise, if the JSON is flat, you can just use the built-in pandas function read_json. For a JSON Lines file, pass lines=True so that each line is parsed as a separate record.
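
As a rough illustration of the read_json route for a file with millions of lines, here is a minimal sketch that reads the file in chunks and appends each chunk to a CSV file, which Excel can open. It assumes every line is a flat JSON object with the same keys. The function name jsonl_to_csv, the file names, and the chunk size are made up for this example and are not from the question.

```python
import pandas as pd


def jsonl_to_csv(src_path, dst_path, chunksize=100_000):
    """Convert a JSON Lines file to CSV chunk by chunk; return the row count.

    Hypothetical helper: assumes flat records that all share the same keys.
    """
    total_rows = 0
    # lines=True parses one JSON record per line. With chunksize set,
    # read_json returns an iterator of DataFrames instead of one big frame,
    # so the whole file never has to fit in memory at once.
    reader = pd.read_json(src_path, lines=True, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        first = i == 0
        # Create the file and write the header once, then append.
        chunk.to_csv(dst_path, mode="w" if first else "a",
                     header=first, index=False)
        total_rows += len(chunk)
    return total_rows
```

A call would look like jsonl_to_csv("data.jsonl", "data.csv"). CSV is used here because an Excel worksheet holds at most 1,048,576 rows, so a file with millions of lines will not fit on a single sheet. If the data is small enough, DataFrame.to_excel can write an .xlsx file directly, provided an Excel engine such as openpyxl is installed.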