python, json, pandas, dataframe, textedit

Turning a huge TextEdit file of JSON into a Pandas dataframe


I have an extremely large list of JSON objects stored in a TextEdit document, each of which has 6 key-value pairs.

I would like to turn each key-value pair into a column name for a Pandas Dataframe, and list the values under the column.

{'column1': "stuff stuff", 'column2': "details details", ...}

Is there a standard way to do this?

I think you could begin by loading the file into a dataframe with

import pandas as pd
df = pd.read_table(file_name)

I think each column could be created by iterating through each JSON document using groupby.

EDIT: I think the correct approach is to parse each JSON object into a DataFrame, and then create a function to iterate through all the JSON objects and combine them into one DataFrame.
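
Something like the sketch below, maybe (assuming each line of the document is one valid, double-quoted JSON object; the function name and the 'my_data.txt' path are just placeholders):

import json
import pandas as pd

def jsons_to_dataframe(path):
    # collect one dict per line, then build the DataFrame once at the end
    records = []
    with open(path, 'r') as f:
        for line in f:
            if line.strip():                      # skip blank lines
                records.append(json.loads(line))
    return pd.DataFrame(records)                  # one row per object, one column per key

df = jsons_to_dataframe('my_data.txt')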


Solution

  • Take a look at read_json or json_normalize. You would indeed most likely read each object and then use, for instance, pd.concat to combine them as required.

    Something along the lines below should work, depending on what your file looks like (here assuming that each JSON dictionary makes up one line in the file):

    import pandas as pd

    frames = []
    with open('workfile', 'r') as f:
        for line in f:
            # typ='series' parses one flat JSON object into a Series; .to_frame().T turns it into a one-row frame
            frames.append(pd.read_json(line, typ='series').to_frame().T)

    df = pd.concat(frames, ignore_index=True)
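
  • If the file really is newline-delimited JSON (one object per line, with double-quoted keys and values), read_json can also load the whole thing in a single call; a minimal sketch, again assuming the file is called 'workfile':

    import pandas as pd

    # lines=True tells read_json to treat each line of the file as a separate JSON object
    df = pd.read_json('workfile', lines=True)

    If the objects are nested rather than flat, json_normalize can flatten the parsed dictionaries into columns.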