
Save a list of different DataFrames to JSON


I have several pandas DataFrames, which I put in a list. I want to save this list as JSON (or any other format) that R can read.

import pandas as pd

def create_df_predictions(extra_periods):
    """
    Make an empty DataFrame for predictions.
    params: extra_periods = how many future predictions the user wants
    """
    df = pd.DataFrame({ 'model': ['a'], 'name_id': ['a'] })
    for col in range(1, extra_periods+1):
        name_col = 'forecast' + str(col)
        df[name_col] = 0

    return df

df1 = create_df_predictions(9) 
df2 = create_df_predictions(12)
list_df = [df1, df2]

The question is how to save list_df in a format that R can read. Note that df1 and df2 have a different number of columns!


Solution

  • I don't know pandas DataFrames in detail, so this may not work. But if a DataFrame behaves like a regular dict, you should be able to use the json module.

    df1 = create_df_predictions(9) 
    df2 = create_df_predictions(12)
    list_df = [df1, df2]
    

    You can write it to a file using json.dumps(list_df), which converts a list of dicts to a valid JSON representation.

    import json
    with open("my_file", 'w') as outfile:
        outfile.write(json.dumps(list_df))
    

    Edit: as DaveR commented, DataFrames aren't JSON serializable. You can convert each one to a dict and then dump the list to JSON.

    import json
    with open("my_file", 'w') as outfile:
        outfile.write(json.dumps([df.to_dict() for df in list_df]))
    

    Alternatively, pd.DataFrame and pd.Series have a to_json() method, so have a look at those as well.
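
As a rough sketch of the to_json() route: each DataFrame can be serialized on its own and the results combined into a single JSON file, so the differing column counts don't matter. The two small frames and the file name list_df.json below are made up for illustration, and orient="records" is just one reasonable choice (one JSON object per row); I haven't tested this against the R side.

```python
import json
import os
import tempfile

import pandas as pd

# Two small frames with different column counts, standing in for df1 and df2.
df1 = pd.DataFrame({"model": ["a"], "name_id": ["a"], "forecast1": [0]})
df2 = pd.DataFrame({"model": ["a"], "name_id": ["a"], "forecast1": [0], "forecast2": [0]})
list_df = [df1, df2]

# to_json() produces a JSON string per frame; json.loads() turns each string
# back into plain Python lists/dicts so all frames fit into one document.
payload = [json.loads(df.to_json(orient="records")) for df in list_df]

# Hypothetical file name; a temp directory keeps the sketch free of side effects.
path = os.path.join(tempfile.mkdtemp(), "list_df.json")
with open(path, "w") as outfile:
    json.dump(payload, outfile)

# Read the file back and rebuild the frames to check the round trip.
with open(path) as infile:
    restored = [pd.DataFrame(records) for records in json.load(infile)]
```

The file then holds a JSON array with one array of row objects per DataFrame, which a JSON reader on the R side (for example the jsonlite package) should be able to parse into a list of data frames.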