How to write pandas dataframe containing bins to a file so it can be read back into pandas?


I have a pandas dataframe in the following format:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'a' : [0,1,2,3,4,5,6], 'b' : [-0.5, 0.0, 1.0, 1.2, 1.4, 1.3, 1.1]})
    df['aBins'] = pd.cut(df['a'], bins = np.arange(0,10,2), include_lowest = True)

where each bin is an Interval:

    type(df['aBins'].iloc[0])

    pandas._libs.interval.Interval

and the series stores them as categorical data:

    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7 entries, 0 to 6
    Data columns (total 3 columns):
    a        7 non-null int64
    b        7 non-null float64
    aBins    7 non-null category
    dtypes: category(1), float64(1), int64(1)
    memory usage: 263.0 bytes        

I am trying to save this dataframe as a file so that it can be read back into a dataframe easily. I have tried saving it as a .csv file using .to_csv(), but when I read it back into pandas 'aBins' is read in as a string.

    df.to_csv('test.csv', index = False)
    df_reread = pd.read_csv('test.csv')
    df_reread.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 7 entries, 0 to 6
    Data columns (total 3 columns):
    a        7 non-null int64
    b        7 non-null float64
    aBins    7 non-null object
    dtypes: float64(1), int64(1), object(1)
    memory usage: 248.0+ bytes

Is there a good way to save this dataframe so that it can be read back into pandas in the same state?


Solution

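  • You might want to check out pandas.DataFrame.to_pickle and pandas.read_pickle. Pickling serializes the dataframe object itself rather than a text rendering of it, so the categorical dtype and its Interval categories survive the round trip: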

    >>> df.to_pickle("./test.pkl")
    ...
    ...
    >>> df = pd.read_pickle("./test.pkl")
    >>> type(df['aBins'].iloc[0]) 
    pandas._libs.interval.Interval
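
As a quick sanity check, here is a minimal sketch of the full round trip using the dataframe from the question. The temporary directory and the names `tmp_dir` and `path` are only illustrative assumptions, not part of the original answer; any writable location should work.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6],
                   'b': [-0.5, 0.0, 1.0, 1.2, 1.4, 1.3, 1.1]})
df['aBins'] = pd.cut(df['a'], bins=np.arange(0, 10, 2), include_lowest=True)

# Write to a throwaway location (illustrative only) and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.pkl')
    df.to_pickle(path)
    df_reread = pd.read_pickle(path)

# The binned column should still be categorical, with Interval values.
first_bin = df_reread['aBins'].iloc[0]
print(df_reread['aBins'].dtype.name)
print(isinstance(first_bin, pd.Interval))

# Raises an AssertionError if values or dtypes differ after the round trip.
pd.testing.assert_frame_equal(df_reread, df)
```

One caveat: pickle files are Python-specific, and unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.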