
Is it possible to specify column types when saving a pandas DataFrame to feather?


Currently, if a column happens to have only nulls, an exception is thrown with the error:

Invalid: Unable to infer type of object array, were all null

Is it possible to specify the type of the column, so that it is used instead of an inferred type?

Versions:

feather-format==0.3.1
pandas==0.19.1

Sample code:

feather.write_dataframe(pandas.DataFrame([None]*5), '/tmp/test.feather')

Solution

  • Replace None with numpy.nan and the write succeeds:

    In [22]: feather.write_dataframe(pd.DataFrame([np.nan]*5), 'd:/temp/test.feather')
    
    In [23]: feather.read_dataframe('d:/temp/test.feather')
    Out[23]:
        0
    0 NaN
    1 NaN
    2 NaN
    3 NaN
    4 NaN
    

    PS: NumPy, pandas, SciPy and similar libraries have their own representations of plain Python's None: NaN (Not a Number) for numeric data and NaT (Not a Time) for datetime-like dtypes.
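If the DataFrame already exists and you cannot change how it is built, one way to apply the same idea is to cast every all-null object column to an explicit dtype before writing. The following is only a minimal sketch: the helper name `cast_all_null_columns` and the default of `float64` are assumptions made for illustration, not part of feather or pandas.

```python
import pandas as pd


def cast_all_null_columns(df, dtype="float64"):
    """Return a copy of df with every all-null object column cast to dtype.

    Hypothetical helper for illustration; it is not part of pandas or feather.
    """
    out = df.copy()
    for col in out.columns:
        # Only touch object columns that contain nothing but nulls.
        if out[col].dtype == object and out[col].isna().all():
            # Casting to float turns each None into NaN.
            out[col] = out[col].astype(dtype)
    return out


df = pd.DataFrame({"a": [None] * 5, "b": ["x", None, "y", None, "z"]})
fixed = cast_all_null_columns(df)
print(fixed.dtypes)

# The result can then be written as in the question, for example:
# feather.write_dataframe(fixed, '/tmp/test.feather')
```

Columns that hold at least one real value are left untouched, so only the columns whose type cannot be inferred are affected.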