pandas, parquet, fastparquet

Date cannot be serialized


I am getting an error while trying to save the dataframe as a file.

from fastparquet import write 
write('profile_dtl.parq', df)

The error concerns the date column "dob", and the message looks like this:

ValueError: Can't infer object conversion type: 0    1990-01-01
1    1954-01-01
2    1981-11-15
3    1993-01-21
4    1948-01-01
5    1977-01-01
6    1968-04-28
7    1969-01-01
8    1989-01-01
9    1985-01-01
Name: dob, dtype: object

I have checked that the column has dtype "object", just like the other columns that serialize without any problem. If I remove the "dob" column from the dataframe, the write call works. It also works if the column holds date and time values rather than dates alone.

Does fastparquet not accept date-only values?


Solution

  • Try converting dob to a datetime64 dtype. Start with a Series of object dtype:

    import pandas as pd
    dob = pd.Series(['1954-01-01', '1981-11-15', '1993-01-21', '1948-01-01',
                     '1977-01-01', '1968-04-28', '1969-01-01', '1989-01-01',
                     '1985-01-01'], name='dob')
    dob

    Out:
    0    1954-01-01
    1    1981-11-15
    2    1993-01-21
    3    1948-01-01
    4    1977-01-01
    5    1968-04-28
    6    1969-01-01
    7    1989-01-01
    8    1985-01-01
    Name: dob, dtype: object
    

    Note the dtype that results:

    pd.to_datetime(dob)
    
    Out:
    0   1954-01-01
    1   1981-11-15
    2   1993-01-21
    3   1948-01-01
    4   1977-01-01
    5   1968-04-28
    6   1969-01-01
    7   1989-01-01
    8   1985-01-01
    dtype: datetime64[ns]
    

    Use the converted Series as the index of a DataFrame:

    baz = list(range(9))
    # Name the data column 'baz' so it does not clash with the index name 'dob'.
    foo = pd.DataFrame(baz, index=pd.to_datetime(dob), columns=['baz'])
    

    You should be able to save your Parquet file now.

    from fastparquet import write
    
    write('foo.parquet', foo)
    

    $ ls -l foo.parquet
    -rw-r--r--  1 moi  admin  854 Oct 13 16:44 foo.parquet
    


    Why the original call failed: your dob Series has an object dtype, and you left the object_encoding argument of fastparquet.write at its default, 'infer'. From the docs:

    "The special value 'infer' will cause the type to be guessed from the first ten non-null values."

    Fastparquet does not try to infer a date type. For an object column it expects the values to be one of bytes|utf8|json|bson|bool|int|float, so a column of plain dates cannot be matched and the ValueError is raised.
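
The same conversion can be applied to a column of an existing dataframe rather than to an index. Below is a minimal sketch, assuming the dob column holds datetime.date objects (the question does not say so; this is a guess based on the object dtype and the error message). The helper names sample_value_types, convert_date_columns and save_with_fastparquet, as well as the sample data, are hypothetical and only serve the illustration.

```python
import datetime as dt

import pandas as pd


def sample_value_types(series, n=10):
    """Return the Python type names found among the first n non-null values."""
    return {type(v).__name__ for v in series.dropna().head(n)}


def convert_date_columns(df, columns):
    """Return a copy of df with the named columns passed through pd.to_datetime."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col])
    return out


def save_with_fastparquet(df, path):
    """Write df with fastparquet (imported lazily, so the rest runs without it)."""
    from fastparquet import write
    write(path, df)


# Hypothetical stand-in for the asker's dataframe.
# Assumption: dob holds datetime.date objects, which pandas stores as objects.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "dob": [dt.date(1990, 1, 1), dt.date(1954, 1, 1), dt.date(1981, 11, 15)],
})

# Look at what the column really contains before converting it.
print(sample_value_types(df["dob"]))

# Convert only the date column; the original dataframe is left untouched.
fixed = convert_date_columns(df, ["dob"])
print(fixed["dob"].dtype)
```

With the column converted, calling save_with_fastparquet(fixed, 'profile_dtl.parq') should behave like the date+time case that already worked in the question. Converting a copy keeps the original dataframe available in case the plain dates are needed elsewhere.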