When writing Dask DataFrame partitions to Parquet, I've noticed that read_parquet fails with conflicting metadata / schemas. This happens because in some partitions certain column(s) are entirely null / np.nan, while in other partitions the same columns are filled with values.
Beforehand I cast the data types of my partitions:
df = df.astype(dtypes)
PyArrow then fails to read the partitioned Parquet files, because columns containing only nulls get written with the data type 'null', while the same columns in other partitions keep their original type. How do I tackle this issue?
The data types of the columns are integer, float, or string (object).
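
For reference, here is a minimal sketch of the kind of setup that triggers this for me (column names, values, and the output path are made up for illustration; it assumes the pyarrow engine):

import numpy as np
import pandas as pd
import dask.dataframe as dd

# Two partitions: column 'b' has values in the first and only nulls in the second
pdf = pd.concat(
    [
        pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}),
        pd.DataFrame({"a": [3, 4], "b": [np.nan, np.nan]}),
    ],
    ignore_index=True,
)
df = dd.from_pandas(pdf, npartitions=2)

# Cast as described above
df = df.astype({"a": "int64", "b": "object"})

df.to_parquet("data.parquet", engine="pyarrow")

# Reading back fails with a schema mismatch, because the all-null partition
# of 'b' was written as type 'null' instead of string
df2 = dd.read_parquet("data.parquet", engine="pyarrow").compute()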
I recommend raising an issue on either the Dask or the Arrow issue tracker.
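
In the meantime, one workaround that may help is to pass an explicit pyarrow schema when writing, so that all-null partitions are not written with an inferred 'null' type. This is only a sketch, reusing df from the example in the question, and it assumes the pyarrow engine and a Dask version whose to_parquet accepts a schema argument:

import pyarrow as pa

# Explicit schema: the all-null partition of 'b' is still written as string.
# Adjust the column names and types to match your data.
schema = pa.schema([("a", pa.int64()), ("b", pa.string())])
df.to_parquet("data.parquet", engine="pyarrow", schema=schema)

df2 = dd.read_parquet("data.parquet", engine="pyarrow").compute()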