I have a problem with my dataset.
The (future) dataset is a pandas DataFrame that I loaded from a pickle file; the DataFrame itself behaves correctly. My code is:
from datasets import Dataset
dataset = Dataset.from_pandas(df)
dataset.push_to_hub("username/my_dataset", private=True)
Because I thought it was pandas' fault, I also tried:
dataset = Dataset.from_dict(df_sentences.to_dict(orient='list'))
dataset.push_to_hub("username/my_dataset", private=True)
I also tried loading it from a file.
The error I get is:
ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: string
My dataset has four string columns and one integer column, with around 3600 rows.
Without a reproducible sample it is hard to test, but one option is to convert the string columns to the string[pyarrow] dtype:
dtypes = {
'column_a': 'string[pyarrow]',
'col_b': 'string[pyarrow]',
...
}
df_converted = df.astype(dtypes)
# proceed with the push
If possible, I would also upgrade to the latest versions, especially of pyarrow and pandas.
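Before upgrading, it can be useful to see which versions are actually installed. A minimal sketch using only the standard library; the helper name `installed_versions` and the default package list are illustrative assumptions:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "pyarrow", "datasets")):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Including this output in the question would also make the problem easier for others to reproduce.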