Tags: python, python-polars, pyarrow, huggingface-datasets

How is it possible to convert a polars dataframe to a huggingface dataset?


Both formats are based on Arrow, but I couldn't find a way to convert one to the other.

The best I could find is:

The datasets library has a from_buffer method that expects a pyarrow.Buffer object, while the polars library has a to_arrow method that returns a pyarrow.Table object.


Solution

  • The __init__ method of Dataset expects a pyarrow Table as its first parameter, so it should just be a matter of

    from datasets import Dataset
    hf_dataset = Dataset(df.to_arrow())  # df is an existing polars DataFrame
    

    The other methods in that class are just ways to convert other structures to pyarrow. It seems as though Hugging Face datasets are more restrictive, in that they don't allow nested structures, so you may need to flatten list columns first with df.explode(list_cols).to_arrow(), where list_cols names the list-typed columns.