python, vaex

Vaex Displaying Data


I have a 10.11 GB CSV file that I converted to HDF5 using dask. It contains a mixture of str, int and float values. When I try to read it with vaex, I just get numbers, as shown in the screenshot. Can someone please help me out?

[Screenshot of the vaex output]


Solution

  • I am not sure how dask (or dask.dataframe) stores data in HDF5 format. Pandas, for instance, stores the data in a row-based format, whereas vaex expects column-based HDF5 files.

    From your screenshot, I can see that your HDF5 file also preserves the index column. Vaex has no such column and expects just the data.

    To make sure the HDF5 file works with vaex, it is best to use vaex itself for the CSV->HDF5 conversion. Otherwise, something like Arrow may work, since it is a standard format (whereas HDF5 is more flexible, which makes it harder to support every possible way of storing data in it).