python numpy pandas nan hdfstore

Get back NaN values after storing in HDFStore


I am storing a big dataset with a lot of NaN values in an HDFStore, using the following code with python/pandas:

with get_store(work_path+'/stores/store.h5') as store:
    for chunk in reader:
        for column in column_list:
            store.append('%s' % column, chunk[column],
                         data_columns=column)

And then I want to load the first column as a numpy array, so I have:

array = store.select(column_list[0]).as_matrix()

The problem is that I get a tiny array without any of the initial NaN values: when the data is written to the store, it seems to "forget" the NaN values and keeps only the non-NaN values and their indexes. How can I get back the array with the initial NaN values?


Solution

  • You need to pass dropna=False to store.append. Otherwise, rows in which every value is NaN are not written to the table.

    FYI, you are essentially creating a column store (which may or may not fit your problem better).
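
    As a rough illustration of the dropna=False suggestion, here is a minimal sketch. It assumes a recent pandas with PyTables installed; the sample series, the key "a", and the temporary file path are made up for the example and are not from the question. It uses pd.HDFStore and to_numpy() in place of get_store and as_matrix(), which are no longer available in current pandas.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for one column of the real dataset.
series = pd.Series([1.0, np.nan, 3.0, np.nan], name="a")

# Throwaway file so the sketch does not touch any real store.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path) as store:
    # dropna=False asks append to also write rows whose values are all NaN.
    store.append("a", series, dropna=False)
    # Read the column back while the store is still open.
    restored = store.select("a").to_numpy()

# Count how many NaN values survived the round trip.
nan_count = int(np.isnan(restored).sum())
```

    If the NaN rows were being dropped, restored would be shorter than the original series; with dropna=False the lengths should match.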