I am storing a large dataset with a lot of NaN values in an HDFStore, using the following python/pandas code:
with get_store(work_path+'/stores/store.h5') as store:
    for chunk in reader:
        for column in column_list:
            store.append('%s' % column, chunk[column],
                         data_columns=column)
I then want to load the first column as a numpy array, so I have:
array = store.select(column_list[0]).as_matrix()
The problem is that I get a tiny array without any of the initial NaN values: when the data is written to the store, it kind of "forgets" the NaN values and keeps only the non-NaN values and their indexes. How can I get back the array with the initial NaN values?
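You need to pass dropna=False to store.append; see the pandas documentation for HDFStore.append. Without it, rows that are entirely NaN are not written, and for a single column every NaN value is such a row.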
FYI, you are creating a column store essentially (which may or may not fit your problem better).
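To make the dropna=False suggestion concrete, here is a minimal sketch of a round trip through a store. It assumes PyTables is installed and a reasonably recent pandas; the helper name roundtrip_with_nans, the key "col" and the temporary file path are made up for illustration and are not from the question. It also uses to_numpy() in place of as_matrix(), which has been removed from newer pandas versions.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_with_nans(values, dropna):
    """Append a Series to a temporary HDFStore and read it back as a NumPy array.

    Hypothetical helper for illustration only.
    """
    series = pd.Series(values, dtype="float64")
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "store.h5")  # throwaway file, not the asker's path
        with pd.HDFStore(path, mode="w") as store:
            # dropna=False asks the store to keep rows that are entirely NaN
            store.append("col", series, dropna=dropna)
            return store.select("col").to_numpy()


if __name__ == "__main__":
    restored = roundtrip_with_nans([np.nan, 1.0, np.nan, 2.0], dropna=False)
    print(restored)
```

In the loop from the question, the equivalent change would be adding dropna=False to the existing store.append(...) call.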