python pandas h5py

How can I permanently remove rows from a pandas DataFrame table stored in a .h5 (h5py) file?


So I am opening my h5 file in a Python shell as follows:

import pandas as pd
file = pd.read_hdf('/...mapping_trade_type/BBG002RRB_L3.h5', 'quote')

Now when I do file[:] there are over 600,000 rows of data, and I would like to delete most of them.

I try something like:

file = file.drop(file.index[5:643368])

Now when I print file, I get the remaining 5 rows in the table, which is exactly what I want.

But when I exit the Python shell, re-enter it, and open up the quote table above, there are 643368 rows again.

What am I missing?


Solution

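  • read_hdf loads the table into memory and drop returns a new DataFrame, so nothing you do in the shell changes the file on disk. You need to write the result back with to_hdf, assigning to the same key:

    import pandas as pd

    path = '/...mapping_trade_type/BBG002RRB_L3.h5'
    df = pd.read_hdf(path, 'quote')

    # Keep the first 5 rows, then overwrite the 'quote' key on disk
    (df.drop(df.index[5:643368])
       .to_hdf(path, key='quote')
    )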