How can I remove rows from a pandas DataFrame that I read from a table in an .h5 (HDF5) file?

So I am opening my h5 file in a Python shell as follows:

import pandas as pd
file = pd.read_hdf('/...mapping_trade_type/BBG002RRB_L3.h5', 'quote')

Now when I do file[:] there are over 600,000 rows of data, and I would like to delete most of them.

I tried something like:

file = file.drop(file.index[5:643368])

Now when I print file, I get the remaining 5 rows in the table, which is exactly what I want.

But when I exit the Python shell, re-enter it, and open up the quote table above, there are 643368 rows again.

What am I missing?

Solution

drop only changes the DataFrame in memory; the file on disk is left untouched. You need to write the result back with to_hdf, using the same key:

df = pd.read_hdf('/...mapping_trade_type/BBG002RRB_L3.h5', 'quote')

# Drop the unwanted rows, then overwrite the 'quote' key with the result
(df.drop(df.index[5:643368])
   .to_hdf('/...mapping_trade_type/BBG002RRB_L3.h5', key='quote')
)
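
To see the round trip end to end, here is a minimal sketch that runs against a throwaway file. The temporary path, the `price` column, and the 100-row table are made up for illustration and are not from the original data. It assumes PyTables is installed, since pandas relies on it for HDF5 I/O, and that the table is stored in the default "fixed" format, where writing to an existing key replaces what was stored there.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: a small table under the key 'quote'.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
pd.DataFrame({"price": np.arange(100.0)}).to_hdf(path, key="quote")

# Load the table, keep only the first 5 rows, and write back to the same key.
df = pd.read_hdf(path, key="quote")
trimmed = df.drop(df.index[5:])
trimmed.to_hdf(path, key="quote")

# A fresh read (as after restarting the shell) sees only the kept rows.
reloaded = pd.read_hdf(path, key="quote")
print(len(reloaded))
```

Note that an HDF5 file generally does not shrink on disk when data is overwritten or removed. If file size matters, repacking the file (for example with the ptrepack utility that ships with PyTables) is one option worth checking.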