
Is there a command-line tool to read the number of rows and columns of a Feather file?


I'm working on a project that generates a lot of data. Every month, a process writes a CSV file with over 6 million records, which is then converted into a Feather file. I need to verify the number of records in the Feather file and compare it to its corresponding CSV file.

I have searched but haven't found a command-line tool that reads a Feather file's metadata to report its number of rows and columns.

The closest solution I have is a Python script that uses pandas to read the file into a DataFrame and check df.shape.

However, my Feather files have around 6 to 10 million records, so loading one into a DataFrame is time-consuming, and the machine where I have to run the check doesn't have much memory or processing power.

    import pandas as pd

    filename = "jan_records.feather"
    df = pd.read_feather(filename)  # loads the whole file into memory
    print(df.shape)  # (rows, columns)

I appreciate any help on this.


Solution

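  • Feather V2 files (the default written by current pandas and pyarrow) use the Arrow IPC file format, so you can open one with pyarrow and get the column count from the schema and the row count from the record batches, without building a DataFrame. Memory-mapping the file and touching one batch at a time keeps memory use low:

    import pyarrow as pa
    import pyarrow.ipc

    filename = "jan_records.feather"
    total_rows = 0
    with pa.memory_map(filename) as source, pa.ipc.open_file(source) as reader:
        # The schema (and therefore the column count) is stored in the file footer
        num_columns = len(reader.schema)
        # Sum the rows of each record batch; only one batch is touched at a time
        for i in range(reader.num_record_batches):
            total_rows += reader.get_batch(i).num_rows
    print(f"rows: {total_rows}, columns: {num_columns}")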
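Since the goal is a command-line check against the source CSV, the pyarrow snippet above can be wrapped in a small script. This is only a sketch under assumptions: the script name `check_counts.py` is hypothetical, the Feather files are assumed to be Feather V2 (Arrow IPC) files (V1 files will not open with `pyarrow.ipc.open_file`), and the CSV is assumed to be UTF-8 with a single header row. Blank lines are skipped, which matches pandas' `read_csv` default.

```python
"""Hypothetical check_counts.py: compare the row count of a Feather file
with the record count of the CSV it was converted from."""
import argparse
import csv
import sys


def count_csv_records(path, has_header=True):
    """Stream the CSV row by row and count non-blank records.

    csv.reader handles quoted fields that contain newlines, which a plain
    line count (e.g. `wc -l`) would miscount. Assumes UTF-8 encoding.
    """
    with open(path, newline="", encoding="utf-8") as f:
        count = sum(1 for row in csv.reader(f) if row)  # skip blank lines
    # Do not count the header line as a record
    if has_header and count > 0:
        count -= 1
    return count


def feather_shape(path):
    """Return (rows, columns) of a Feather V2 (Arrow IPC) file."""
    # Imported lazily so the CSV-only part of the script works without pyarrow
    import pyarrow as pa
    import pyarrow.ipc

    with pa.memory_map(path) as source, pa.ipc.open_file(source) as reader:
        rows = sum(reader.get_batch(i).num_rows
                   for i in range(reader.num_record_batches))
        return rows, len(reader.schema)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Compare Feather and CSV row counts.")
    parser.add_argument("feather_file")
    parser.add_argument("csv_file", nargs="?", help="optional CSV to compare against")
    args = parser.parse_args(argv)

    rows, cols = feather_shape(args.feather_file)
    print(f"{args.feather_file}: {rows} rows, {cols} columns")
    if args.csv_file is None:
        return 0

    csv_rows = count_csv_records(args.csv_file)
    print(f"{args.csv_file}: {csv_rows} records")
    # Non-zero exit status on mismatch, so the check can drive shell scripts
    return 0 if rows == csv_rows else 1


if __name__ == "__main__":
    sys.exit(main())
```

Usage (file names are examples): `python check_counts.py jan_records.feather jan_records.csv`. The exit status is 1 when the counts differ, so the check can be used in a shell `if` or a scheduled job. Both files are read incrementally, so memory use stays roughly flat regardless of file size.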