
How to separately add a header row while loading a parquet file?


When handling CSV files, we can write:

df = pd.read_csv("test.csv", names=header_list, dtype=dtype_dict)

The above creates a dataframe whose headers come from header_list and whose dtypes come from dtype_dict.

Can we do something similar with pd.read_parquet() ?

In my case the headers are supplied separately, so they would not be available in the data file itself.
Another way to work around this could be to shift all the data in df down by one row (moving the current headers into the first row) and then replace the header with header_list, if that is even possible.

Is there an optimal solution to my issue? I'm not too familiar with parquet so any guidance would be appreciated, thanks.


Solution

  • Can we do something similar with pd.read_parquet() ?

    Parquet files store metadata, including the column names and their types, so there is no need to pass this information when loading the data.
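
If the names stored in the file are not the ones you want (for example, placeholder names), one option is to replace them after loading. The following is a minimal sketch, not a built-in pandas feature: apply_header is a hypothetical helper, and the small in-memory dataframe stands in for the result of pd.read_parquet("test.parquet") so the example runs without a Parquet file. It assumes header_list is in the same order as the columns in the file.

```python
import pandas as pd


def apply_header(df, header_list, dtype_dict=None):
    """Return a copy of df with columns renamed positionally and optionally cast.

    Hypothetical helper: pandas has no `names=` argument on read_parquet,
    so the renaming happens after the file has been loaded.
    """
    if len(header_list) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(header_list)}"
        )
    out = df.copy()
    out.columns = list(header_list)  # positional rename
    if dtype_dict:
        out = out.astype(dtype_dict)  # cast only the listed columns
    return out


# Stand-in for: loaded = pd.read_parquet("test.parquet")
loaded = pd.DataFrame({"0": [1, 2], "1": ["3.5", "4.5"]})

header_list = ["id", "price"]
dtype_dict = {"id": "int64", "price": "float64"}

df = apply_header(loaded, header_list, dtype_dict)
print(df.dtypes)
```

With a real file, the call would be apply_header(pd.read_parquet("test.parquet"), header_list, dtype_dict). No rows need to be shifted, because Parquet keeps column names in its metadata rather than in the first data row.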