Tags: python, pandas, parquet

Converting a Parquet file to pandas and then querying it gives an error


I am trying to compute the average of a column, so I converted a Parquet file to a pandas DataFrame. Calling mean() on the column raises the error TypeError('Could not convert %s to numeric' % str(x)), which seems to refer to the word "Age" in the column.

The dataframe looks like this:

         _c0     _c1  _c2    
    0  RecId   Class  Age   
    1      1    1st    29   
    2      2    1st     2   
    3      3    1st    30 

My code is:

    import pyarrow 
    import pandas
    import pyarrow.parquet as pq

    df = pq.read_table("file.parquet").to_pandas()
    average_age = df["_c2"].mean()
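A minimal sketch that reproduces the error without the Parquet file, assuming (hypothetically) that the columns hold exactly the four values shown above, all as strings. Because the text "Age" sits in `_c2`, the column has a text dtype rather than a numeric one, and `mean()` fails when it tries to turn the strings into a number:

```python
import pandas as pd

# Hypothetical stand-in for pq.read_table("file.parquet").to_pandas():
# the header row was stored as data, so every column holds strings.
df = pd.DataFrame({
    "_c0": ["RecId", "1", "2", "3"],
    "_c1": ["Class", "1st", "1st", "1st"],
    "_c2": ["Age", "29", "2", "30"],
})

# False: the column holds text, not numbers
print(pd.api.types.is_numeric_dtype(df["_c2"]))

try:
    df["_c2"].mean()
except TypeError as exc:
    # mean() cannot convert the string values to a number
    print("TypeError:", exc)
```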

I tried using

    df = df(skiprows=1)

but that raises "TypeError: 'DataFrame' object is not callable".

How can I either skip over the row containing "Age" or remove it? And is this related to the data being read from a Parquet file, or is it purely a pandas issue?


Solution

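  • You can use positional indexing (`iloc`) to remove the first row. The values left in `_c2` are still strings, so convert them to numbers before taking the mean:

    df = df.iloc[1:, :]
    average_age = pandas.to_numeric(df["_c2"]).mean()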
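As for whether this is a Parquet or a pandas issue: the header text is stored as an ordinary data row in the Parquet file itself, and since a Parquet column has a single type, `_c2` is presumably a string column, so pandas is reading the file faithfully. The `_c0`/`_c1`/`_c2` names are what Apache Spark assigns when it reads a CSV without `header=True`, so the file may have been written that way (an assumption; the question does not say). If the file can't be regenerated, the following sketch promotes the first row to column names and converts the numeric columns. It uses a hypothetical in-memory DataFrame in place of `pq.read_table("file.parquet").to_pandas()`, and also shows an `errors="coerce"` shortcut for when only the average is needed:

```python
import pandas as pd

# Hypothetical stand-in for pq.read_table("file.parquet").to_pandas()
df = pd.DataFrame({
    "_c0": ["RecId", "1", "2", "3"],
    "_c1": ["Class", "1st", "1st", "1st"],
    "_c2": ["Age", "29", "2", "30"],
})

# Quick option: errors="coerce" turns the text "Age" into NaN, and mean()
# skips NaN by default, so the header row is simply ignored
quick_average = pd.to_numeric(df["_c2"], errors="coerce").mean()

# Fuller option: promote the first row to column names, then drop it
df.columns = df.iloc[0].tolist()
df = df.iloc[1:].reset_index(drop=True)

# The remaining values are still strings such as "29"; convert the numeric ones
df["RecId"] = pd.to_numeric(df["RecId"])
df["Age"] = pd.to_numeric(df["Age"])

average_age = df["Age"].mean()
print(quick_average, average_age)
```

Both approaches average the same three ages; the second one also leaves the DataFrame with meaningful column names for later queries.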