
Subtracting the mean of the first 60 rows from the other rows in a DataFrame


I have a DataFrame imported with pandas that has 2135 rows and 518 columns. I want to take the mean of the first 60 rows and subtract those values from the remaining rows. So far I have used this:

mean = df[1:60].mean()

to take the mean of the first 60 rows. I tried to subtract it by just using:

df[61:2135] - mean

but that doesn't work. I have tried a couple more things, but I can't figure it out. Maybe the problem is the shapes: mean has the shape (517,), whereas df[61:2135] has the shape (2072, 518).


Solution

  • Your data looks malformed: the values in the last column have trailing semicolons (...;;;), so that column is read as strings rather than numbers. You need to strip the semicolons and cast the column back to float:

    In [44]:
    df[517] = df[517].str.replace(';;;','').astype(float)
    df.info()
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2134 entries, 0 to 2133
    Columns: 518 entries, 0 to 517
    dtypes: float64(518)
    memory usage: 8.4 MB
    

    Once every column is numeric, mean has one entry per column (518 instead of 517), and the subtraction you tried will just work.

    Additionally, the second row of your file doesn't look like a valid header row, so you need to pass header=None to read_csv:

    df = pd.read_csv("csvdata.csv", sep=",", skiprows=1, header=None)
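
To see the whole fix end to end, here is a minimal sketch on made-up data. The inline CSV text, the 5x3 size, and the names `raw`, `n_baseline`, `baseline_mean` and `result` are assumptions for illustration only; the real file in the question has 518 columns and would use 60 baseline rows. Note also that `df[1:60]` in the question selects rows 1 through 59 by position, so the sketch uses `iloc[:n]` to take exactly the first n rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one junk first line,
# then 5 data rows whose last value carries a trailing ';;;'.
raw = """junk line skipped by skiprows=1
1.0,2.0,3.0;;;
3.0,4.0,5.0;;;
10.0,20.0,30.0;;;
11.0,21.0,31.0;;;
12.0,22.0,32.0;;;
"""

df = pd.read_csv(io.StringIO(raw), sep=",", skiprows=1, header=None)

# The last column is read as strings because of the trailing ';;;',
# so strip the semicolons and convert it to float.
last = df.columns[-1]
df[last] = df[last].str.replace(";;;", "", regex=False).astype(float)

n_baseline = 2  # assumption for the toy data; 60 in the question

# iloc[:n] selects the first n rows by position (rows 0 .. n-1).
baseline_mean = df.iloc[:n_baseline].mean()

# Subtracting a Series from a DataFrame aligns on column labels,
# so each column has its own baseline mean removed.
result = df.iloc[n_baseline:] - baseline_mean
print(result)
```

The shape mismatch from the question disappears here because `baseline_mean` has one entry for every column of `df`, including the cleaned last one.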