I have a DataFrame imported with pandas consisting of 2135 rows and 518 columns. Now I want to take the mean of the first 60 rows and subtract these values from the other rows. So far I used this:
mean = df[1:60].mean()
to take the mean of the first 60 rows. I tried to subtract it by just using:
df[61:2135] - mean
but that doesn't work. I have tried a couple more things, but I can't seem to figure it out. Maybe it's the shape of the DataFrames, because mean has the shape (517,) whereas df[61:2135] has the shape (2072,518).
Your data looks malformed: the last column has trailing semicolons (...;;;), so pandas reads it as strings rather than floats.
You need to strip these and cast the column back to float:
In [44]:
df[517] = df[517].str.replace(';;;','').astype(float)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2134 entries, 0 to 2133
Columns: 518 entries, 0 to 517
dtypes: float64(518)
memory usage: 8.4 MB
After that, what you tried will just work, because all 518 columns are now numeric and mean has one value per column.
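For reference, here is a minimal sketch of the subtraction itself on a small made-up frame (the 100x4 random data and the names baseline and corrected are assumptions for illustration, not your data). Note that slices are zero-based and end-exclusive, so df[1:60] covers rows 1 to 59 and df[61:] skips row 60; the sketch uses iloc[:60] and iloc[60:] on the assumption that you want exactly the first 60 rows and then everything after them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 100 rows, 4 float columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)))

# Mean of the first 60 rows (positions 0..59), one value per column.
baseline = df.iloc[:60].mean()

# Subtract the per-column baseline from every remaining row.
# pandas aligns the Series index with the DataFrame's column labels.
corrected = df.iloc[60:] - baseline

print(baseline.shape, corrected.shape)
```

The subtraction works row by row because the index of baseline matches the column labels of df; if a column is missing from the mean (as with a non-numeric column), the labels no longer line up.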
Additionally, your second row doesn't look like a valid header row, so you need to pass header=None to read_csv:
df = pd.read_csv("csvdata.csv", sep=",", skiprows=1, header=None)
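Putting the loading and cleaning steps together, something like the following should work. This is only a sketch: the inline text in raw is a made-up miniature standing in for csvdata.csv (one line to skip, no header row, trailing ;;; on the last field), and last is just a helper name for the final column label.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: one junk line to skip, no header row,
# and a trailing ";;;" on the last field of every data row.
raw = "junk line\n1.0,2.0,3.0;;;\n4.0,5.0,6.0;;;\n7.0,8.0,9.0;;;\n"

# header=None keeps the first data row as data and numbers the columns 0..n-1.
df = pd.read_csv(io.StringIO(raw), sep=",", skiprows=1, header=None)

# The last column is parsed as strings because of the ";;;" suffix,
# so strip the suffix literally and convert the column to float.
last = df.columns[-1]
df[last] = df[last].str.replace(";;;", "", regex=False).astype(float)

print(df.dtypes)
```

Passing regex=False makes the replacement a plain string match, which is all that is needed for a fixed suffix like this.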