Consider two toy DataFrames, df1 and df2, where df2 is a slice of df1 that excludes the first row.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
df2 = df1[1:]
Now let's find the argmax of colA for each frame:
np.argmax(df1.colA) ## result is "2", which is what I expected
np.argmax(df2.colA) ## result is still "2", which is not what I expected. I expected "1"
If the frame I'm interested in is df2, how do I get around this indexing issue? Is this quirk related to pandas, NumPy, or just Python memory?
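A minimal sketch contrasting a label-based lookup with a position-based one. Series.idxmax always returns an index label, while converting the column to a plain NumPy array with to_numpy() drops the index, so np.argmax can only return a position. This assumes pandas 0.24 or later (where to_numpy exists).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1[1:]  # slicing keeps df1's index labels 1, 2, 3

# Label of the maximum: always an index label, regardless of pandas version
label = df2.colA.idxmax()

# Position of the maximum: work on the raw array, so the index is ignored
position = np.argmax(df2.colA.to_numpy())

print(label, position)  # 2 1
```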
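It's due to the index. df2 keeps df1's index labels (1, 2, 3), and in older pandas versions np.argmax on a Series dispatched to Series.argmax, which was then an alias for idxmax and therefore returned the index label (2) rather than the position (1). You could use reset_index when you assign df2: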
df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
df2 = df1[1:].reset_index(drop=True)
In [464]: np.argmax(df1.colA)
Out[464]: 2
In [465]: np.argmax(df2.colA)
Out[465]: 1
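If you would rather keep df2's original labels (for example, to relate rows back to df1), a sketch that finds the position first and then uses iloc for the positional lookup and df2.index to recover the matching label. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1[1:]  # no reset_index: labels stay 1, 2, 3

pos = int(np.argmax(df2.colA.to_numpy()))  # position within df2
row = df2.iloc[pos]                        # positional row lookup
orig_label = df2.index[pos]                # matching label, valid in df1 too

print(pos, orig_label, row['colB'])  # 1 2 C
```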
I think it's better to use the Series method argmax instead of np.argmax:
In [467]: df2.colA.argmax()
Out[467]: 1
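A sketch of how the two Series methods relate, assuming pandas 1.0 or later, where Series.argmax returns a position rather than acting as an alias for idxmax. On the un-reset df2 the two methods disagree; after reset_index(drop=True) labels coincide with positions, so they agree.

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1[1:]

# pandas >= 1.0: argmax is positional even with a non-default index
pos = df2.colA.argmax()
label = df2.colA.idxmax()

# After resetting, labels equal positions, so both methods return the same value
df2_reset = df2.reset_index(drop=True)
same = df2_reset.colA.argmax() == df2_reset.colA.idxmax()

print(pos, label, same)  # 1 2 True
```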