Tags: python, numpy, pandas, argmax

python pandas: computing argmax of column in matrix subset


Consider toy dataframes df1 and df2, where df2 is a subset of df1 (excludes the first row).

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
df2 = df1[1:]

Now let's find the argmax of colA for each frame:

np.argmax(df1.colA) ## result is "2", which is what I expected
np.argmax(df2.colA) ## result is still "2", which is not what I expected.  I expected "1" 

If my matrix of interest is df2, how do I get around this indexing issue? Is this quirk related to pandas, numpy, or just Python memory?
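The quirk comes down to the difference between an index *label* and a *position*. The sketch below contrasts the two on the same df2, assuming pandas 1.0 or later, where Series.argmax returns a 0-based position and Series.idxmax returns the index label. (In pandas versions before 1.0, Series.argmax, which np.argmax dispatches to, behaved like idxmax; that matches the "2" reported above.)

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1[1:]  # keeps df1's index labels 1, 2, 3

# Label of the maximum: the index value attached to 45
label = df2.colA.idxmax()

# 0-based position of the maximum within df2
# (assumes pandas >= 1.0, where Series.argmax is positional)
position = df2.colA.argmax()

# A plain NumPy array has no index, so this is always positional
position_np = np.argmax(df2.colA.to_numpy())

print(label, position, position_np)  # 2 1 1
```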


Solution

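  • I think it's due to the index: df2 keeps df1's index labels (1, 2, 3), and in pandas versions before 1.0, np.argmax on a Series returned the index label of the maximum (Series.argmax behaved like idxmax) rather than its position. You could use reset_index(drop=True) when you assign df2, so that labels and positions coincide: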

    import numpy as np
    import pandas as pd
    
    df1 = pd.DataFrame({'colA':[3.0,9,45,7],'colB':['A','B','C','D']})
    df2 = df1[1:].reset_index(drop=True)
    
    In [464]: np.argmax(df1.colA)
    Out[464]: 2
    
    In [465]: np.argmax(df2.colA)
    Out[465]: 1
    

    I think it's better to call the Series method argmax directly instead of np.argmax:

    In [467]: df2.colA.argmax()
    Out[467]: 1
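If you would rather keep df2's original index (for example, to write results back into df1), here is a minimal sketch that avoids reset_index, assuming pandas 1.0 or later: use the position from argmax with iloc, or the label from idxmax with loc. The variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'colA': [3.0, 9, 45, 7], 'colB': ['A', 'B', 'C', 'D']})
df2 = df1[1:]  # index labels stay 1, 2, 3; no reset_index

pos = df2.colA.argmax()   # 0-based position within df2 (pandas >= 1.0)
lab = df2.colA.idxmax()   # the matching index label

# Both select the same row: one by position, one by label
row_by_pos = df2.iloc[pos]
row_by_lab = df2.loc[lab]

print(pos, lab, row_by_pos['colB'])  # 1 2 C
```

The label from idxmax stays valid in df1 as well (df1.loc[lab] is the same row), which is why keeping the index can be useful.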