How to conditionally select column based on other columns under pandas DataFrame without using where function?


I'm working under Python 2.5 (I'm restricted to that version by an external API) and would like to get the same result as the code below, which I wrote under Python 2.7:

import pandas as pd
df = pd.DataFrame({"lineId":[1,2,3,4], "idCaseMin": [10, 23, 40, 8], "min": [-110, -205, -80, -150], "idCaseMax": [5, 27, 15, 11], "max": [120, 150, 110, 90]})
df = df.set_index("lineId")

df["idMax"] = df["idCaseMax"].where(df["max"]>abs(df["min"]),df["idCaseMin"])

The DataFrame results in:

>>> df
        idCaseMax  idCaseMin  max  min  idMax
lineId
1               5         10  120 -110      5
2              27         23  150 -205     23
3              15         40  110  -80     15
4              11          8   90 -150      8

The idMax column holds the id of whichever case has the larger absolute value: idCaseMax when max is greater than abs(min), otherwise idCaseMin.

I can't use the where function because it's not available under pandas 0.9.0 (the latest version available for Python 2.5) with numpy 1.7.1.

So what options do I have to get the same results for the idMax column without using the pandas where function?


Solution

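  • IIUC, you can use numpy.where() (with numpy imported as np). The condition here is the question's test inverted, so the two value arguments are swapped: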

    In [120]: df['idMax'] =  \
                  np.where(df["max"]<=abs(df["min"]),
                           df["idCaseMin"],
                           df["idCaseMax"])
    
    In [121]: df
    Out[121]:
            idCaseMax  idCaseMin  max  min  idMax
    lineId
    1               5         10  120 -110      5
    2              27         23  150 -205     23
    3              15         40  110  -80     15
    4              11          8   90 -150      8
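
For reference, here is a self-contained sketch of the same idea that can be run as a script instead of pasted into an IPython session. It rebuilds the question's DataFrame and computes idMax two ways: with numpy.where() and with a plain boolean-mask assignment on the underlying arrays. The names max_wins, id_max_where and id_max_mask are made up for illustration. I have not tested this against pandas 0.9.0 / numpy 1.7.1, so it is an assumption that .values and boolean indexing behave the same way there.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "lineId": [1, 2, 3, 4],
    "idCaseMin": [10, 23, 40, 8],
    "min": [-110, -205, -80, -150],
    "idCaseMax": [5, 27, 15, 11],
    "max": [120, 150, 110, 90],
})
df = df.set_index("lineId")

# Boolean array: True where max is larger than min in absolute value.
max_wins = (df["max"] > abs(df["min"])).values

# Variant 1: numpy.where picks idCaseMax where max_wins is True,
# and idCaseMin everywhere else.
id_max_where = np.where(max_wins,
                        df["idCaseMax"].values,
                        df["idCaseMin"].values)

# Variant 2: start from a copy of idCaseMin and overwrite the rows
# where max wins. This only needs boolean indexing, not where().
id_max_mask = df["idCaseMin"].values.copy()
id_max_mask[max_wins] = df["idCaseMax"].values[max_wins]

df["idMax"] = id_max_where
```

Both variants should give the same column. The second one may be handy if numpy.where() is unavailable or misbehaves for some reason in an old environment.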