python pandas dataframe indexing calculated-columns

Recording the minimum absolute deviation, but including its sign


I have a pandas DataFrame in the following form:

first deviation  second deviation  closest deviation  column name of smallest deviation
-0.5             NaN               -0.5               first deviation
-0.4             -0.8              -0.4               first deviation

closest deviation and column name of smallest deviation are calculated from the first two columns. The table shows the desired outcome, but I haven't found a way to produce it.

The deviation columns show the deviations of two functions from an input function. I want to find out which function's deviation is closer to 0, and I also want to record that deviation, including its sign.

Now, if I use df.abs().idxmin(axis=1), I get the right value for column name of smallest deviation. However, df.abs().min(axis=1) returns 0.4 rather than -0.4 for closest deviation, because taking the absolute value discards the sign. Using only df.min(axis=1) would return -0.8 for closest deviation, which is also not correct.
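
For reference, the behaviour described above can be reproduced with a small frame built from the two sample rows. This is only a sketch: the way df is constructed here, and the names names, abs_min, and raw_min, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two input columns from the table above.
df = pd.DataFrame({
    "first deviation": [-0.5, -0.4],
    "second deviation": [np.nan, -0.8],
})

# Correct column label per row: the smallest absolute value wins, NaN is skipped.
names = df.abs().idxmin(axis=1)

# Correct magnitude, but the sign is lost.
abs_min = df.abs().min(axis=1)

# Keeps the sign, but picks the most negative value instead of the one closest to 0.
raw_min = df.min(axis=1)

print(names.tolist())
print(abs_min.tolist())
print(raw_min.tolist())
```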

How do I get the correct information including the correct sign?

Thanks in advance!


Solution

  • After getting the column labels of the closest values your way, we can index into the DataFrame to get the corresponding values:

    import numpy as np

    # your way for the column labels
    inds = df.abs().idxmin(axis=1)

    # for values: either this (deprecated in pandas 1.2, removed in 2.0)
    # vals = df.lookup(df.index, inds)

    # or this with numpy's "fancy" indexing
    vals = df.to_numpy()[np.arange(len(df)), df.columns.get_indexer(inds)]

    # then put both results into the frame
    df["closest deviation"] = vals
    df["column name of smallest deviation"] = inds
    

    to get

       first deviation  second deviation  closest deviation column name of smallest deviation
    0             -0.5               NaN               -0.5                   first deviation
    1             -0.4              -0.8               -0.4                   first deviation
    

    note:

    DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the other way is to go to the numpy domain and index there. Since inds holds column labels, which numpy doesn't know about, we get their integer positions with get_indexer.
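
Putting the pieces together, the numpy-based route can be sketched end to end as follows. The sample frame is rebuilt from the question's two rows, and the names col_pos and result are illustrative rather than part of the original answer. The sketch assumes every row has at least one non-NaN deviation, since idxmin has no label to return for an all-NaN row.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's input columns.
df = pd.DataFrame({
    "first deviation": [-0.5, -0.4],
    "second deviation": [np.nan, -0.8],
})

# Column label of the smallest absolute deviation in each row.
inds = df.abs().idxmin(axis=1)

# Translate the labels into integer column positions for numpy.
col_pos = df.columns.get_indexer(inds)

# Pick one value per row: row i, column col_pos[i]. The sign is preserved
# because we index the original values, not the absolute ones.
vals = df.to_numpy()[np.arange(len(df)), col_pos]

# Attach both results without modifying the original frame.
result = df.assign(**{
    "closest deviation": vals,
    "column name of smallest deviation": inds,
})
print(result)
```

Computing inds before the new columns are attached matters: once the string column is part of the frame, calling abs() on the whole frame would no longer work.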