I have a pandas DataFrame in the following form:
first deviation | second deviation | closest deviation | column name of smallest deviation |
---|---|---|---|
-0.5 | NaN | -0.5 | first deviation |
-0.4 | -0.8 | -0.4 | first deviation |
`closest deviation` and `column name of smallest deviation` are computed from the first two columns. The table shows the desired outcome, but I haven't found a way to get there.
The deviation columns hold the deviations of two functions from an input function. I want to find out which function's deviation is closer to 0 and, secondly, record that deviation.
Now, if I use `df.abs().idxmin(axis=1)`, I get the right value for `column name of smallest deviation`. But `df.abs().min(axis=1)` for `closest deviation` logically returns 0.4, not -0.4. Using plain `df.min(axis=1)` instead returns -0.8 for `closest deviation`, which is also not correct.
How do I get the correct information including the correct sign?
Thanks in advance!
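To make the problem concrete, here is a minimal reproduction. It assumes the frame holds only the two input columns from the table above; the names `labels`, `magnitudes` and `minima` are just illustrative.

```python
import numpy as np
import pandas as pd

# Only the two input columns from the table above.
df = pd.DataFrame({"first deviation": [-0.5, -0.4],
                   "second deviation": [np.nan, -0.8]})

# Column label of the deviation closest to zero: correct (NaN is skipped).
labels = df.abs().idxmin(axis=1)   # first deviation, first deviation

# Smallest absolute value: right magnitude, but the sign is lost.
magnitudes = df.abs().min(axis=1)  # 0.5, 0.4

# Plain minimum: the most negative value, not the one closest to zero.
minima = df.min(axis=1)            # -0.5, -0.8
```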
After getting the column labels of the closest values your way, we can index into the DataFrame to get the corresponding values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first deviation": [-0.5, -0.4],
                   "second deviation": [np.nan, -0.8]})
# your way for the column labels
inds = df.abs().idxmin(axis=1)
# for values: either this (deprecated in pandas 1.2, removed in 2.0)
# vals = df.lookup(df.index, inds)
# or this with numpy's "fancy" indexing: row i, column position of inds[i]
vals = df.to_numpy()[np.arange(len(df)), df.columns.get_indexer(inds)]
# then put both into the frame
df["closest deviation"] = vals
df["column name of smallest deviation"] = inds
```
to get:

```
   first deviation  second deviation  closest deviation  column name of smallest deviation
0             -0.5               NaN               -0.5                    first deviation
1             -0.4              -0.8               -0.4                    first deviation
```
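If you would rather stay in pandas and avoid both `lookup` and NumPy, a plain comprehension over the (row label, column label) pairs also works. This is an alternative sketch, not the approach above: it loops in Python, so it will be slower on large frames, but it reads the value straight from `df` (not `df.abs()`), which is what keeps the sign.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first deviation": [-0.5, -0.4],
                   "second deviation": [np.nan, -0.8]})

# Column label of the closest-to-zero deviation in each row (NaN is skipped).
inds = df.abs().idxmin(axis=1)

# Scalar lookup per (row label, column label) pair; reading from df keeps the sign.
vals = pd.Series([df.at[row, col] for row, col in inds.items()], index=df.index)

df["closest deviation"] = vals
df["column name of smallest deviation"] = inds
```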
`DataFrame.lookup` is deprecated (and removed in pandas 2.0), so the other way is to go to the NumPy domain and index there. Since `inds` holds column names, which NumPy doesn't know about, we get their integer locations with `get_indexer`.
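To see what `get_indexer` and the paired indexing do on the example frame, here is a small sketch; the frame is rebuilt from the question's table, and the names `cols`, `rows`, `arr` and `block` are illustrative. The key point is that two integer arrays used together select one element per pair `(rows[k], cols[k])`, whereas `np.ix_` would select a full sub-block.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first deviation": [-0.5, -0.4],
                   "second deviation": [np.nan, -0.8]})
inds = df.abs().idxmin(axis=1)

# Map column labels to the integer positions NumPy understands.
cols = df.columns.get_indexer(inds)   # [0, 0]

# One row position per label, in order.
rows = np.arange(len(df))             # [0, 1]

arr = df.to_numpy()

# Paired fancy indexing: arr[rows[k], cols[k]] for each k, one value per row.
vals = arr[rows, cols]                # [-0.5, -0.4]

# For contrast, np.ix_ builds the outer product: a 2 x 2 sub-block.
block = arr[np.ix_(rows, cols)]
```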