Search code examples
pythonpandasindexinginsert

Knowing a column of indexes how to get column of their values in Dataframe?


I have a DataFramedf = pd.DataFrame(np.random.randint(-100, 100, 100).reshape(10, -1), columns=list(range(10)), index=list('abcdefghij')) and I found some indexes using method .idxmin(axis=1) and getting Seriesdf_ind =

  • a 5
  • b 8
  • c 2
  • d 0
  • e 7
  • f 0
  • g 4
  • h 6
  • i 5
  • j 2

How to get column of their values and than add (insert) it to df?

df.insert(11, "values", df[df_ind]) gives me a 10x10 DataFrame, but not 1x10 DataFrame


Solution

  • If you want to slice the values, you can use a modified indexing lookup:

    # first reindex the indexer to ensure
    # having the correct final order
    idx, cols = pd.factorize(df_ind.reindex(df.index))
    
    # then perform indexing lookup
    df['values'] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
    

    Output:

        0   1   2   3   4   5   6   7   8   9  values
    a  72 -53  17  92 -33  95   3 -91 -79 -64      95
    b -13 -30 -12  40 -42  93 -61 -13  74 -12      74
    c -19  65 -75 -23 -28 -91  48  15  97 -21     -75
    d  75  92 -18  -1  77 -71  47  47  42  67      75
    e -68  93 -91  85  27 -68 -69  51  63  14      51
    f  83 -72 -66  28  28  64 -47  33 -62 -83      83
    g -21  32   5 -58  86 -69  20 -99 -35  69      86
    h -43 -65   2  19 -89  74 -18  -9  28  42     -18
    i  -1 -47  40  21  70 -16 -32 -94  96 -53     -16
    j  27  31   0  80 -22  43  48  86 -77  41       0
    

    Output if the original DataFrame was in a different order:

        0   1   2   3   4   5   6   7   8   9  values
    c -19  65 -75 -23 -28 -91  48  15  97 -21     -75
    e -68  93 -91  85  27 -68 -69  51  63  14      51
    h -43 -65   2  19 -89  74 -18  -9  28  42     -18
    d  75  92 -18  -1  77 -71  47  47  42  67      75
    g -21  32   5 -58  86 -69  20 -99 -35  69      86
    i  -1 -47  40  21  70 -16 -32 -94  96 -53     -16
    b -13 -30 -12  40 -42  93 -61 -13  74 -12      74
    a  72 -53  17  92 -33  95   3 -91 -79 -64      95
    j  27  31   0  80 -22  43  48  86 -77  41       0
    f  83 -72 -66  28  28  64 -47  33 -62 -83      83
    

    Reproducible df_ind:

    df_ind = pd.Series({'a': 5, 'b': 8, 'c': 2, 'd': 0, 'e': 7, 'f': 0, 'g': 4, 'h': 6, 'i': 5, 'j': 2})