Tags: python, pandas, dataframe, performance, processing-efficiency

Pandas lookup values from DataFrame by index selectors


Suppose we have an indexed DataFrame with an arbitrary but large number of columns:

from numpy.random import randint
import pandas as pd

df = pd.DataFrame(randint(0,100,size=(10, 4)), columns=list('ABCD'))
print(df)

>    A   B   C   D
> 0  78   1  97  98
> 1  93  58  46  45
> 2  50   1  77  27
> 3  63  87  66  21
> 4  26   1  10  46
> 5  26  60  71  79
> 6  74   4  62  98
> 7  93  22  23  89
> 8  30  31  14  46
> 9  51   4  90  22

We also have a selector that specifies, for each column, which index label to pick:

selector = pd.DataFrame({ "other_index": randint(len(df.index),size=len(df.columns))}, 
                        index=df.columns)
print(selector)

>    other_index
> A            9
> B            0
> C            3
> D            4

Now I would like to get the selected value for each column:

# Look up one value per column: row label from the selector, column label c.
selected = [df.loc[selector.loc[c, "other_index"], c] for c in df.columns]
print(selected)

> [51, 1, 66, 46]

I'm pretty sure there is a more efficient way to achieve this in pandas, but I can't find it.


Solution

  • I would use df.lookup, but note that it was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions.

    df = pd.DataFrame(randint(0,100,size=(10, 4)), columns=list('ABCD'))
        A   B   C   D
    0  93  30  17  42
    1  38  55  10  46
    2   7  30  86  36
    3  25  48  25  62
    4   1  61  50   0
    5  18  87  98  87
    6  61  57  80  34
    7  38  50  32  96
    8  72  68  75  74
    9  70  99  77  28
    
    selector = pd.DataFrame({ "other_index": randint(len(df.index),size=len(df.columns))}, 
                            index=df.columns)
       other_index
    A            5
    B            7
    C            5
    D            9
    
    df.lookup(selector.other_index, selector.index)
    array([18, 50, 98, 28])
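
Since df.lookup is not available in pandas 2.0 and later, the same result can probably be obtained with NumPy fancy indexing. The following is only a sketch, not an official replacement: it hard-codes the example data from this answer so the result is reproducible, and the names row_pos and col_pos are made up for illustration. It assumes the index and column labels are unique.

```python
import pandas as pd

# Fixed data copied from the answer above so the result is reproducible.
df = pd.DataFrame(
    [[93, 30, 17, 42],
     [38, 55, 10, 46],
     [7, 30, 86, 36],
     [25, 48, 25, 62],
     [1, 61, 50, 0],
     [18, 87, 98, 87],
     [61, 57, 80, 34],
     [38, 50, 32, 96],
     [72, 68, 75, 74],
     [70, 99, 77, 28]],
    columns=list("ABCD"),
)
selector = pd.DataFrame({"other_index": [5, 7, 5, 9]}, index=df.columns)

# Translate row labels and column labels into integer positions.
# (row_pos and col_pos are illustrative names, not part of any pandas API.)
row_pos = df.index.get_indexer(selector["other_index"])
col_pos = df.columns.get_indexer(selector.index)

# NumPy fancy indexing picks one (row, column) pair per selector entry.
selected = df.to_numpy()[row_pos, col_pos]
print(selected)  # [18 50 98 28]
```

One caveat: get_indexer returns -1 for a label it cannot find, which NumPy would silently interpret as the last row or column, so it may be worth checking that row_pos and col_pos contain no -1 before indexing.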