Search code examples
pythonpandasdataframelookupisin

How do you lookup in range


I have 2 data frames that I would like to return the values in a range (-1, 0, +1). One of the data frames contains Id's that i would like to look up and the other data frame contains Id's & values. For example, I want to lookup 99, 55, 117 in another data frame and return 100 99 98, 56 55 54, 118 117 116. As you can see it getting the values -1 and +1 of the Id's I would like to lookup. There is a better example below.

df    = pd.DataFrame([[99],[55],[117]],columns = ['Id'])

df2    = pd.DataFrame([[100,1,2,4,5,6,8],
               [87,1,6,20,22,23,34],
               [99,1,12,13,34,45,46],
               [64,1,10,14,29,32,33],
               [55,1,22,13,23,33,35],
               [66,1,6,7,8,9,10],
               [77,1,2,3,5,6,8],
               [811,1,2,5,6,8,10], 
               [118,1,7,8,22,44,56],
               [117,1,66,44,47,87,91]],
               columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])

I would like my result to something like this below.

results    = pd.DataFrame([[87,1,6,20,22,23,34],
               [99,1,12,13,34,45,46],
               [64,1,10,14,29,32,33],
               [64,1,10,14,29,32,33],
               [55,1,22,13,23,33,35],
               [66,1,6,7,8,9,10],
               [118,1,7,8,22,44,56],
               [117,1,66,44,47,87,91]],
               columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])

Solution

  • import pandas as pd
    import numpy as np
    
    
    ind = df2[df2['Id'].isin(df['Id'])].index
    aaa = np.array([[ind[i]-1,ind[i],ind[i]+1] for i in range(len(ind))]).ravel()
    aaa = aaa[(aaa <= df2.index.values[-1]) & (aaa >= 0)]
    df_test = df2.loc[aaa, :].reset_index().drop(['index'], axis=1)
    
    print(df_test)
    

    Output

        Id  Num1  Num2  Num3  Num4  Num5  Num6
    0   87     1     6    20    22    23    34
    1   99     1    12    13    34    45    46
    2   64     1    10    14    29    32    33
    3   64     1    10    14    29    32    33
    4   55     1    22    13    23    33    35
    5   66     1     6     7     8     9    10
    6  118     1     7     8    22    44    56
    7  117     1    66    44    47    87    91
    

    Here, in the ind list, indexes are obtained where there are the required Ids in df2.

    The aaa list creates ranges for these indexes, then the lists are wrapped in np.array, ravel() is used to concatenate them. Next, the list aaa is overwritten, the elements that are greater than the maximum index df2 are removed.

    Sampling occurs through loc.

    Update 17.12.2022

    if you need duplicate rows.

    df = pd.DataFrame([[99], [55], [117], [117]], columns=['Id'])
    lim_ind = df2.index[-1]
    
    
    def my_func(i):
        a = df2[df2['Id'].isin([i])].index.values
        a = np.array([a - 1, a, a + 1]).ravel()
        a = a[(a >= 0) & (a <= lim_ind)]
    
        return a
    
    
    qqq = [my_func(i) for i in df['Id']]
    fff = np.array([df2.loc[qqq[i]].values for i in range(len(qqq))], dtype=object)
    fff = np.vstack(fff)
    result = pd.DataFrame(fff, columns=df2.columns)
    
    print(result)
    

    Output

        Id  Num1  Num2  Num3  Num4  Num5  Num6
    0   87     1     6    20    22    23    34
    1   99     1    12    13    34    45    46
    2   64     1    10    14    29    32    33
    3   64     1    10    14    29    32    33
    4   55     1    22    13    23    33    35
    5   66     1     6     7     8     9    10
    6  118     1     7     8    22    44    56
    7  117     1    66    44    47    87    91
    8  118     1     7     8    22    44    56
    9  117     1    66    44    47    87    91