Search code examples
pythonpandasdataframegridviewgrid

Convert DataFrame of options with 5*5 grid of choice


Following are the standard values for score and grades. Score column contains numbers within specified range.

Score Grade1 Score2 Grade3 Grade4
<260 A1 <200 D1 ID1
260-275 A2 200-250 D2 ID2
275-280 A3 250-260 D3 ID3
280-285 A4 260-270 D4 ID4
>=285 A5 >=270 D5 ID5

Input :-

Score Grade1 Score2 Grade3 Grade4
290 A1 265 D3 ID2

Output:-

Tiers Grade Grade1 Score2 Grade3 Grade4
1 A1
2 ID2
3 D3
4 265
5 290

Input :-

Score Grade1 Score2 Grade3 Grade4
290 A1 265 D3 ID2

Solution

  • Assuming df and df_input your two DataFrames, you can use a combination of and :

    # get df_input as Series
    s = df_input.loc[0]
    
    # find identical values
    mask = df.eq(s)
    
    # match ranges
    def match_range(val, ref):
        # extract the upper value, for the last replace by np.inf
        s = pd.to_numeric(ref.str.extract('[<-](\d+)', expand=False)).fillna(np.inf)
        # define default output as False
        out = np.zeros(len(ref), dtype=bool)
        # find matching position
        out[np.searchsorted(s, val)] = True
        return out
    
    # apply match_range on columns for which a direct match failed
    m = ~mask.any()
    mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))
    
    # generate output
    out = (pd.DataFrame(np.where(mask, s, ''),
                        index=np.arange(len(df))+1,
                        columns=df.columns)
             .rename_axis('Tiers').reset_index()
          )
    

    Output:

       Tiers Score Grade1 Score2 Grade3 Grade4
    0      1           A1                     
    1      2                               ID2
    2      3                         D3       
    3      4                 265              
    4      5   290                            
    

    Used inputs:

    df = pd.DataFrame({'Score': ['<260', '260-275', '275-280', '280-285', '>=285'],
                       'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],
                       'Score2': ['<200', '200-250', '250-260', '260-270', '>=270'],
                       'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],
                       'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})
    
    df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [265], 'Grade3': ['D3'], 'Grade4': ['ID2']})