Search code examples
pythondataframedictionaryindexingkey

How to replace values based on dictionary within for loop? [python]


I have a whole output csv of data I want to plug into a regression model, containing of course, a dependent variable column and associated values, and multiple independent variable columns (parameters), and those associated values. The problem is that the values I have for my independent variables are not the "real' values, they are just placeholders that correspond to the actual values. I am using python and pandas for the dataframe.

I have this dataframe ("Param" for parameter):

    Color Zone Dependent.var  Param_1  Param_2  Param_3
0    Blue    A            XX        0        1        3
1    Blue    A            XX        2        1        1
2    Blue    A            XX        0        0        1
3    Blue    B            XX        1        0        0
4    Blue    B            XX        2        2        2
5    Blue    B            XX        2        1        1
6    Blue    C            XX        2        0        0
7    Blue    C            XX        2        2        3
8    Blue    C            XX        3        1        1
9     Red    A            XX        3        2        1
10    Red    A            XX        0        1        3
11    Red    A            XX        3        3        2
12    Red    B            XX        2        1        1
13    Red    B            XX        0        1        3
14    Red    B            XX        1        0        0
15    Red    C            XX        0        2        1
16    Red    C            XX        1        1        0
17    Red    C            XX        3        2        3
18  Green    A            XX        2        2        3
19  Green    A            XX        0        1        3
20  Green    A            XX        3        2        2
21  Green    B            XX        2        1        1
22  Green    B            XX        0        2        3
23  Green    B            XX        2        3        2
24  Green    C            XX        0        1        1
25  Green    C            XX        3        1        0
26  Green    C            XX        3        2        3

Sorry this is so long, but I felt necessary to communicate the data structure. Now I have 9 separate dictionaries (or keys, I get confused between the terms), 1 for each Color and Zone combination, that contain the "translation" of the initial placeholder values with their "real" values", and they look like this, where the "real" values are different between each Color/Zone:

For Blue (1 for Blue/A, 1 for Blue/B, and 1 for Blue/C):

          0   1   2   3
------------------------
Param_1   XX  XX  XX   XX
Param_2   XX  XX  XX   XX
Param_3   XX  XX  XX   XX 

For Red (1 for Red/A, 1 for Red/B, and 1 for Red/C):

          0   1   2   3
------------------------
Param_1   YY  YY  YY   YY
Param_2   YY  YY  YY   YY 
Param_3   YY  YY  YY   YY

For Green (1 for Green/A, 1 for Green/B, and 1 for Green/C):

          0   1   2   3
------------------------
Param_1   ZZ  ZZ  ZZ  ZZ
Param_2   ZZ  ZZ  ZZ  ZZ 
Param_3   ZZ  ZZ  ZZ  ZZ

I want to take these "parameter value maps" and then use them to replace the numbers in the initial dataframe. Is there a simple way to do this in python? I just get really confused with how to address the matter where there are three separate Colors, and 3 separate zones, with 9 separate sets of replacement values. I am thinking this would involve a for loop to iterate though the "Colors" and "Zones", but I am not sure.


Solution

  • Since you don't give a specific mapping, I'll use this example mapping:

    PARAMS = ["Param_1", "Param_2", "Param_3"]
    
    PARAM_MAP = {'Red': {'Param_1': {0: 'P1_R0', 1: 'P1_R1', 2: 'P1_R2', 3: 'P1_R3'},
                         'Param_2': {0: 'P2_R0', 1: 'P2_R1', 2: 'P2_R2', 3: 'P2_R3'},
                         'Param_3': {0: 'P3_R0', 1: 'P3_R1', 2: 'P3_R2', 3: 'P3_R3'}},
                 'Green': {'Param_1': {0: 'P1_G0', 1: 'P1_G1', 2: 'P1_G2', 3: 'P1_G3'},
                           'Param_2': {0: 'P2_G0', 1: 'P2_G1', 2: 'P2_G2', 3: 'P2_G3'},
                           'Param_3': {0: 'P3_G0', 1: 'P3_G1', 2: 'P3_G2', 3: 'P3_G3'}},
                 'Blue': {'Param_1': {0: 'P1_B0', 1: 'P1_B1', 2: 'P1_B2', 3: 'P1_B3'},
                          'Param_2': {0: 'P2_B0', 1: 'P2_B1', 2: 'P2_B2', 3: 'P2_B3'},
                          'Param_3': {0: 'P3_B0', 1: 'P3_B1', 2: 'P3_B2', 3: 'P3_B3'}}}
    

    And use this function which we'll apply to the dataframe:

    def mapping(row):
        color_map = PARAM_MAP[row["Color"]]
        return pd.Series({p: color_map[p][v] for p, v in zip(PARAMS, row[PARAMS])})
    

    Now apply the mapping and reassign:

    df[PARAMS] = df.apply(mapping, axis=1)
    

    Output:

        Color Zone Dependent.var Param_1 Param_2 Param_3
    0    Blue    A            XX   P1_B0   P2_B1   P3_B3
    1    Blue    A            XX   P1_B2   P2_B1   P3_B1
    2    Blue    A            XX   P1_B0   P2_B0   P3_B1
    3    Blue    B            XX   P1_B1   P2_B0   P3_B0
    4    Blue    B            XX   P1_B2   P2_B2   P3_B2
    5    Blue    B            XX   P1_B2   P2_B1   P3_B1
    6    Blue    C            XX   P1_B2   P2_B0   P3_B0
    7    Blue    C            XX   P1_B2   P2_B2   P3_B3
    8    Blue    C            XX   P1_B3   P2_B1   P3_B1
    9     Red    A            XX   P1_R3   P2_R2   P3_R1
    10    Red    A            XX   P1_R0   P2_R1   P3_R3
    11    Red    A            XX   P1_R3   P2_R3   P3_R2
    12    Red    B            XX   P1_R2   P2_R1   P3_R1
    13    Red    B            XX   P1_R0   P2_R1   P3_R3
    14    Red    B            XX   P1_R1   P2_R0   P3_R0
    15    Red    C            XX   P1_R0   P2_R2   P3_R1
    16    Red    C            XX   P1_R1   P2_R1   P3_R0
    17    Red    C            XX   P1_R3   P2_R2   P3_R3
    18  Green    A            XX   P1_G2   P2_G2   P3_G3
    19  Green    A            XX   P1_G0   P2_G1   P3_G3
    20  Green    A            XX   P1_G3   P2_G2   P3_G2
    21  Green    B            XX   P1_G2   P2_G1   P3_G1
    22  Green    B            XX   P1_G0   P2_G2   P3_G3
    23  Green    B            XX   P1_G2   P2_G3   P3_G2
    24  Green    C            XX   P1_G0   P2_G1   P3_G1
    25  Green    C            XX   P1_G3   P2_G1   P3_G0
    26  Green    C            XX   P1_G3   P2_G2   P3_G3
    

    I imagine there may be a better way, perhaps with groupby, but I couldn't figure it out.