Search code examples
python-3.xpandasif-statementpythonanywhere

How to keep a specif column based on row conditions?


Does someone knows how to get the next followed structure in Pandas?

I have a Pandas data frame that looks like this

ID  Con  Case   VAR3  VAR4  VAR5

1    TT    TG    .01  .09   .90
2    CC    TC    .20  .34   .56
3    GG    AG    .10  .20   .70
4    TC    TC    .17  .48   .35
5    AG    AG    .12  .46   .32
6    GG    TG    .15  .20   .65
7    TT    TC    .06  .24   .70
8    TT    CT    .11  .21   .68 
9    GT    GT    .05  .10   .85
  1. If the first 'letter' in [VAR2] (e.g T) it is doubled in [VAR1] (e.g. TT) -> take value from VAR3 (.01).
  2. If the combination of the letters in [VAR2] (e.g. TC) it is the same as in [VAR1] (e.g. TC) -> take value from VAR4 (.48).
  3. If the second 'letter' in [VAR2] (e.g G) it is doubled in [VAR1] (e.g GG) -> take value from VAR5 (.70).

Based on this information I would like my output data frame to look as follow:

ID  VAR1  VAR2   VAR6 

1    TT    TG    .01
2    CC    TC    .56
3    GG    AG    .70
4    TC    TC    .48
5    AG    AG    .46
6    GG    TG    .65
7    TT    TC    .06
8    TT    CT    .68
9    GT    GT    .10

Hopefully someone can give me a hint!

Thanks,

Ale.


Solution

  • If I understood your problem correctly, I'd first write a function that will evaluate your conditions and return the desired value:

    def classify(item):
        if item["VAR1"] == item["VAR2"][0]*2:
            return item["VAR3"]
        if item["VAR1"] == item["VAR2"]:
            return item["VAR4"]
        if item["VAR1"] == item["VAR2"][1]*2:
            return item["VAR5"]
        return 0  # what if none is true?
    

    Then apply it to your dataframe, specifying the new column:

    df["VAR6"] = df.apply(classify, axis=1)
    
    # You can also drop the middle columns if needed
    df.drop(["VAR3", "VAR4", "VAR5"], axis=1, inplace=True)
    

    Output:

       VAR1 VAR2  VAR6
    ID                
    1    TT   TG  0.01
    2    CC   TC  0.56
    3    GG   AG  0.70
    4    TC   TC  0.48
    5    AG   AG  0.46
    6    GG   TG  0.65
    7    TT   TC  0.06
    8    TT   CT  0.68
    9    GT   GT  0.10