Search code examples
rdplyrreplacepurrr

R: if NA, replace the missing value by that from another column, based on condition, using the column names pattern


I have the following issue: for one Rating, the values in the 3 numerical columns out of 6 are missing.

I would like the missing values to be replaced by the values from the cells that are situated to the left from the column with the missing values. The column names have a clear pattern - they all end with PD1, PD2 or PD3, which can be used as id.

Example: the value in the column eePD1 is missing for the Rating 1C, Basic Scenario and CORP Class. It should be replaced by the value in the column wePD1 ONLY if the Class is CORP. (RETAIL Class should not be touched).

I.e, the value in eePD1=wePD1 if the value in eePD1 was NA and Class=CORP.

Scenario Rating Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
Basic 1C CORP 0.51 0.74 0.43 NA NA NA
Basic 2A CORP 0.41 0.01 0.23 0.37 0.06 0.81
Basic 2B RETAIL 0.68 0.48 0.71 NA NA NA

Desirable result: if value from the cell in the column starting from ee is NA for the CORP Rating and any Class and Scenario, replace NA with the value from the column starting from we and ending like the name of the column starting from ee, for example PD1, PD2 or PD3.

Scenario Rating Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
Basic 1C CORP 0.51 0.74 0.43 0.51 0.74 0.43
Basic 2A CORP 0.41 0.01 0.23 0.37 0.06 0.81
Basic 2B RETAIL 0.68 0.48 0.71 NA NA NA

Here is the reproducible code chunk:

df3=structure(list(Scenario = c("Basic", "Basic", "Basic", "Basic", "Basic"
    ), Rating = c("1C", "2A", "2B", "2C", "3A"), Class = c("CORP", 
    "CORP", "CORP", "CORP", "RETAIL"), wePD1 = c(0.51, 
    0.41, 0.58, 0.28, 
    0.68), wePD2 = c(0.74, 0.01, 
    0.28, 0.92, 0.48
    ), wePD3 = c(0.43, 0.23, 0.04, 
    0.62, 0.71), eePD1 = c(NA, 0.37, 
    0.96, 0.22, NA
    ), eePD2 = c(NA, 0.06, 0.29, 0.22, 
    NA), eePD3 = c(NA, 0.81, 0.85, 
    0.78, NA)), row.names = c(NA, 
    -5L), class = c("data.table", "data.frame"))

The second solution option would be count three columns to the left, but it is not elegant or generalizable in case I were to have more columns...

So far, I could find only a few dplyr examples that would replace the value to the one below/above, but not to the sides, or based on the clear column name pattern for the better generalizability like with purrr and map2_int.


Solution

  • Assuming that you only have those columns, something like this should suffice:

    df3$eePD1 <- ifelse(is.na(df3$eePD1) & df3$Class == "CORP", df3$wePD1, df3$eePD1)
    df3$eePD2 <- ifelse(is.na(df3$eePD2) & df3$Class == "CORP", df3$wePD2, df3$eePD2)
    df3$eePD3 <- ifelse(is.na(df3$eePD3) & df3$Class == "CORP", df3$wePD3, df3$eePD3)
    

    Before:

      Scenario Rating  Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
    1    Basic     1C   CORP  0.51  0.74  0.43    NA    NA    NA
    2    Basic     2A   CORP  0.41  0.01  0.23  0.37  0.06  0.81
    3    Basic     2B   CORP  0.58  0.28  0.04  0.96  0.29  0.85
    4    Basic     2C   CORP  0.28  0.92  0.62  0.22  0.22  0.78
    5    Basic     3A RETAIL  0.68  0.48  0.71    NA    NA    NA
    

    After:

      Scenario Rating  Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
    1    Basic     1C   CORP  0.51  0.74  0.43  0.51  0.74  0.43
    2    Basic     2A   CORP  0.41  0.01  0.23  0.37  0.06  0.81
    3    Basic     2B   CORP  0.58  0.28  0.04  0.96  0.29  0.85
    4    Basic     2C   CORP  0.28  0.92  0.62  0.22  0.22  0.78
    5    Basic     3A RETAIL  0.68  0.48  0.71    NA    NA    NA
    

    If you have more columns which follow this pattern then you would probably want to generalise it. But for just three columns I'd just take this approach.

    A more general solution:

    for (target in grep("ee", colnames(df3), value=TRUE)) {
      source <- sub("^ee", "we", target)
      df3[[target]] <- ifelse(
        is.na(df3[[target]]) & df3$Class == "CORP",
        df3[[source]],
        df3[[target]]
      )
    }