I have the following issue: for one Rating
, the values in the 3 numerical columns out of 6 are missing.
I would like the missing values to be replaced by the values from the cells that are situated to the left from the column with the missing values. The column names have a clear pattern - they all end with PD1
, PD2
or PD3
, which can be used as id.
Example: the value in the column eePD1
is missing for the Rating
1C, Basic Scenario
and CORP Class
.
It should be replaced by the value in the column wePD1
ONLY if the Class
is CORP. (RETAIL Class
should not be touched).
I.e, the value in eePD1=wePD1 if the value in eePD1 was NA and Class=CORP.
Scenario | Rating | Class | wePD1 | wePD2 | wePD3 | eePD1 | eePD2 | eePD3 |
---|---|---|---|---|---|---|---|---|
Basic | 1C | CORP | 0.51 | 0.74 | 0.43 | NA | NA | NA |
Basic | 2A | CORP | 0.41 | 0.01 | 0.23 | 0.37 | 0.06 | 0.81 |
Basic | 2B | RETAIL | 0.68 | 0.48 | 0.71 | NA | NA | NA |
Desirable result: if value from the cell in the column starting from ee
is NA
for the CORP Rating
and any Class
and Scenario
, replace NA
with the value from the column starting from we
and ending like the name of the column starting from ee
, for example PD1
, PD2
or PD3
.
Scenario | Rating | Class | wePD1 | wePD2 | wePD3 | eePD1 | eePD2 | eePD3 |
---|---|---|---|---|---|---|---|---|
Basic | 1C | CORP | 0.51 | 0.74 | 0.43 | 0.51 | 0.74 | 0.43 |
Basic | 2A | CORP | 0.41 | 0.01 | 0.23 | 0.37 | 0.06 | 0.81 |
Basic | 2B | RETAIL | 0.68 | 0.48 | 0.71 | NA | NA | NA |
Here is the reproducible code chunk:
df3=structure(list(Scenario = c("Basic", "Basic", "Basic", "Basic", "Basic"
), Rating = c("1C", "2A", "2B", "2C", "3A"), Class = c("CORP",
"CORP", "CORP", "CORP", "RETAIL"), wePD1 = c(0.51,
0.41, 0.58, 0.28,
0.68), wePD2 = c(0.74, 0.01,
0.28, 0.92, 0.48
), wePD3 = c(0.43, 0.23, 0.04,
0.62, 0.71), eePD1 = c(NA, 0.37,
0.96, 0.22, NA
), eePD2 = c(NA, 0.06, 0.29, 0.22,
NA), eePD3 = c(NA, 0.81, 0.85,
0.78, NA)), row.names = c(NA,
-5L), class = c("data.table", "data.frame"))
The second solution option would be count three columns to the left, but it is not elegant or generalizable in case I were to have more columns...
So far, I could find only a few dplyr
examples that would replace the value to the one below/above, but not to the sides, or based on the clear column name pattern for the better generalizability like with purrr
and map2_int
.
Assuming that you only have those columns, something like this should suffice:
df3$eePD1 <- ifelse(is.na(df3$eePD1) & df3$Class == "CORP", df3$wePD1, df3$eePD1)
df3$eePD2 <- ifelse(is.na(df3$eePD2) & df3$Class == "CORP", df3$wePD2, df3$eePD2)
df3$eePD3 <- ifelse(is.na(df3$eePD3) & df3$Class == "CORP", df3$wePD3, df3$eePD3)
Before:
Scenario Rating Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
1 Basic 1C CORP 0.51 0.74 0.43 NA NA NA
2 Basic 2A CORP 0.41 0.01 0.23 0.37 0.06 0.81
3 Basic 2B CORP 0.58 0.28 0.04 0.96 0.29 0.85
4 Basic 2C CORP 0.28 0.92 0.62 0.22 0.22 0.78
5 Basic 3A RETAIL 0.68 0.48 0.71 NA NA NA
After:
Scenario Rating Class wePD1 wePD2 wePD3 eePD1 eePD2 eePD3
1 Basic 1C CORP 0.51 0.74 0.43 0.51 0.74 0.43
2 Basic 2A CORP 0.41 0.01 0.23 0.37 0.06 0.81
3 Basic 2B CORP 0.58 0.28 0.04 0.96 0.29 0.85
4 Basic 2C CORP 0.28 0.92 0.62 0.22 0.22 0.78
5 Basic 3A RETAIL 0.68 0.48 0.71 NA NA NA
If you have more columns which follow this pattern then you would probably want to generalise it. But for just three columns I'd just take this approach.
A more general solution:
for (target in grep("ee", colnames(df3), value=TRUE)) {
source <- sub("^ee", "we", target)
df3[[target]] <- ifelse(
is.na(df3[[target]]) & df3$Class == "CORP",
df3[[source]],
df3[[target]]
)
}