Hello I have a large dataframe where I have need to update or fill in the missing values. The condition is that the correct value should be in respect to row name matching to a column name. For example:
df <- data.frame(
ID = c("x", "y", "z"),
x = c("1", "0.45", "0.47"),
y = c("0.45", "1", "0.65"),
z = c("XXXX", "XXXX", "1")
)
ID x y z
1 x 1 0.45 XXXX
2 y 0.45 1 XXXX
3 z 0.47 0.65 1
Above, the correct values for xxxx
should be 0.47 and 0.65, respectively. Because x col z row has 0.47. Then y col z row is 0.65.The dataframe then looks like this:
ID x y z
1 x 1 0.45 0.47
2 y 0.45 1 0.65
3 z 0.47 0.65 1
This way all the elements row wise or column wise would correspond to matching rowname and column name.
I have looked into these posts but I couldn't use them to advance my solutions:
We could do this in base R
i1 <- upper.tri(df[-1])
i2 <- lower.tri(df[-1])
df[-1][i1] <- df[-1][i2]
-output
> df
ID x y z
1 x 1 0.45 0.47
2 y 0.45 1 0.65
3 z 0.47 0.65 1
Or use coalesce
library(dplyr)
df[-1][upper.tri(df[-1])] <- coalesce(na_if(df[-1][upper.tri(df[-1])],
"XXXX"), df[-1][lower.tri(df[-1])])
Or may convert to 'long' format and then reshape back to wide format after fill
ing
library(dplyr)
library(tidyr)
library(stringr)
df %>%
mutate(across(everything(), ~ na_if(.x, "XXXX"))) %>%
pivot_longer(cols = -ID) %>%
group_by(ID2 = str_c(pmin(ID, name), pmax(ID, name))) %>%
fill(value, .direction = "downup") %>%
ungroup %>%
select(-ID2) %>%
pivot_wider(names_from = name, values_from = value)
-output
# A tibble: 3 × 4
ID x y z
<chr> <chr> <chr> <chr>
1 x 1 0.45 0.47
2 y 0.45 1 0.65
3 z 0.47 0.65 1