I have a nested data.frame - df_nested
, there one of column contains df
:
df <- tibble(ID_Value = 1:8,
xyz001 = c("text4", NA, NA, NA, NA, NA, NA, "text2"),
xyz002 = c(NA, NA, NA, "text3", "text1", NA, NA, NA),
xyz003 = c(NA, "text1", NA, NA, "text2", NA, "text2", NA))
I want to find a way, how to mutate this df, on these requirements:
mutate(across(matches("\\d")
my attempt:
df_nested <- df_nested %>%
mutate(df = map(data, ~.x %>%
mutate(across(matches("\\dd"), function (x) {
conditions (ifelse, case_when or other)
...}
Also, should we better use across()
, or is vars()
still a good way to do it as well?
Thank you in advance.
Expected Output
df <- tibble(ID_Value = 1:8,
xyz001 = c("text4", NA, NA, NA, NA, NA, NA, NA),
xyz002 = c(NA, NA, NA, "text3", NA, NA, NA, NA),
xyz003 = c(NA, NA, NA, NA, "text2", NA, "text2", NA))
factor
type to specify the order you want.Consider this function
max_only <- function(x, lvls) {
fct <- droplevels(factor(x, lvls))
`[<-`(x, as.integer(fct) != length(levels(fct)), NA_character_)
}
You can then specify any order you want
> max_only(c("apple", "banana", NA_character_), c("banana", "apple"))
[1] "apple" NA NA
> max_only(c("apple", "banana", NA_character_), c("apple", "banana"))
[1] NA "banana" NA
df %>%
mutate(across(matches("\\d"), max_only, c("tier1", "tier2", "tier3", "tier4")))
Output (this one looks more like your expected output)
# A tibble: 8 x 4
ID_Value xyz001 xyz002 xyz003
<int> <chr> <chr> <chr>
1 1 tier4 NA NA
2 2 NA NA NA
3 3 NA NA NA
4 4 NA tier3 NA
5 5 NA NA tier2
6 6 NA NA NA
7 7 NA NA tier2
8 8 NA NA NA
df %>%
mutate(as.data.frame(t(apply(
across(matches("\\d")), 1L,
max_only, c("tier1", "tier2", "tier3", "tier4")
))))
Output
# A tibble: 8 x 4
ID_Value xyz001 xyz002 xyz003
<int> <chr> <chr> <chr>
1 1 tier4 NA NA
2 2 NA NA tier1
3 3 NA NA NA
4 4 NA tier3 NA
5 5 NA NA tier2
6 6 NA NA NA
7 7 NA NA tier2
8 8 tier2 NA NA
[<-
is almost equivalent to x[...] <- y; x
. If ...
is a logical vector (i.e. TRUE/FALSE), then values in x
indexed by TRUE will be replaced by y
. For example,
> x <- c("a", "b" ,"c")
> `[<-`(x, c(FALSE, TRUE, TRUE), NA_character_)
[1] "a" NA NA
> x[c(FALSE, TRUE, TRUE)] <- NA_character_; x
[1] "a" NA NA
NA_character_
is the NA value of a character type.
as.integer(fct) != length(levels(fct))
returns a logical vector of the same length as fct
. TRUE indexes positions where the values of fct
are not the highest level, FALSE indexes the opposite, and NA indexes NAs. For example, assume that fct
looks like this
> x <- c("apple", "banana", NA)
> fct <- droplevels(factor(x, c("apple", "banana", "pear")))
> fct
[1] apple banana <NA>
Levels: apple banana
Then, you can see that
> as.integer(fct) != length(levels(fct))
[1] TRUE FALSE NA
All together, it just means that we assign NA_character_
s to values that are not equal to the highest level but keep NAs unchanged.
[<-(x, as.integer(fct) != length(levels(fct)), NA_character_)