My searches on SO and elsewhere turn up interesting solutions to problems with similar search terms, but not to my issue. I thought I had found a solution, but the error it produces leaves me quite puzzled. I'm trying to learn tidyverse approaches better, but I'd appreciate any solution strategy.
Aim: Create new vector columns in a dataframe where each new vector is named from the factor level of an existing dataframe vector. The code solution should be dynamic so that it can be applied to factors with any number of levels.
Test data
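df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # stringsAsFactors=TRUE is needed on R >= 4.0, where strings are no longer converted to factors by default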
Which produces as expected
> str(df)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> df
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
and when finished should look like
> df
x y a b c d e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
Tidy for loop approach
library(tidyverse)
for (i in 1:length(levels(df$y))) {
df <- mutate(df, levels(df$y)[i] = NA)
}
but that gives me the following error:
> for (i in 1:length(levels(df$y))) {
+ df <- mutate(df, levels(df$y)[i] = NA)
Error: unexpected '=' in:
"for (i in 1:length(levels(df$y))) {
df <- mutate(df, levels(df$y)[i] ="
> }
Error: unexpected '}' in "}"
To troubleshoot, I removed the loop and simplified the mutate call to check that it works in general, which it does, with or without the quotation marks (note: I reran the test data to start fresh).
> levels(df$y)[1]
[1] "a"
df <- mutate(df, a = NA)
df <- mutate(df, "a" = NA) # works the same as the previous line
> df
x y a
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA
Substituting the levels function back in, still without the loop, reproduces the error (note: I reran the test data to start fresh):
> df <- mutate(df, levels(df$y)[1] = NA)
Error: unexpected '=' in "df <- mutate(df, levels(df$y)[1] ="
I get the same error if I use .data=df to specify the dataset, or if I wrap as.character(), paste(), or paste0() around the levels function, ideas I picked up from various solutions online. Restructuring the code with the %>% pipe does not help either.
What about the equal sign is unexpected with my levels code substitution (and potential newb mistakes)? Any assistance is greatly appreciated!
I'm posting solutions for others based on the comments received, and so I can mark this question as solved. Please upvote @arg0naut91 and @Gregor for their solutions and guided help.
Test data
df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # stringsAsFactors=TRUE keeps y a factor on R >= 4.0
Solution 1: base R
@arg0naut91 provided an elegant base R solution:
df[, levels(df$y)] <- NA
df
x y a b c d e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
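To reuse the base R approach on other data frames, the assignment can be wrapped in a small function. This is only a sketch: add_level_columns, factor_col and fill are hypothetical names that do not appear in the original answers, and it assumes the chosen column is already a factor.

```r
# Hypothetical helper (not from the original answers):
# adds one column per level of a factor column, filled with `fill`.
add_level_columns <- function(data, factor_col, fill = NA) {
  # Assumption: the column is a factor, otherwise levels() returns NULL.
  stopifnot(is.factor(data[[factor_col]]))
  new_names <- levels(data[[factor_col]])
  # Assigning to several non-existent column names creates them all at once.
  data[new_names] <- fill
  data
}

df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
df <- add_level_columns(df, "y")
names(df)
```

Because the column names come from levels(), the same call works for a factor with any number of levels.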
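Solution 2: using !! and :=
@Gregor's guidance and useful links showed how some functions, and pretty much all of the tidyverse, do not evaluate objects as we might expect. In a call like mutate(df, name = NA), R's parser requires the part to the left of = to be a literal name or string, so an expression such as levels(df$y)[1] is a syntax error before mutate() ever runs. Storing the name in a variable, unquoting it with !!, and using := in place of = gets around this.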
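The "unexpected '='" message can be reproduced without dplyr at all, which suggests it comes from R's parser and not from mutate(). A minimal base R sketch (bad_code, good_code, parse_result and parsed are names made up for this illustration):

```r
# parse() only checks syntax; nothing is evaluated, so dplyr is not needed here.
bad_code <- "mutate(df, levels(df$y)[1] = NA)"
parse_result <- tryCatch(
  parse(text = bad_code),
  error = function(e) conditionMessage(e)  # return the parser's message as text
)
is.character(parse_result)  # TRUE when parsing failed

# A quoted string on the left of `=` is accepted as an argument name.
good_code <- 'mutate(df, "a" = NA)'
parsed <- parse(text = good_code)
names(as.list(parsed[[1]]))  # argument names as seen by the parser
```

Since the failure happens at parse time, wrapping the expression in as.character() or paste() cannot help: the code never gets far enough to call them.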
First test with a single new column:
library(dplyr) # provides mutate()
df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # refresh test data (y must be a factor)
varlevel <- levels(df$y)[1] # where level 1=a
df <- mutate(df, !!varlevel := NA)
rm(varlevel) # cleanup
df
x y a
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA
Then put it into the for loop to capture each factor level as a new column:
df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # refresh test data (y must be a factor)
for (i in seq_along(levels(df$y))) {
  varlevel <- levels(df$y)[i] # name of the column to create
  df <- mutate(df, !!varlevel := NA)
}
rm(varlevel) # cleanup
df
x y a b c d e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
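The loop can probably be dropped while staying within dplyr. The sketch below is not from the original answers: it assumes the !!! splice operator, which passes each element of a named list to mutate() as a separate name = value argument. new_cols and df_wide are names invented for the example.

```r
library(dplyr)

df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

# Build a named list with one NA entry per factor level: list(a = NA, b = NA, ...)
new_cols <- setNames(rep(list(NA), nlevels(df$y)), levels(df$y))

# !!! splices the list so mutate() sees a = NA, b = NA, ... as separate arguments.
df_wide <- mutate(df, !!!new_cols)
names(df_wide)
```

As with the loop, the number of new columns follows the number of factor levels, so nothing needs changing for factors of other sizes.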