I have a question and hope that some of you can help me. The issue is this: for a given data frame that includes a vector y of length n and a factor f with k different levels, I want to assign a new variable z which has length k to the data frame, based on f.
Example:
df <- data.frame(y=rnorm(12), f=rep(1:3, length.out=12))
z <- c(-1,0,5)
Note that my real z has been constructed to correspond to the unique factor levels, which is why length(z) = length(unique(df$f)). I now want to create a vector of length n=12 that contains the value of z that corresponds to the factor level f. (Note: my real factor values are not ordered like in the above example, so just repeating the vector z won't work.)
Now, an obvious solution would be to create a vector f outside the data frame, combine it with z in a second data frame, and then use merge. For instance:
newdf <- data.frame(z=z, f=c(1,2,3))
df <- merge(df, newdf, by="f")
However, I need to repeat this procedure several thousand times, and this merge solution seems like shooting at microbes with cannons. Hence my question: there is almost surely an easier and more efficient way to do this, but I just don't know how. Could anyone point me in the right direction? I am looking for something like the "inverse" of aggregate or by.
Assuming that the values in z correspond to the levels of f, in the same order:
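# factor() is explicit here: since R 4.0.0, data.frame() no longer converts strings to factors by default
df <- data.frame(y=rnorm(12), f=factor(sample(c("a","b","c"), 12, replace=TRUE)))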
z <- c(-1,0,5)
df$newz<-z[df$f]
In case this is not clear: this works because factors are stored under the covers as integers. When you index z with that factor, you are effectively indexing with the underlying integer codes, which point to the right z value for each factor level.
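To make the integer-code explanation concrete, here is a small sketch. The data and the names codes, by_code, by_name and by_match are made up for illustration and are not from the question. It first shows the codes that the factor index relies on, then two lookups that go by level name instead, which may be safer when f is a character column or when z is not stored in the order of levels(f).

```r
# Hypothetical data: factor() sorts the levels alphabetically: "a", "b", "c"
f <- factor(c("c", "a", "b", "a", "c"))
z <- c(-1, 0, 5)                 # assumed order: z[1] is "a", z[2] is "b", z[3] is "c"

# Indexing with a factor uses its underlying integer codes
codes   <- as.integer(f)         # position of each value within levels(f)
by_code <- z[f]                  # same as z[codes]

# Name-based lookup: does not depend on how f is stored
names(z) <- c("a", "b", "c")
by_name  <- unname(z[as.character(f)])

# match() gives the same positions without naming z
by_match <- c(-1, 0, 5)[match(as.character(f), c("a", "b", "c"))]
```

All three lookups return the same vector here. The name-based variants only differ from z[f] when the order of z and the order of the factor levels disagree, which is the case the question describes for the real data.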