Tags: r, dplyr, rep

Please help me understand why mutate does this


I have a dataframe "dat2" of 40 obs x 3 variables. I want to add a column "tx", based on 2 vectors: "treatments" (5 elements) and "ET" (4 elements).

treatments <- c("ctrl", "204", "226", "204+226", "blina")
ET <- c("10:1", "5:1", "2.5:1", "T only")

If I combine the vectors like this:

rep(rep(treatments, each=2), length(ET))

I get a vector of length = 40, as desired.

> rep(rep(treatments, each=2), length(ET))
 [1] "ctrl"    "ctrl"    "204"     "204"     "226"     "226"     "204+226"
 [8] "204+226" "blina"   "blina"   "ctrl"    "ctrl"    "204"     "204"    
[15] "226"     "226"     "204+226" "204+226" "blina"   "blina"   "ctrl"   
[22] "ctrl"    "204"     "204"     "226"     "226"     "204+226" "204+226"
[29] "blina"   "blina"   "ctrl"    "ctrl"    "204"     "204"     "226"    
[36] "226"     "204+226" "204+226" "blina"   "blina"  

However, if I use that same line inside mutate:

mutate(dat2, tx = rep(rep(treatments, each=2), length(ET)))

it doesn't work, as it seems to generate 400 elements:

Error: Column `tx` must be length 40 (the number of rows) or one, not 400

I know I could just work around by creating a vector with the reps and then using that vector to define 'tx' inside mutate, but I want to understand why 'rep' behaves differently inside mutate.

Thanks!!


Solution

  • The issue is that mutate expects the output to have the same length as the number of rows (or length one). If it does not, it throws an error. We can wrap the result in a list and then unnest to expand the list column:

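    library(dplyr)
    library(tidyr)
    # summarise() accepts a list column, so the vector can be any length;
    # unnest() then expands it to one row per element.
    # Note: the result contains only tx, not the other columns of dat2.
    dat2 %>%
        summarise(tx = list(rep(rep(treatments, each = 2), length(ET)))) %>%
        unnest(c(tx))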
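
  • As for why rep seems to behave differently inside mutate: the 400 in the error is consistent with length(ET) evaluating to 40 rather than 4 (2 x 5 x 40 = 400). That would happen if dat2 itself has a column named ET, because inside mutate column names take precedence over variables of the same name in the calling environment. This is an assumption, since the columns of dat2 are not shown in the question. The sketch below uses a hypothetical stand-in for dat2 (the columns well and value are made up) and the .env pronoun, which should be available in recent versions of dplyr/rlang, to point at the global vector instead:

    ```r
    library(dplyr)

    treatments <- c("ctrl", "204", "226", "204+226", "blina")
    ET <- c("10:1", "5:1", "2.5:1", "T only")

    # Hypothetical stand-in for dat2 (40 obs x 3 variables).
    # Assumption: one of its columns is also called ET.
    dat2 <- data.frame(
      ET    = rep(ET, each = 10),
      well  = rep(1:2, times = 20),
      value = seq_len(40)
    )

    # Inside mutate(), ET refers to the 40-row column, not the 4-element vector
    inside <- mutate(dat2, n_ET = length(ET))$n_ET[1]

    # .env$ET forces the lookup to use the vector from the calling environment
    fixed <- mutate(dat2, tx = rep(rep(treatments, each = 2), length(.env$ET)))
    ```

    If dat2 really does contain an ET column, the summarise/unnest approach above would be affected by the same name lookup, so it is worth checking names(dat2) first. Writing !!length(ET) instead of length(.env$ET) should have the same effect.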