How to use a character vector of column names in the formula argument of dcast (reshape2)


Say I have a data frame df with dozens of identifying variables (in columns) and only a few measured variables (also in columns).

To avoid retyping all the variable names for each argument, I assign the names of the identifying and measured df columns to the character vectors df_id and df_measured, respectively. It's easy enough to pass these vectors to melt...

df.m <- melt(df, id.vars = df_id, measure.vars = df_measured)

... but I'm at a loss for how to supply the formula = argument in dcast using the same method to specify my id variables since, as far as I can tell, it requires the input to point to numeric positions of the columns.

Do I have to make a vector of numeric positions similar to df_id and risk broken functionality of my program if my input columns change in order, or can I refer to them by name and somehow still get that to work in the formula = argument? Thanks.


Solution

  • You can use as.formula to construct the formula from a string built with paste, so the id columns are referred to by name and not by position.

    Here's an example:

    library(reshape2)
    ## Example from `melt.data.frame`
    names(airquality) <- tolower(names(airquality))
    df_id <- c("month", "day")
    aq <- melt(airquality, id.vars = df_id)
    
    ## Constructing the formula
    f <- as.formula(paste(paste(df_id, collapse = " + "), "~ variable"))
    
    ## Applying it
    dcast(aq, f, value.var = "value", fun.aggregate = mean)
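
If you need this in more than one place, the same idea can be wrapped in a small function. The sketch below is only an illustration: make_cast_formula is a hypothetical helper (it is not part of reshape2), and it assumes the column names are syntactically valid R names (no spaces or special characters). It also works on a copy of airquality so the built-in dataset is left untouched.

```r
library(reshape2)

# Same setup as the example above, but on a copy of the data
aq_raw <- airquality
names(aq_raw) <- tolower(names(aq_raw))
df_id <- c("month", "day")
aq <- melt(aq_raw, id.vars = df_id)

# Hypothetical helper (not part of reshape2): builds "id1 + id2 ~ rhs"
# from character vectors. Assumes all names are syntactic R names.
make_cast_formula <- function(lhs, rhs = "variable") {
  as.formula(paste(
    paste(lhs, collapse = " + "),
    "~",
    paste(rhs, collapse = " + ")
  ))
}

f <- make_cast_formula(df_id)
print(f)

# Every month/day/variable combination occurs once here,
# so no aggregation function is needed
wide <- dcast(aq, f, value.var = "value")
head(wide)

# Alternative: if the id variables are simply "every column other than
# variable and value", reshape2's "..." placeholder avoids building
# a formula at all
wide2 <- dcast(aq, ... ~ variable, value.var = "value")
```

Because the formula is built from names, reordering the input columns does not affect it. The "..." form is shorter, but it picks up any extra column in the molten data, so the explicit formula is the safer choice when you only want a subset of the id variables on the left-hand side.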