Tags: r, dplyr, plyr, rlang, tidyeval

How to specify a column name in ddply via character variable?


I have a tibble/data frame with the following structure:

sample_id     condition     state
---------------------------------
sample1       case          val1
sample1       case          val2
sample1       case          val3
sample2       control       val1
sample2       control       val2
sample2       control       val3

The data frame is generated inside a for loop, once per state, so the state column has a different name in each iteration.

I want to group the data by sample_id and calculate the median of the state column, so that every unique sample_id has a single median value. The output should look like this:

sample_id     condition     state
---------------------------------
sample1       case          median
sample2       control       median

I am trying the command below. It works if I give the column name directly, but I am not able to pass the name via the character variable state. I tried ensym(state) and !!ensym(state), but both throw errors.

ddply(dat_state, .(sample_id), summarize,  condition=unique(condition), state_exp=median(ensym(state)))
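A likely reason the call above fails is that plyr's summarize evaluates its arguments with ordinary R rules rather than tidy evaluation, so median() receives a symbol instead of a numeric column. As a rough sketch of a plyr-only workaround, ddply can be given an anonymous function that indexes each group with [[ ]]. The toy values and the column name STAT1 below are made up for illustration; only dat_state, sample_id, condition, state and state_exp come from the question.

```r
library(plyr)

# Toy data; the column name "STAT1" and all values are invented for illustration.
dat_state <- data.frame(
  sample_id = rep(c("sample1", "sample2"), each = 3),
  condition = rep(c("case", "control"), each = 3),
  STAT1     = c(1, 2, 3, 10, 20, 60)
)
state <- "STAT1"  # column name held as a character string

# Pass an anonymous function instead of summarize, and index the
# per-group data frame with [[ ]] so the string is looked up as a column.
res <- ddply(dat_state, "sample_id", function(d) {
  data.frame(
    condition = unique(d$condition),
    state_exp = median(d[[state]])
  )
})
print(res)
```

This assumes each sample_id has exactly one condition; if a sample had several, unique() would return more than one value and the group would produce more than one row.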

Solution

  • Thank you all for putting effort into answering my question. With your suggestions, I found a solution. Below is the code for what I was trying to achieve: grouping by sample_id and condition and passing state through a variable.

    state_mark <- c("pPCLg2", "STAT1", "STAT5", "AKT")
    
    for(state in state_mark){
        dat_state <- dat_clust_stim[,c("sample_id", "condition", state)]
    
        # I had to use !!ensym() to convert a character to a symbol.
        dat_med <- group_by(dat_state, sample_id, condition) %>% 
                   summarise(med = median(!!ensym(state)))
    
        dat_med <- ungroup(dat_med)
        x <- dat_med[dat_med$condition == "case", "med"]
        y <- dat_med[dat_med$condition == "control", "med"]
        t_test <- t.test(x$med, y$med)
    }
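
For comparison, here is a hedged sketch of the same loop using the .data pronoun, which dplyr documents for looking up a column from a string. ensym() is mainly intended for capturing function arguments, so .data[[state]] (or !!sym(state)) may be the less surprising choice here. The data below is a randomly generated stand-in for dat_clust_stim, and the t_tests list is a hypothetical addition that keeps one result per marker, since the loop above overwrites t_test on every iteration.

```r
library(dplyr)

# Toy stand-in for dat_clust_stim; all values below are invented.
set.seed(1)
dat_clust_stim <- data.frame(
  sample_id = rep(paste0("sample", 1:6), each = 4),
  condition = rep(c("case", "control"), each = 12),
  STAT1     = rnorm(24),
  AKT       = rnorm(24)
)
state_mark <- c("STAT1", "AKT")

t_tests <- list()  # hypothetical container: one t-test result per marker
for (state in state_mark) {
  # .data[[state]] looks up the column whose name is stored in `state`.
  dat_med <- dat_clust_stim %>%
    group_by(sample_id, condition) %>%
    summarise(med = median(.data[[state]]), .groups = "drop")

  x <- dat_med$med[dat_med$condition == "case"]
  y <- dat_med$med[dat_med$condition == "control"]
  t_tests[[state]] <- t.test(x, y)
}
names(t_tests)
```

The .groups = "drop" argument returns an ungrouped tibble, so a separate ungroup() call is not needed in this version.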