r, mean, median

Calculate median or mean depending on the value of a column


I'm trying to calculate the median or mean depending on the value of a column.

Consider the following data frame:

DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")

I want to fill the "median_mean" column with the median or the mean of the 3 samples in each row, depending on the "Frequence" column. If Frequence is greater than or equal to 10, use the median; otherwise, use the mean.

Bear in mind that there won't always be 3 samples, so I can't hard-code columns 2:4. But their names will always be sample_X.

Could anyone give me a hand?


Solution

  • DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
    colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
    
    DF$median_mean = ifelse(DF$Frequence>=10, apply(DF[grep("sample_", names(DF))], 1L, median), apply(DF[grep("sample_", names(DF))], 1L, mean))
    

    Explanation

    We apply both median and mean to the relevant columns using:

    • apply(DF[grep("sample_", names(DF))], 1L, median)

    and

    • apply(DF[grep("sample_", names(DF))], 1L, mean)

    but we keep only the value we want for each row using ifelse, the vectorized conditional: it returns the median where Frequence >= 10 and the mean elsewhere.

    The code also works for any number of columns named sample_X, because the columns are selected by searching their names with grep("sample_", names(DF)) rather than by fixed positions.
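As a minimal sketch of the same approach on a different shape of data, the example below uses four sample columns and an extra column whose name merely contains "sample_". The data, the column name n_sample_ok, and the anchored pattern "^sample_[0-9]+$" are assumptions made for illustration and are not part of the original question. An unanchored "sample_" pattern would also pick up the extra column, so anchoring may be worth considering if such names can occur.

```r
# Hypothetical data: four sample columns plus a column whose name only
# contains "sample_" and should not enter the calculation.
DF <- data.frame(
  name        = c("a", "b", "c"),
  sample_1    = c(1, 2, 3),
  sample_2    = c(1, 4, 3),
  sample_3    = c(1, 6, 9),
  sample_4    = c(9, 8, 9),
  n_sample_ok = c(4, 4, 4),
  Frequence   = c(8, 10, 12)
)

# The anchored pattern matches sample_1, sample_2, ... but not n_sample_ok.
sample_cols <- grep("^sample_[0-9]+$", names(DF))
samples <- as.matrix(DF[sample_cols])

# ifelse() chooses element-wise: the row median where Frequence >= 10,
# the row mean otherwise. Both candidate vectors are computed in full.
DF$median_mean <- ifelse(
  DF$Frequence >= 10,
  apply(samples, 1L, median),
  rowMeans(samples)
)

print(DF[c("name", "Frequence", "median_mean")])
```

With this data, the first row (Frequence 8) gets the mean of its four samples, and the other two rows get the median. rowMeans is used here as a shorthand for apply(..., 1L, mean); either should give the same values.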