I'm trying to calculate the median or mean depending on the value of a column.
Imagine the following data frame:
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
I want to fill the "median_mean" column with the median or the mean of the 3 samples per row, depending on the "Frequence" column. If Frequence is greater than or equal to 10, use the median; otherwise, use the mean.
Bear in mind that the number of samples won't always be 3, so I can't hard-code columns 2:4. But their names will always be sample_X.
Could anyone give me a hand?
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF)<- c("name","sample_1","sample_2", "sample_3", "median_mean", "Frequence")
DF$median_mean = ifelse(DF$Frequence>=10, apply(DF[grep("sample_", names(DF))], 1L, median), apply(DF[grep("sample_", names(DF))], 1L, mean))
We apply both median and mean to the relevant columns using:
apply(DF[grep("sample_", names(DF))], 1L, median)
and
apply(DF[grep("sample_", names(DF))], 1L, mean)
but we return only the value we want for each row using ifelse, the vectorized form of the ternary operator.
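To make the element-wise selection concrete, here is a minimal sketch on toy vectors. The names freq, med, avg and picked are made up for illustration and are not part of the question's data:

```r
# Hypothetical toy vectors standing in for Frequence, the row medians and the row means.
freq <- c(8, 10, 12)
med  <- c(1, 2, 3)
avg  <- c(1.5, 2.5, 3.5)

# ifelse() tests each element of the condition and takes the value
# from `med` where it is TRUE and from `avg` where it is FALSE.
picked <- ifelse(freq >= 10, med, avg)
picked
# [1] 1.5 2.0 3.0
```

Note that both med and avg are fully computed before ifelse picks from them, which is why the answer calls apply twice.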
The code also works for any number of columns named sample_X, because we generalized the selection of columns by simply searching their names with grep("sample_", names(DF)).
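If you prefer not to repeat the column selection, the same idea can be sketched as below. This is only one possible arrangement: the anchored pattern "^sample_[0-9]+$" assumes X is always a number, and the helper names sample_cols, samples, row_median and row_mean are introduced here for illustration.

```r
# Example data from the question (shorter vectors are recycled to 20 rows).
DF <- data.frame("name", 1:20, 3, 2:5, 0, 8:12)
colnames(DF) <- c("name", "sample_1", "sample_2", "sample_3", "median_mean", "Frequence")

# Select the sample columns once, by name.
# Assumption: sample columns are named sample_<number> and nothing else matches.
sample_cols <- grep("^sample_[0-9]+$", names(DF), value = TRUE)
samples <- as.matrix(DF[sample_cols])

# Row-wise statistics over however many sample columns were found.
row_median <- apply(samples, 1L, median)
row_mean   <- rowMeans(samples)

# Median where Frequence >= 10, mean otherwise.
DF$median_mean <- ifelse(DF$Frequence >= 10, row_median, row_mean)
head(DF, 5)
```

Anchoring the pattern avoids accidentally picking up a column such as "old_sample_1", and rowMeans is a base R shortcut for the row-wise mean; the result should match the one-liner above.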