Tags: r, multiple-columns, subset, names

How to perform functions on groups of columns in R


I would like to calculate row means for groups of columns in my dataset. My columns are named B1, B2, B3, etc., and C1, C2, C3, etc. I have many columns, so I would like code that does not require me to list each column individually. My goal, for example, is to calculate the mean of all the "B" columns, then of all the "C" columns, and so on. Is there a shortcut for this?

My current code for the data frame I9 is:

    I9$meanB <- rowMeans(I9[, c("B1", "B2", "B3", "B4", "B5")], na.rm = TRUE)
    I9$meanC <- rowMeans(I9[, c("C1", "C2", "C3", "C4", "C5")], na.rm = TRUE)

and so on for all the columns. The code works; it is just time-consuming to write.

Is there any way to tell R to average all of the B columns without having to write out all the exact names? Thanks


Solution

  • The mean of all columns whose names contain Sepal:

    Base package:

    # Keep only the columns whose names contain "Sepal"
    df <- iris[, grepl("Sepal", names(iris))]
    # Row-wise mean across the selected columns
    df$means <- rowMeans(df)

    Output, head(df):

      Sepal.Length Sepal.Width means
    1          5.1         3.5  4.30
    2          4.9         3.0  3.95
    3          4.7         3.2  3.95
    4          4.6         3.1  3.85
    5          5.0         3.6  4.30
    6          5.4         3.9  4.65
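The same idea can be carried over to the B and C columns from the question. The following is only a sketch under some assumptions: the small data frame `I9` is a made-up stand-in for the real data, the `prefixes` vector is a hypothetical helper, and the column names are assumed to be a single letter followed by digits, so the pattern would need adjusting for other naming schemes.

```r
# Hypothetical stand-in for the asker's data frame I9 (values are invented)
I9 <- data.frame(
  B1 = c(1, 2, NA), B2 = c(3, 4, 5),
  C1 = c(10, 20, 30), C2 = c(NA, 40, 50)
)

# Column-name prefixes to average over (assumed; extend as needed)
prefixes <- c("B", "C")

for (p in prefixes) {
  # Anchored pattern such as "^B[0-9]+$" matches B1, B2, ... but not meanB
  cols <- grepl(paste0("^", p, "[0-9]+$"), names(I9))
  # drop = FALSE keeps a data frame even when only one column matches
  I9[[paste0("mean", p)]] <- rowMeans(I9[, cols, drop = FALSE], na.rm = TRUE)
}

print(I9)
```

Anchoring the pattern with `^` and `$` is a deliberate choice here: an unanchored `grepl("B", ...)` would also pick up any other column whose name happens to contain a "B", including the newly created `meanB` if the loop were run a second time.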