
R: Summarizing a data.frame in a last row when some columns are characters


I have a data.frame with both character columns and numeric columns. I would like to calculate the mean of each numeric column and append the results as a new last row of the data frame.

class1  1    2    5
class2  2    3    6
class3  2    3    2

to

class1  1    2    5
class2  2    3    6
class3  2    3    2
mean    1.67 2.67 4.33

I tried this with colMeans, but it fails because of the character column, and I get the following error:

Error in colMeans(data, na.rm = FALSE) : 'x' must be numeric

I also tried restricting colMeans to the numeric part of the data.frame with data[2:4], but then I struggle to append the result, because the vector of means is shorter than the number of columns in the original data.frame.
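A minimal reproducible sketch of the situation described above. The question shows no header, so the column names class, v1, v2 and v3 are assumptions; the values are taken from the example input. It reproduces the colMeans error and shows why the subset result does not line up with the data frame's columns.

```r
# Hypothetical reconstruction of the example data (column names are assumed)
df <- data.frame(
  class = c("class1", "class2", "class3"),
  v1 = c(1, 2, 2),
  v2 = c(2, 3, 3),
  v3 = c(5, 6, 2),
  stringsAsFactors = FALSE
)

# colMeans() converts the data frame to a matrix first; with a character
# column that matrix is character, so colMeans() stops with an error
err <- tryCatch(colMeans(df), error = function(e) conditionMessage(e))
print(err)

# Restricting to the numeric columns works ...
means <- colMeans(df[2:4])
print(means)

# ... but gives 3 values for a 4-column data frame, so it cannot be
# appended as a row without filling in the character column somehow
print(c(n_means = length(means), n_cols = ncol(df)))
```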

Thanks for your help.


Solution

  • I agree with the comment above that appending the means as an extra row of your data frame doesn't seem like a good idea.

    Anyway, you could take this opportunity to expand your R-pertoire with rapply

    str(iris)
    # 'data.frame':  150 obs. of  5 variables:
    #  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    #  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    #  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    #  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    #  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    
    summary(iris)
    # Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
    # Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
    # 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
    # Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
    # Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
    # 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
    # Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  
    
    rapply(iris, mean, classes = c('numeric','integer'))
    # Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    # 5.843333     3.057333     3.758000     1.199333 
    

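For comparison, a sketch of a base R alternative to rapply (a swapped-in technique, not the one used in this answer): keep only the numeric columns with Filter(is.numeric, ...) and pass them to colMeans. This is essentially the data[2:4] idea from the question, but the columns are picked by type instead of by position, so it should give the same named vector as the rapply call.

```r
# Keep only the numeric columns of iris (Species is a factor, so it is dropped)
numeric_part <- Filter(is.numeric, iris)

# colMeans() now works because every remaining column is numeric
means <- colMeans(numeric_part)
print(means)
```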
    But if you had to join them, you could do

    tmp <- rapply(iris, mean, classes = c('numeric','integer'))
    rbind(iris, tmp[match(names(iris), names(tmp))])
    
    tail(rbind(iris, tmp[match(names(iris), names(tmp))]), 5)
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    # 147     6.300000    2.500000        5.000    1.900000 virginica
    # 148     6.500000    3.000000        5.200    2.000000 virginica
    # 149     6.200000    3.400000        5.400    2.300000 virginica
    # 150     5.900000    3.000000        5.100    1.800000 virginica
    # 151     5.843333    3.057333        3.758    1.199333      <NA>
    

    I already regret coining "R-pertoire".
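If you want the label "mean" in the character column, as in the question's desired output, instead of an NA in that column, one possible sketch is to build a one-row data frame with matching column names and rbind it. It assumes the question's data with hypothetical column names (class, v1, v2, v3) and a character label column; stringsAsFactors = FALSE is spelled out so the behaviour does not depend on the R version's default.

```r
# Hypothetical reconstruction of the question's data (column names are assumed)
df <- data.frame(
  class = c("class1", "class2", "class3"),
  v1 = c(1, 2, 2),
  v2 = c(2, 3, 3),
  v3 = c(5, 6, 2),
  stringsAsFactors = FALSE  # keep 'class' as a plain character column
)

# Means of the numeric columns only, selected by type rather than position
means <- colMeans(Filter(is.numeric, df))

# One-row data frame with the same column names, labelled "mean"
summary_row <- data.frame(class = "mean", as.list(means), stringsAsFactors = FALSE)

# rbind() on two data frames matches columns by name
result <- rbind(df, summary_row)
print(result)
```

The summary row is still just another row, so later operations on the data frame (another mean, a filter on class) will include it unless you remove it first, which is why keeping the means in a separate vector is usually preferable.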