r, dataframe, categorical-data

How to find row means of a data frame that includes categorical variables? Summing numeric data within a category group


I have penicillin production data with four treatments (A, B, C, D) as columns and five blocks as rows. I need to calculate the sum and mean of each row (block) separately. My data frame stores the treatment as a single variable in one column, so I cannot pick out the values for treatment A and sum them. How can I arrange the data so that each row holds four numbers, letting me calculate its mean and sum?

Here is my code:

pencilline=c(89,88,97,94,84,77,92,79,81,87,87,85,87,92,89,84,79,81,80,88)
treatment=factor(rep(LETTERS[1:4],times=5))
block=sort(rep(1:5,times=4))
datap=data.frame(pencilline,block,treatment)
datap
 
datap_subset=unlist(lapply(datap,is.numeric))
datap_subset
pencilline      block  treatment 
      TRUE       TRUE      FALSE 
rowMeans(datap[,datap_subset])
 [1] 45.0 44.5 49.0 47.5 43.0 39.5 47.0 40.5 42.0 45.0 45.0 44.0 45.5 48.0 46.5 44.0 42.0 43.0 42.5 46.5

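This gives the wrong row means: rowMeans() averages pencilline and block within each of the 20 rows, instead of averaging the four treatment values in each block.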


Solution

  • Is this what you want? The mean of pencilline within each block:

    library(dplyr)
    datap %>% group_by(block) %>%
      summarise(mean = mean(pencilline))
    
    # A tibble: 5 x 2
      block  mean
      <int> <dbl>
    1     1    92
    2     2    83
    3     3    85
    4     4    88
    5     5    82
    

    Its base R equivalent:

    aggregate(pencilline ~ block, datap, mean)
    
      block pencilline
    1     1         92
    2     2         83
    3     3         85
    4     4         88
    5     5         82
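
The question also asks for the sum of each block and for a layout with four numbers per row. A minimal base R sketch of one way to get that, assuming each block/treatment pair occurs exactly once (as in the data above): tapply() spreads the values into a block-by-treatment matrix, after which rowSums() and rowMeans() behave as expected. The name wide is only illustrative.

```r
# Data as defined in the question
pencilline <- c(89,88,97,94,84,77,92,79,81,87,87,85,87,92,89,84,79,81,80,88)
treatment  <- factor(rep(LETTERS[1:4], times = 5))
block      <- sort(rep(1:5, times = 4))
datap      <- data.frame(pencilline, block, treatment)

# One row per block, one column per treatment (A, B, C, D).
# Assumption: one observation per cell, so sum() just returns that value.
wide <- tapply(datap$pencilline,
               list(block = datap$block, treatment = datap$treatment),
               sum)

block_sums  <- rowSums(wide)   # sum of the four treatments in each block
block_means <- rowMeans(wide)  # mean of the four treatments in each block
```

Here block_means matches the grouped means shown above (92, 83, 85, 88, 82), and block_sums holds the matching totals (368, 332, 340, 352, 328). Use colSums(wide) or colMeans(wide) to summarise by treatment instead.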