My categorical variable, risk, has three groups: ADV, HHM and POV. I want to get the mean of each of these three groups for four continuous variables, read.5, read.6, read.7 and read.8, which are reading scores of individuals over grades 5 to 8 and make up columns [ ,2:5] of my dataset. It's an old textbook example. I used the code below, which apparently is not correct even though it is supposed to be correct according to the textbook example:
myrisk <- ddply(.data = MPLS[ ,2:5], .variables = .(MPLS$risk),
.fun = mean, na.rm = TRUE)
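plyr is retired (its authors recommend dplyr for new code), but ddply() has not been removed and still runs. The call above most likely fails for the same reason as mean(MPLS[ ,2:5], na.rm = TRUE): each piece that ddply() hands to .fun is a data frame, and mean() no longer averages a data frame column by column. A minimal sketch of a call that should work, using a small hypothetical data frame mpls_toy in place of MPLS (made-up values, the question's column names):

```r
library(plyr)

# Hypothetical toy data standing in for MPLS: a grouping column `risk`
# and four reading scores, with one missing value to exercise na.rm.
mpls_toy <- data.frame(
  risk   = c("ADV", "HHM", "POV", "ADV", "HHM", "POV"),
  read.5 = c(200, 190, 180, 210, NA, 170),
  read.6 = c(205, 195, 185, 215, 200, 175),
  read.7 = c(210, 200, 190, 220, 205, 180),
  read.8 = c(215, 205, 195, 225, 210, 185)
)

# Pass the whole data frame so `risk` is available inside each piece,
# refer to it as .(risk), and return one mean per reading column.
myrisk <- ddply(
  .data = mpls_toy,
  .variables = .(risk),
  .fun = function(d) {
    colMeans(d[, c("read.5", "read.6", "read.7", "read.8")], na.rm = TRUE)
  }
)
myrisk
```

ddply() turns the named vector returned for each group into one row and adds the risk column back. plyr's numcolwise(mean, na.rm = TRUE) is an alternative .fun, although it would also average any other numeric columns in the data.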
Earlier on, I got an error message from this piece of code:
mymeans <- mean(MPLS[ ,2:5], na.rm = TRUE)
When I googled it, I found that R had changed: mean() no longer computes column means when it is given a data frame, so I had to find another way to work out the means.
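Since mean() expects a single numeric vector, the usual replacements for mean(MPLS[ ,2:5], na.rm = TRUE) are colMeans(), which averages every column of a numeric data frame, or sapply() applying mean() column by column. A minimal sketch with a small hypothetical data frame scores standing in for MPLS[ ,2:5] (made-up values):

```r
# Hypothetical toy data standing in for MPLS[ , 2:5]
scores <- data.frame(
  read.5 = c(200, 190, NA),
  read.6 = c(205, 195, 185),
  read.7 = c(210, 200, 190),
  read.8 = c(215, 205, 195)
)

# colMeans() returns a named vector with one mean per column;
# na.rm = TRUE drops the missing read.5 value before averaging.
mymeans <- colMeans(scores, na.rm = TRUE)
mymeans

# Equivalent result, calling mean() on each column in turn
mymeans2 <- sapply(scores, mean, na.rm = TRUE)
mymeans2
```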
My questions are:
Has the ddply function from the plyr package, which I am currently trying to use, been superseded in the same way as the old behaviour of mean()?
How do I get the mean of each group of the categorical variable for the four columns, whether with the same function or with something different?
Thank you
Using some sample data (the scores are random, so your numbers will differ):
df <- data.frame(risk = rep(c("ADV", "HHM", "POV"), 10),
                 read.5 = rnorm(30, 30),
                 read.6 = rnorm(30, 30),
                 read.7 = rnorm(30, 30),
                 read.8 = rnorm(30, 30),
                 stringsAsFactors = TRUE)  # keep risk a factor in R >= 4.0
head(df)
#   risk   read.5   read.6   read.7   read.8
# 1  ADV 30.78281 30.00721 29.80906 29.25936
# 2  HHM 29.76175 29.63864 29.39256 29.40070
# 3  POV 29.00964 30.48258 29.20662 28.77509
# 4  ADV 29.60631 30.35032 32.00376 30.70374
# 5  HHM 31.38653 30.28896 29.48756 30.32430
# 6  POV 30.33102 30.40897 29.55796 30.10585
Group by risk and take the mean of every other column, dropping missing values as in your ddply() call:
library(dplyr)
df %>% group_by(risk) %>% summarise_all(mean, na.rm = TRUE)
# A tibble: 3 x 5
#   risk  read.5 read.6 read.7 read.8
#   <fct>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 ADV     30.3   30.2   30.2   30.4
# 2 HHM     29.7   30.5   29.8   29.9
# 3 POV     29.3   30.2   29.9   30.2
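summarise_all() still works, but since dplyr 1.0.0 the documentation marks the _all/_at/_if variants as superseded by across(). A minimal sketch of the same grouped means with across(), using small hypothetical data (made-up values, one missing score) so the na.rm = TRUE handling is visible:

```r
library(dplyr)

# Hypothetical toy data with the question's column names
df <- data.frame(
  risk   = rep(c("ADV", "HHM", "POV"), 2),
  read.5 = c(30, 29, 31, 32, NA, 33),
  read.6 = c(31, 30, 32, 33, 31, 34),
  read.7 = c(32, 31, 33, 34, 32, 35),
  read.8 = c(33, 32, 34, 35, 33, 36)
)

# starts_with("read") selects the four score columns; the anonymous
# function passes na.rm = TRUE to mean() for each of them.
group_means <- df %>%
  group_by(risk) %>%
  summarise(across(starts_with("read"), function(x) mean(x, na.rm = TRUE)))
group_means
```

Selecting columns by name with starts_with() rather than by position also keeps the code working if the column order in the data changes.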