I have the following data (posted in the comment):
Now I would like to compute summary statistics. I only need the mean and the number of observations. The statistics should be grouped by the product's rating, and there should be a difference-in-means test comparing ratings 1 and 5. In the end it should look like this:
I came across the describeBy function (psych package). However, I could not get my desired layout (see picture), and I could not include the difference-in-means test for ratings 1 and 5 or the means for the whole sample.
I also tried the stargazer package, but ran into similar problems.
Could anyone help me here?
You can use this dplyr/tidyr pipeline:
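```r
library(tidyverse)

dt %>%
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs   = mean(Costs),
            mean_Age     = mean(Age),
            Observations = n()) %>%
  # one row per statistic, one column per rating (Rating1 ... Rating5)
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_prefix = "Rating") %>%
  # p-value per variable, NA for the Observations row.
  # Note: if Rating is numeric, lm() fits a linear trend (1 df);
  # use factor(dt$Rating) for a one-way ANOVA across all ratings.
  mutate(`Anova F-Test (p-value)` =
           c(sapply(dt %>% select(Revenue:Age),
                    function(y) anova(lm(y ~ dt$Rating))$`Pr(>F)`[[1]]),
             NA)) %>%
  # add the whole-sample mean of each variable
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )
```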
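Whether anova(lm(y ~ Rating)) is a one-way ANOVA depends on how Rating is stored. The sketch below uses a small hypothetical data frame (toy, with made-up Revenue values, not your data) to show the difference in degrees of freedom between the numeric and the factor version.

```r
# Hypothetical toy data (made-up values, NOT the data from the question)
toy <- data.frame(
  Rating  = rep(1:5, times = c(2, 3, 2, 1, 1)),
  Revenue = c(180, 220, 350, 420, 430, 240, 260, 300, 200)
)

# Rating as a number: lm() estimates a single slope, so the F-test has
# 1 numerator df and tests for a linear trend across ratings
trend <- anova(lm(Revenue ~ Rating, data = toy))

# Rating as a factor: one mean per rating, i.e. a one-way ANOVA with
# (number of ratings - 1) numerator df
oneway <- anova(lm(Revenue ~ factor(Rating), data = toy))

trend$Df   # 1 7
oneway$Df  # 4 4
```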
Output:

```
  name         Rating1 Rating2 Rating3 Rating4 Rating5 `Anova F-Test (p-value)` Total_means
  <chr>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>                    <dbl>       <dbl>
1 mean_Revenue     200   400       250     300     200                    0.742       289.
2 mean_Costs        45    26.7      40      30      20                    0.196        33.3
3 mean_Age           2     3         4       4       2                    0.552         3
4 Observations       2     3         2       1       1                   NA            NA
```
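As a quick sanity check on the Total_means column: the whole-sample mean is the observation-weighted average of the per-rating means. For Revenue, using the counts from the Observations row:

$$
\bar{y}_{\text{total}} = \frac{\sum_g n_g \bar{y}_g}{\sum_g n_g}
= \frac{2 \cdot 200 + 3 \cdot 400 + 2 \cdot 250 + 1 \cdot 300 + 1 \cdot 200}{2 + 3 + 2 + 1 + 1}
= \frac{2600}{9} \approx 288.9
$$

which matches the 289. printed in the table.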
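If you only want the difference-in-means test for ratings 1 and 5, restrict the test to those two ratings with a small helper:

```r
# small helper: p-value for the difference in means of y between two ratings
get_anova <- function(y, rating, ratings = c(1, 5)) {
  keep <- rating %in% ratings   # keep only observations with rating 1 or 5
  y_ <- y[keep]
  x_ <- rating[keep]
  anova(lm(y_ ~ x_))$`Pr(>F)`[[1]]
}

dt %>%
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs   = mean(Costs),
            mean_Age     = mean(Age),
            Observations = n()) %>%
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_prefix = "Rating") %>%
  # p-value of the rating 1 vs. rating 5 comparison, NA for the Observations row
  mutate(`Diff. in means 1 vs 5 (p-value)` =
           c(sapply(dt %>% select(Revenue:Age),
                    function(y) get_anova(y, rating = dt$Rating)),
             NA)) %>%
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )
```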
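Why the lm() F-test in get_anova() works as a difference-in-means test: with only two distinct values of rating, the slope in lm(y_ ~ x_) is the difference between the two group means divided by (5 - 1), and its F statistic is the square of the pooled two-sample t statistic. A minimal check on made-up numbers (the y and rating vectors below are hypothetical):

```r
# Hypothetical toy data (NOT the question's data): ratings 1 and 5 only
y      <- c(180, 220, 150, 210, 260)
rating <- c(1,   1,   5,   5,   5)

# F-test from lm(), as in get_anova()
p_lm <- anova(lm(y ~ rating))$`Pr(>F)`[[1]]

# classic two-sample t-test assuming equal variances (pooled)
p_t <- t.test(y ~ factor(rating), var.equal = TRUE)$p.value

all.equal(p_lm, p_t)  # TRUE

# Welch's t-test (the t.test() default) drops the equal-variance assumption
p_welch <- t.test(y ~ factor(rating))$p.value
```

If you would rather not assume equal variances, swapping the anova(lm(...)) line in get_anova() for the Welch version, t.test(y_ ~ factor(x_))$p.value, is one option (a suggestion, not part of the answer above).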