R: Summary statistics for groups / subsets within panel data - code and layout

I have the following data (posted in the comment):

Now I would like to have summary statistics. I only need the mean and the number of observations. The summary statistics should be grouped according to the rating of the product and there should be a difference in means test for the means of the ratings 1 and 5. In the end it should look like this:

I came across the describeBy function. However, the problem there is that I could not end up with my desired layout (see picture) and I could not include the difference in the means test for the means of the rating 1 and 5 and the means for the whole sample.

Further, I also tried to use the stargazer package. But there I had similar problems.

Could anyone help me here?

Solution

You can use this dplyr/tidy pipeline:

library(tidyverse)

dt %>%
  group_by(Rating) %>% 
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations=n()
  ) %>% 
  pivot_longer(cols = !Rating) %>% 
  pivot_wider(id_cols = "name",names_from = Rating,values_from = value,names_glue = "Rating{.name}") %>% 
  mutate(`Anova F-Test (p-value)` = c(sapply(dt %>% select(Revenue:Age), function(y) anova(lm(y~dt$Rating))$`Pr(>F)`[[1]]),NA)) %>% 
  left_join(
    dt %>%  
      pivot_longer(cols=Revenue:Age) %>% 
      group_by(name = paste0("mean_",name)) %>% 
      summarize(Total_means=mean(value))
  )

Output:

  name         Rating1 Rating2 Rating3 Rating4 Rating5 `Anova F-Test (p-value)` Total_means
  <chr>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>                    <dbl>       <dbl>
1 mean_Revenue     200   400       250     300     200                    0.742       289. 
2 mean_Costs        45    26.7      40      30      20                    0.196        33.3
3 mean_Age           2     3         4       4       2                    0.552         3  
4 Observations       2     3         2       1       1                   NA            NA

Updated 4/22/22

Original answer did not limit the anova to Ratings 1 and 5

# small function to get anova
get_anova <-function(y,rating, ratings=c(1,5)) {
  y_ = y[rating %in% ratings]
  x_ = rating[rating %in% ratings]
  anova(lm(y_~x_))$`Pr(>F)`[[1]]
}

dt %>%
  group_by(Rating) %>% 
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations=n()
  ) %>% 
  pivot_longer(cols = !Rating) %>% 
  pivot_wider(id_cols = "name",names_from = Rating,values_from = value,names_glue = "Rating{.name}") %>% 
  mutate(anova = c(sapply(dt %>% select(Revenue:Age), function(y) get_anova(y,rating=dt$Rating)),NA)) %>% 
  left_join(
    dt %>%  
      pivot_longer(cols=Revenue:Age) %>% 
      group_by(name = paste0("mean_",name)) %>% 
      summarize(Total_means=mean(value))
  )