Using skimr to create a data frame of summary statistics


I recently came across the skimr package, which helps create useful summary statistics. I have written the following code to extract summary stats for numeric columns only. My first question: does skimr offer a more direct way to specify the types of variables for which I want summary stats? My second question: what does append = TRUE actually achieve when I create the my_skim "closure"?

library(skimr)
library(dplyr)

### Creating an example dataset 

test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
                       "Firm" = head(LETTERS, 5),
                       "Exporter" = sample(c("Yes", "No"), 20, replace = TRUE),
                       "Revenue" = sample(100:200, 20, replace = TRUE),
                       stringsAsFactors = FALSE)

test.df1 <- rbind(test.df1,
                  data.frame("Year" = c(2018, 2018),
                             "Firm" = c("Y", "Z"),
                             "Exporter" = c("Yes", "No"),
                             "Revenue" = c(NA, NA)))

test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE))

### Using skimr package to extract summary stats

my_skim <- skim_with(numeric = sfl(minimum = min, maximum = max, hist = NULL), append = TRUE)

test.df1_skim1 <- test.df1 %>%
  group_by(Year) %>%
  my_skim() %>%
  filter(skim_type != "character") %>%
  select(-starts_with("character"))

Solution

  • If you only want a summary of the numeric variables, you could either set all the other types to NULL, or run the skim and then use yank() to get the subtable for a single type. From https://docs.ropensci.org/skimr/articles/skimr.html#reshaping-the-results-from-skim-

      skim(Orange) %>% yank("numeric")
    
    

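    The append argument of skim_with() controls whether the statistics you supply are added to the defaults (append = TRUE, which is the default) or replace them (append = FALSE). In your my_skim, append = TRUE keeps the default numeric statistics and adds minimum and maximum, while hist = NULL drops the histogram.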
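
To make both points concrete, here is a minimal sketch rather than a definitive recipe. The data frame `df` and the names `skim_append` and `skim_replace` are made up for illustration and do not come from the question. It builds one skimmer with append = TRUE and one with append = FALSE, then uses yank("numeric") on grouped data so that no filter()/select() step is needed.

```r
library(skimr)
library(dplyr)

# Small illustrative data set (hypothetical values, not the question's data).
df <- data.frame(
  Year    = c(2018, 2018, 2019, 2019),
  Firm    = c("A", "B", "A", "B"),
  Revenue = c(100, NA, 150, 180)
)

# append = TRUE (the default): keep skimr's default numeric statistics,
# add `minimum` and `maximum`, and drop `hist` because it is set to NULL.
skim_append <- skim_with(
  numeric = sfl(
    minimum = ~ min(., na.rm = TRUE),
    maximum = ~ max(., na.rm = TRUE),
    hist = NULL
  ),
  append = TRUE
)

# append = FALSE: the default numeric statistics are replaced, so only
# `minimum` and `maximum` (plus the base statistics) are computed.
skim_replace <- skim_with(
  numeric = sfl(
    minimum = ~ min(., na.rm = TRUE),
    maximum = ~ max(., na.rm = TRUE)
  ),
  append = FALSE
)

# yank("numeric") returns only the numeric subtable, with the
# "numeric." prefix stripped from the column names.
appended <- df %>% group_by(Year) %>% skim_append() %>% yank("numeric")
replaced <- df %>% group_by(Year) %>% skim_replace() %>% yank("numeric")

names(appended)
names(replaced)
```

Comparing the two sets of column names shows the difference: the appended version still contains the default columns such as mean and sd, while the replaced version does not. The sketch passes na.rm = TRUE because bare min and max return NA for any column that contains missing values, as Revenue does in the question.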