Calculate percentages in skimr::skim_with


I am trying to add the percentage of each factor level to the skimr::skim output. I tried prop.table(table(.)), but it did not work as intended: the factor row is repeated once per level. How can I get the percentages of the different species in a single cell, formatted like top_counts?

    library(skimr)
    skim(iris)

Data summary

    Name                       iris
    Number of rows             150
    Number of columns          5
    _______________________
    Column type frequency:
      factor                   1
      numeric                  4
    ________________________
    Group variables            None

Variable type: factor

    skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
    Species                0              1  FALSE           3  set: 50, ver: 50, vir: 50

Variable type: numeric

    skim_variable  n_missing  complete_rate  mean    sd   p0  p25   p50  p75  p100  hist
    Sepal.Length           0              1  5.84  0.83  4.3  5.1  5.80  6.4   7.9  ▆▇▇▅▂
    Sepal.Width            0              1  3.06  0.44  2.0  2.8  3.00  3.3   4.4  ▁▆▇▂▁
    Petal.Length           0              1  3.76  1.77  1.0  1.6  4.35  5.1   6.9  ▇▁▆▇▂
    Petal.Width            0              1  1.20  0.76  0.1  0.3  1.30  1.8   2.5  ▇▁▇▅▃

    my_skim <- skim_with(factor = sfl(pct = ~prop.table(table(.))))
    my_skim(iris)

Data summary

    Name                       iris
    Number of rows             150
    Number of columns          5
    _______________________
    Column type frequency:
      factor                   1
      numeric                  4
    ________________________
    Group variables            None

Variable type: factor

    skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts                 pct
    Species                0              1  FALSE           3  set: 50, ver: 50, vir: 50  0.3333333
    Species                0              1  FALSE           3  set: 50, ver: 50, vir: 50  0.3333333
    Species                0              1  FALSE           3  set: 50, ver: 50, vir: 50  0.3333333

Variable type: numeric

    skim_variable  n_missing  complete_rate  mean    sd   p0  p25   p50  p75  p100  hist
    Sepal.Length           0              1  5.84  0.83  4.3  5.1  5.80  6.4   7.9  ▆▇▇▅▂
    Sepal.Width            0              1  3.06  0.44  2.0  2.8  3.00  3.3   4.4  ▁▆▇▂▁
    Petal.Length           0              1  3.76  1.77  1.0  1.6  4.35  5.1   6.9  ▇▁▆▇▂
    Petal.Width            0              1  1.20  0.76  0.1  0.3  1.30  1.8   2.5  ▇▁▇▅▃

Created on 2022-02-27 by the reprex package (v2.0.1)


Solution

  • We can paste the level names and their proportions together (here with stringr::str_c) so that each factor column yields a single string:

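    library(skimr)

    # Collapse the per-level proportions into one string per factor column
    my_skim <- skim_with(factor = sfl(pct = ~{
      prt <- prop.table(table(.))               # proportion of each level
      val <- sprintf("%.2f", prt)               # two decimal places
      nm1 <- tolower(substr(names(prt), 1, 3))  # abbreviated level names
      stringr::str_c(nm1, val, sep = ": ", collapse = ", ")
    }))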
    

    Testing it on iris:

    > my_skim(iris)
    ── Data Summary ────────────────────────
                               Values
    Name                       iris  
    Number of rows             150   
    Number of columns          5     
    _______________________          
    Column type frequency:           
      factor                   1     
      numeric                  4     
    ________________________         
    Group variables            None  
    
    ── Variable type: factor ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      skim_variable n_missing complete_rate ordered n_unique top_counts                pct                            
    1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50 set: 0.33, ver: 0.33, vir: 0.33
    
    ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
    1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
    2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
    3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
    4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃
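
The repeated Species rows in the question appear to come from prop.table(table(.)) returning one value per level, whereas the formula in the answer collapses those values into a single string. A minimal base-R sketch of the same idea is given here so the formatting step can be checked on its own, without skimr. Note that `pct_string` and its `as_percent` argument are hypothetical names introduced for illustration (they are not part of skimr), and `paste` stands in for `stringr::str_c`, which gives the same result here because the inputs contain no missing values.

```r
# Hypothetical helper (not part of skimr): collapse the level proportions
# of a factor into one string, mirroring the layout of top_counts.
pct_string <- function(x, as_percent = FALSE) {
  prt <- prop.table(table(x))                  # proportion of each level
  val <- if (as_percent) {
    sprintf("%.1f%%", 100 * as.vector(prt))    # e.g. "33.3%"
  } else {
    sprintf("%.2f", as.vector(prt))            # e.g. "0.33"
  }
  nm <- tolower(substr(names(prt), 1, 3))      # abbreviated level names
  paste(nm, val, sep = ": ", collapse = ", ")  # one string for the column
}

# The original attempt yields one value per level ...
raw <- prop.table(table(iris$Species))
length(raw)  # 3

# ... while the collapsed version is a single string.
out <- pct_string(iris$Species)
out          # "set: 0.33, ver: 0.33, vir: 0.33"

out_pct <- pct_string(iris$Species, as_percent = TRUE)
out_pct      # "set: 33.3%, ver: 33.3%, vir: 33.3%"
```

Assuming sfl() accepts a plain function in the same way it accepts a formula, the helper could be plugged in as skim_with(factor = sfl(pct = pct_string)); the `as_percent` variant would then need a wrapper such as ~pct_string(., as_percent = TRUE).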