Search code examples
rdataframenapercentage

How to add a percentage column to a dataframe that includes entier rows of NA?


I have this dataframe

df = data.frame(name = c('a','b','c',NA,NA,'d','e',NA,NA,'f','g','h'),
                Freq = c(10,20,70,NA,NA,40,60,NA,NA,80,10,10) )

The output (I am looking for) would be something like that (I created the per column by hand in the following outcome to make my idea clear)

    name Freq  per
1     a   10   10%
2     b   20   20%
3     c   70   70%
4  <NA>   NA   NA  
5  <NA>   NA   NA
6     d   40   40%
7     e   60   60%
8  <NA>   NA   NA
9  <NA>   NA   NA
10    f   80   80%
11    g   10   10%
12    h   10   10%

so the percentage of a is a/(a+b+c). and the percentage of b is b/(a+b+c). and c is c/(a+b+c)

However : the percentage of d is d/(d+e) and e is e/(d+e)

The NAs are seperating between each reference group of percentages. I hope it's clear and thanks.


Solution

  • We could create a grouping column by taking the cumulative sum on a logical vector based on the NA values, then replace the 'Freq' where the value is not NA with the proportions, ungroup and remove the temporary 'grp' column

    library(dplyr)
    df %>% 
      group_by(grp = cumsum(is.na(name))) %>%
      mutate(per = replace(Freq, !is.na(Freq), 
        paste0(100*proportions(Freq[!is.na(Freq)]), "%"))) %>%
      ungroup %>%
      select(-grp)
    

    -output

    # A tibble: 12 × 3
       name   Freq per  
       <chr> <dbl> <chr>
     1 a        10 10%  
     2 b        20 20%  
     3 c        70 70%  
     4 <NA>     NA <NA> 
     5 <NA>     NA <NA> 
     6 d        40 40%  
     7 e        60 60%  
     8 <NA>     NA <NA> 
     9 <NA>     NA <NA> 
    10 f        80 80%  
    11 g        10 10%  
    12 h        10 10%