Search code examples
rstringrtm

How to produce summary table?


Suppose I have the following table named df.

A; B
A; B
B; C; D
C; D; E; F

Weight for each element of the row is computed as: wt[i] <- 1/lengths(str_split(df[1],";")). Ideally, the above table weights attached to each elements seems the following.

0.5*A; 0.5*B
0.5*A; 0.5*B
0.333*B; 0.333*C; 0.333*D
0.25*C; 0.25*D; 0.25*E; 0.25*F

So, my intention is to have a table where the frequencies are the sum of the weights. The output I need is the following:

A     B     C     D      E     F
1  1.333 0.583  0.583  0.25  0.25

Any help is highly appreciated!


Solution

  • Here is an option in base R. After splitting the first column by ; followed by zero or more spaces (\\s*), get the lengths of list and find the sum using tapply

    lst1 <- strsplit(df[[1]], ";\\s*")
    l1 <- lengths(lst1)
    tapply(rep(1/l1, l1), unlist(lst1), sum)
    #        A         B         C         D         E         F 
    #1.0000000 1.3333333 0.5833333 0.5833333 0.2500000 0.2500000 
    

    Or in tidyverse, we can use separate_rows to split up the column and then do a group by summarise

    library(tibble)
    library(dplyr)
    library(tidyr)
    df %>%
        mutate(rn = row_number()) %>%
        separate_rows(v1) %>%
        add_count(rn) %>% 
        mutate(n = 1/n) %>% 
        group_by(v1) %>% 
        summarise(n = sum(n))
    # A tibble: 6 x 2
    #  v1        n
    #  <chr> <dbl>
    #1 A     1    
    #2 B     1.33 
    #3 C     0.583
    #4 D     0.583
    #5 E     0.25 
    #6 F     0.25 
    

    data

    df <- structure(list(v1 = c("A; B", "A; B", "B; C; D", "C; D; E; F"
    )), class = "data.frame", row.names = c(NA, -4L))