r dataframe dplyr datatable

Transform two sublists using sum and table summary in R


I have a list with two elements, such as:

type1 0 0 0 0 0 2 5 5 5 1 1 3
type2 0 0 0 0 0 1 5 2 3 0 0 1

Here is the dput format, if it helps:

list(type1 = c(0,0,0,0,0,2,5,5,5,1,1,3), 
     type2 = c(0,0,0,0,0,1,5,2,3,0,0,1))

I would like to transform it into a data frame such as:

Nb type   Value
1  type1  2
2  type1  1
3  type1  1
4  type1  0
5+ type1  3
1  type2  0
2  type2  1
3  type2  1
4  type2  0
5+ type2  10

where the rows with df$type == "type1" give the number of values in type1 equal to 1, 2, 3, 4 or >= 5, and the rows with df$type == "type2" give the sum of the type2 values at the positions where type1 equals 1, 2, 3, 4 or >= 5.


For example, type1 contains 3 numbers >= 5, so I add the row:

Nb type   Value
5+ type1  3

Then, for those 3 positions, I take the sum of the type2 values, which is 5+2+3 = 10, and add:

Nb type   Value
5+ type1  3
5+ type2  10

Does someone have an idea?


Solution

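  • One option: convert 'type1' to a factor with levels 0 to 5 and call table on it to get the counts, then use tapply to sum the 'type2' elements grouped by that same factor. stack each result into a two-column data.frame, add the 'type' column, and rbind the pieces together. Here list1 is the list from the question. Note that levels 0:5 cover this data only because its maximum is 5; larger values would become NA in the factor and drop out of both summaries.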

    out <- do.call(rbind, Map(cbind, type = names(list1),
        lapply(setNames(list(
            table(factor(list1$type1, levels = 0:5)),
            tapply(list1$type2, factor(list1$type1, levels = 0:5),
                   FUN = function(x) sum(x, na.rm = TRUE))),
          names(list1)), stack)))
    row.names(out) <- NULL
    out$values[is.na(out$values)] <- 0
    subset(out[c(3, 1, 2)], ind != 0)
    

    -output

       ind  type values
    2    1 type1      2
    3    2 type1      1
    4    3 type1      1
    5    4 type1      0
    6    5 type1      3
    8    1 type2      0
    9    2 type2      1
    10   3 type2      1
    11   4 type2      0
    12   5 type2     10
    

    Or using forcats

    library(forcats)
    library(dplyr)
    grp <- factor(with(list1, fct_collapse(as.character(type1), 
           `>=5` = as.character(type1)[type1 >=5])), levels = c(0:4, ">=5"))
    v1 <- table(grp)
    v2 <- tapply(list1$type2, grp, FUN = sum)
    bind_rows(list(type1 = stack(v1)[2:1], type2 = stack(v2)[2:1]), .id = 'type') %>% 
         filter(ind != '0')
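
The same idea can also be written step by step in base R, which may be easier to follow than the nested one-liner. This is only a sketch under a few assumptions: the input is assigned to `list1` as in the answers above, `pmin` is used to cap 'type1' at 5 so that every value >= 5 lands in a single bin (this replaces the exact-match levels of the first approach), and the names `bins`, `counts`, `sums` and `res` are made up for illustration.

```r
# Input from the question, assigned to the name used in the answers above
list1 <- list(type1 = c(0,0,0,0,0,2,5,5,5,1,1,3),
              type2 = c(0,0,0,0,0,1,5,2,3,0,0,1))

# Bin type1: cap values at 5 so everything >= 5 shares one bin, and fix
# the levels so that empty bins (here 4) are still reported
bins <- factor(pmin(list1$type1, 5), levels = 0:5,
               labels = c(0:4, "5+"))

# Number of type1 values falling in each bin
counts <- table(bins)

# Sum of the matching type2 values per bin; empty bins come back as NA,
# so they are set to 0
sums <- tapply(list1$type2, bins, sum)
sums[is.na(sums)] <- 0

# Assemble the long data frame (assumed column names follow the question)
# and drop the rows for the 0 bin
res <- data.frame(
  Nb    = rep(levels(bins), times = 2),
  type  = rep(names(list1), each = nlevels(bins)),
  Value = c(as.vector(counts), as.vector(sums))
)
res <- res[res$Nb != "0", ]
row.names(res) <- NULL
res
```

With the question's input, `res` should contain the ten rows of the desired table, with `Nb` running over 1, 2, 3, 4 and 5+ for each type. Keeping the bin definition in one place (`bins`) means the counts and the sums cannot drift out of alignment.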