Search code examples
rdataframedata.tablemeltunfold

R data.table: subsetting data.table/dataframe based on size of row value


This is a basic question, but I'm stumped:

I have the following R data.table:

library(data.table)
DT <- fread('unique_point biased    data_points   team   groupID                                                                                                           
 up1          FALSE     3             A      xy28352                                                                                                                 
 up1          TRUE      4             A      xy28352                                                                                                                 
 up2          FALSE     1             A      xy28352                                                                                                                  
 up2          TRUE      0             X      xy28352                                                                                                                  
 up3          FALSE     12            Y      xy28352                                                                                                                 
 up3          TRUE      35            Z      xy28352')

which prints out as

> DT
   unique_point biased data_points team groupID
1:          up1  FALSE           3    A xy28352
2:          up1   TRUE           4    A xy28352
3:          up2  FALSE           1    A xy28352
4:          up2   TRUE           0    X xy28352
5:          up3  FALSE          12    Y xy28352
6:          up3   TRUE          35    Z xy28352

The values for the column team are letters A to Z, 26 possibilities. At the moment. If I count the row values with this code:

DT[, counts := .N, by=c("team")]

which gives

> DT
   unique_point biased data_points team groupID counts
1:          up1  FALSE           3    A xy28352      3
2:          up1   TRUE           4    A xy28352      3
3:          up2  FALSE           1    A xy28352      3
4:          up2   TRUE           0    X xy28352      1
5:          up3  FALSE          12    Y xy28352      1
6:          up3   TRUE          35    Z xy28352      1

I would like to create 26 new columns in DT which gives the size of each team, A, B, C, etc.

The resulting data.table would look like:

> DT
   unique_point biased data_points team groupID    A   B   C ... Z
1:          up1  FALSE           3    A xy28352    3   0   0 ... 1
2:          up1   TRUE           4    A xy28352    3   0   0 ... 1
3:          up2  FALSE           1    A xy28352    3   0   0 ... 1
4:          up2   TRUE           0    X xy28352    3   0   0 ... 1
5:          up3  FALSE          12    Y xy28352    3   0   0 ... 1
6:          up3   TRUE          35    Z xy28352    3   0   0 ... 1

I'm not sure how one does this with data.table syntax..

EDIT: I'm happy to do this with base R and dplyr as well.


Solution

  • What about plyr, is that ok?

    library(data.table)
    library(plyr)
    
    DT <- fread('unique_point biased    data_points   team   groupID                                                                                                           
                up1          FALSE     3             A      xy28352                                                                                                                 
                up1          TRUE      4             A      xy28352                                                                                                                 
                up2          FALSE     1             A      xy28352                                                                                                                  
                up2          TRUE      0             X      xy28352                                                                                                                  
                up3          FALSE     12            Y      xy28352                                                                                                                 
                up3          TRUE      35            Z      xy28352')
    
    ldply(LETTERS, function(x){
      n <- nrow(DT[team == as.character(x),])
      DT[, as.character(x) := n]
      return(DT[team == x,])
    })
    
    > DT
       unique_point biased data_points team groupID A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    1:          up1  FALSE           3    A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
    2:          up1   TRUE           4    A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
    3:          up2  FALSE           1    A xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
    4:          up2   TRUE           0    X xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
    5:          up3  FALSE          12    Y xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
    6:          up3   TRUE          35    Z xy28352 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1