Search code examples
rdata.tabledcast

group a column by values of second column based on condition from third column in data.table


I have a large data.table from which I want to group one column based on the value of another column that meets the criteria from a third column. I can do this with a loop but I wonder if it can be done in data.table?

The table looks like this:

    Group Col1  Col2
1:     A    1   0.0
2:     A    2   0.1
3:     A    3   0.2
4:     A    4   0.5
5:     A    5   0.9
6:     B    6   0.0
7:     B    7   0.2
8:     B    8   0.4
9:     B    9   0.9
10:    B   10   1.0

What I need is for each Group get the value in the row of Col1 where Col2 is the closest to 0.5. Col2 is a cumulative value that can range from 0 to 1. The expected result is:

    Group Col1
1:     A    4
2:     B    8

Can this be done in data.table?. I have struggle to do this so any input or guidance will be greatly appreciated. Here is data.table above

DAT=data.table(Group=c(rep("A",5),rep("B",5)),Col1=1:10,Col2=c(0,.1,.2,.5,.9,0,.2,.4,.9,1))

Solution

  • After grouping by 'Group', take the absolute difference of 'Col2' with 0.5, get the index o the minimum value (which.min) and use that to subset the 'Col1'

    DAT[, .(Col1 = Col1[which.min(abs(Col2 - 0.5))]), Group]
    #   Group Col1
    #1:     A    4
    #2:     B    8