
Even cuts based on another variable


How can we cut one variable into bins so that the sum of another variable is roughly even across those bins?

For example, I would like the sum of var2 to be more even between the cuts.

code:

library(data.table)
dt = data.table(var1=c(0.6,0.2,0.5,0.8,0.10,0.1,0.2,0.5,0.3,0.5),
                var2=c(20,400,350,50,100,490,1200,900,1850,70))
dt[,cuts:=cut(var1,breaks=3)]
dt[,.(var2=sum(var2)),by=cuts]

Thanks!


Solution

  • One way would be to create a vector in which each var1 value is repeated in proportion to its var2 value, and then cut that vector into equal-sized bins. For example:

    library(data.table)
    library(Hmisc)
    
    dt = data.table(var1=c(0.6,0.2,0.5,0.8,0.10,0.1,0.2,0.5,0.3,0.5),
                    var2=c(20,400,350,50,100,490,1200,900,1850,70))
    
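    # Repetition count for each row: var2 relative to the smallest var2, rounded
    dt[,var3:=round(var2/min(var2))]
    
    # Repeat each var1 value var3 times, so it appears in proportion to its var2
    cc = rep(dt[,var1], dt[,var3])
    
    # Cut points that split the expanded vector into 3 roughly equal-sized groups
    labs = cut2(cc, g=3, onlycuts = TRUE)
    
    # Bin the original var1 values using those cut points
    dt[,cuts:=cut2(var1, cuts=labs)]
    
    # Total var2 per bin
    dt[,.(var2=sum(var2)),by=cuts]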
    
    #         cuts var2
    # 1: [0.5,0.8] 1390
    # 2: [0.1,0.3) 2190
    # 3:       0.3 1850
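
The repetition trick above amounts to weighting var1 by var2. As a rough sketch of the same idea without building the expanded vector (this is a swapped-in variant, not the code from the answer, and it assumes every var2 is positive), one could sum var2 per distinct var1 value, take the cumulative share of the total, and assign each value to a bin by where the midpoint of its share falls. The names `agg`, `wt`, `bin`, `g` and `res` are made up for illustration.

```r
library(data.table)

dt <- data.table(var1 = c(0.6, 0.2, 0.5, 0.8, 0.10, 0.1, 0.2, 0.5, 0.3, 0.5),
                 var2 = c(20, 400, 350, 50, 100, 490, 1200, 900, 1850, 70))

g <- 3  # number of bins wanted (hypothetical parameter name)

# Total var2 per distinct var1 value, sorted by var1 (keyby sorts the groups)
agg <- dt[, .(wt = sum(var2)), keyby = var1]

# Midpoint of each value's cumulative share of var2, mapped onto bins 1..g.
# Assumes all weights are positive, so the midpoint share lies strictly in (0, 1).
agg[, bin := ceiling(g * (cumsum(wt) - wt / 2) / sum(wt))]

# Copy the bin of each distinct var1 value back onto the original rows
dt[agg, on = "var1", bin := i.bin]

# Total var2 per bin
res <- dt[, .(var2 = sum(var2)), keyby = bin]
print(res)
# For this data, bins 1, 2 and 3 sum to 2190, 1850 and 1390
```

On this example the bins come out with the same totals as the cut2 result above, but that is not guaranteed in general, since the two approaches place boundaries differently. Because rows with the same var1 always land in the same bin, neither approach can split a single heavy value such as 0.3 across bins.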