Tags: r, data.table, binning

How to create columns by numerical intervals from values of several columns?


I want to create columns for numerical intervals based on integer values: 6 intervals (columns), each of width 15, covering 1 to 90.

Each new column should show the number of values in that row that fall within its interval. All input columns are used as reference values, and some reference values are NA.

These are the input columns:

col1 col2 col3 col4 col5 col6
11   20   22   54   73   86
1    32   64   69   NA   NA

Expected output:

col1 col2 col3 col4 col5 col6  1to15 16to30 31to45 46to60 61to75 76to90
11   20   22   54   73   86      1      2      0      1      1      1
1    32   64   69   NA   NA      1      0      1      0      2      0

I've seen some examples using cut(), but I can't adapt them to the case above.


Solution

  • Here is a solution using cut:

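    # Breaks at 1, 16, ..., 91 with right = FALSE give [1,16), [16,31), ..., [76,91)
    tmp <- t(apply(
      df, # assume df is your dataframe or matrix
      1,  # apply the function to each row
      # table() counts the values per interval; NAs are dropped by default
      function(x) table(cut(x, seq(1, 91, 15), right = FALSE))
    ))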
    
    cbind(df, tmp)
    #  col1 col2 col3 col4 col5 col6 [1,16) [16,31) [31,46) [46,61) [61,76) [76,91)
    #1   11   20   22   54   73   86      1       2       0       1       1       1
    #2    1   32   64   69   NA   NA      1       0       1       0       2       0
    

    If you want the columns to be named differently, cut has a labels argument that you can set.
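
As a minimal sketch of the labels approach, the example below rebuilds the question's data and derives names such as 1to15 from the breaks. The names df, breaks, labs, counts and result are only illustrative, and the input is assumed to be a numeric data frame like the one in the question.

```r
# Example data from the question (assumed to be a numeric data frame)
df <- data.frame(
  col1 = c(11, 1),
  col2 = c(20, 32),
  col3 = c(22, 64),
  col4 = c(54, 69),
  col5 = c(73, NA),
  col6 = c(86, NA)
)

# Interval boundaries: 1, 16, 31, 46, 61, 76, 91
breaks <- seq(1, 91, 15)

# Build one label per interval: "1to15", "16to30", ..., "76to90"
labs <- paste0(head(breaks, -1), "to", breaks[-1] - 1)

# Count, row by row, how many values fall in each interval;
# table() ignores NA values by default
counts <- t(apply(
  df,
  1,
  function(x) table(cut(x, breaks, right = FALSE, labels = labs))
))

result <- cbind(df, counts)
result
```

With the two rows from the question, the count columns should match the expected output given there. Deriving the labels from breaks keeps the names in step with the intervals if the width or range changes later.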