Search code examples
rdata.tablecut

Using data.table and cut to split a variable into groups with equal observations


I have a question which is very simple. With the answers to more complicated related questions I have not been able to figure out the right syntax.

I have a data.table which looks as follows:

   dat <- read.table(
  text = "A   B   C   D   E   F   G   H   I   J
  A   0   1   1   1   0   1   0   1   1   1
  B   1   0   0   0   1   0   1   0   0   2
  C   0   0   0   1   1   0   0   0   0   3
  D   1   0   1   0   0   1   0   1   0   4
  E   0   1   0   1   0   1   1   0   1   5
  F   0   0   1   0   0   0   1   0   0   6
  G   0   1   0   1   0   0   0   0   0   7
  H   1   0   1   0   0   1   0   0   0   8
  I   0   1   0   1   1   0   1   0   0   9
  J   1   0   1   0   0   1   0   1   0   9",
  header = TRUE
)

Now I would like to use data.table to create a variable called Jcat to divide variable J into 3 categories with more or less equal amount of observations, simply:

   dat <- read.table(
  text = "A   B   C   D   E   F   G   H   I   J Jcat
  A   0   1   1   1   0   1   0   1   1   1   1
  B   1   0   0   0   1   0   1   0   0   2   1
  C   0   0   0   1   1   0   0   0   0   3   1
  D   1   0   1   0   0   1   0   1   0   4   2
  E   0   1   0   1   0   1   1   0   1   5   2
  F   0   0   1   0   0   0   1   0   0   6   2
  G   0   1   0   1   0   0   0   0   0   7   3
  H   1   0   1   0   0   1   0   0   0   8   3
  I   0   1   0   1   1   0   1   0   0   9   3
  J   1   0   1   0   0   1   0   1   0   9   3",
  header = TRUE
)

I am struggling with the syntax.

What would be the simplest way to do this?


Solution

  • We can specify the number of breaks in breaks argument of cut

    library(data.table)
    n <- 3
    setDT(dat)[, Jcat := as.integer(cut(J, breaks = n))]