Binning data in R with the same output as in spreadsheet

I have this sample data:

set.seed(25)

xx <- data.table(
  year = 2015,
  values = iris$Sepal.Length,
  score = sample(1:8, nrow(iris), replace = TRUE))

The actual data contains many years and rows. I wanted to grouped the values column using the cut() function in base R but the result is different from the result generated by LibreOffice Calc (even in MS Office Excel) pivot. This is what I have done so far:

brks <- seq(0, ceiling(max(xx$values)), 0.5)
xx[, bins := cut(values, brks, ordered_result = TRUE)]
xx_binned <- dcast(xx, bins ~ year, length, value.var = "values")
xx_binned <- melt(xx_binned, id.vars = "bins", value.name = "value")

I started at 0 so that it will be consistent if I use different data. In the spreadsheet I also started at 0 as the starting number.

The result of the above codes is this:

     bins   variable value
1   (4,4.5] 2015     5
2   (4.5,5] 2015     27
3   (5,5.5] 2015     27
4   (5.5,6] 2015     30
5   (6,6.5] 2015     31
6   (6.5,7] 2015     18
7   (7,7.5] 2015     6
8   (7.5,8] 2015     6

This is the result of LibreOffice Calc:

values  2015
4-4.5   15
4.5-5   106
5-5.5   100
5.5-6   142
6-6.5   148
6.5-7   95
7-7.5   25
7.5-8   27

How can I make it the same? I am writing a function converting a spreadsheet tools into R function and I want it to be the same as in the output of the spreadsheet.

Thanks.

Solution

You have to sum up the score not the number of cases to come to the same values.

aggregate(xx$score, list(cut(xx$values, brks, right=FALSE, ordered_result = TRUE)), sum)
#  Group.1   x
#1 [4,4.5)  15
#2 [4.5,5) 106
#3 [5,5.5) 100
#4 [5.5,6) 142
#5 [6,6.5) 148
#6 [6.5,7)  95
#7 [7,7.5)  25
#8 [7.5,8)  27

Or updating your code:

library(data.table)
xx <- data.table(xx)
xx[, bins := cut(values, brks, right=FALSE, ordered_result = TRUE)]
dcast(xx, bins ~ year, sum, value.var = "score")

Data:

set.seed(25)

xx <- data.frame(
  year = 2015,
  values = iris$Sepal.Length,
  score = sample(1:8, nrow(iris), replace = TRUE))
brks <- seq(0, ceiling(max(xx$values)), 0.5)