I have this sample data:
set.seed(25)
xx <- data.table(
year = 2015,
values = iris$Sepal.Length,
score = sample(1:8, nrow(iris), replace = TRUE))
The actual data contains many years and rows. I wanted to grouped the values
column using the cut()
function in base
R
but the result is different from the result generated by LibreOffice Calc (even in MS Office Excel) pivot. This is what I have done so far:
brks <- seq(0, ceiling(max(xx$values)), 0.5)
xx[, bins := cut(values, brks, ordered_result = TRUE)]
xx_binned <- dcast(xx, bins ~ year, length, value.var = "values")
xx_binned <- melt(xx_binned, id.vars = "bins", value.name = "value")
I started at 0
so that it will be consistent if I use different data. In the spreadsheet I also started at 0
as the starting number.
The result of the above codes is this:
bins variable value
1 (4,4.5] 2015 5
2 (4.5,5] 2015 27
3 (5,5.5] 2015 27
4 (5.5,6] 2015 30
5 (6,6.5] 2015 31
6 (6.5,7] 2015 18
7 (7,7.5] 2015 6
8 (7.5,8] 2015 6
This is the result of LibreOffice Calc:
values 2015
4-4.5 15
4.5-5 106
5-5.5 100
5.5-6 142
6-6.5 148
6.5-7 95
7-7.5 25
7.5-8 27
How can I make it the same? I am writing a function converting a spreadsheet tools into R function and I want it to be the same as in the output of the spreadsheet.
Thanks.
You have to sum up the score
not the number of cases to come to the same values.
aggregate(xx$score, list(cut(xx$values, brks, right=FALSE, ordered_result = TRUE)), sum)
# Group.1 x
#1 [4,4.5) 15
#2 [4.5,5) 106
#3 [5,5.5) 100
#4 [5.5,6) 142
#5 [6,6.5) 148
#6 [6.5,7) 95
#7 [7,7.5) 25
#8 [7.5,8) 27
Or updating your code:
library(data.table)
xx <- data.table(xx)
xx[, bins := cut(values, brks, right=FALSE, ordered_result = TRUE)]
dcast(xx, bins ~ year, sum, value.var = "score")
Data:
set.seed(25)
xx <- data.frame(
year = 2015,
values = iris$Sepal.Length,
score = sample(1:8, nrow(iris), replace = TRUE))
brks <- seq(0, ceiling(max(xx$values)), 0.5)