r, dataframe, grouping, histogram, intervals

How to sum one column's values grouped by intervals of another column


I'm new to R and have a data frame with 25k rows. I would like to sum the "Freq" values within ranges of "Var1" (let's say in steps of 5).

The idea is to have fewer rows and then create a histogram.

Here are 20 rows for simplicity:

Var1 <- c(0:19)
Freq <- c(289, 370, 2295, 2691, 2206, 1624, 1267, 1076, 971, 889, 891, 834, 866, 780, 794, 809, 772, 740, 742, 734)

df <- data.frame(Var1, Freq)

Here is what I would expect:

Var1_intervals <- c("0 - 4", "5 - 9", "10 - 14", "15 - 19")
Freq_sum <- c(7851, 5827, 4165, 3797)

df_2 <- data.frame(Var1_intervals, Freq_sum)

Solution

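  • You can use aggregate together with cut to sum Freq per interval. With right = FALSE each interval is closed on the left and open on the right, so Var1 = 5 falls into [5,10) rather than [0,5).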

    aggregate(df["Freq"], list(cut(df$Var1, (0:4)*5, right = FALSE)), sum)
    #  Group.1 Freq
    #1   [0,5) 7851
    #2  [5,10) 5827
    #3 [10,15) 4165
    #4 [15,20) 3797
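
If you want the interval labels from the question ("0 - 4", "5 - 9", ...) and breaks that adapt to the full 25k-row data, something along these lines might work. This is only a sketch: it assumes Var1 holds non-negative integers, and the names width, breaks and labels are made up for illustration.

```r
Var1 <- 0:19
Freq <- c(289, 370, 2295, 2691, 2206, 1624, 1267, 1076, 971, 889,
          891, 834, 866, 780, 794, 809, 772, 740, 742, 734)
df <- data.frame(Var1, Freq)

width <- 5  # bin width (assumption: same width for every interval)

# Breaks from 0 up to the first multiple of `width` above max(Var1)
breaks <- seq(0, (max(df$Var1) %/% width + 1) * width, by = width)

# Labels such as "0 - 4": left edge and (right edge - 1) of each bin,
# which only makes sense because Var1 is assumed to be integer-valued
labels <- paste(head(breaks, -1), "-", tail(breaks, -1) - 1)

# Assign each row to its left-closed interval, then sum Freq per interval
df$Var1_intervals <- cut(df$Var1, breaks, labels = labels, right = FALSE)
df_2 <- aggregate(Freq ~ Var1_intervals, data = df, FUN = sum)
names(df_2)[2] <- "Freq_sum"

print(df_2)
```

Since the sums are already computed, barplot(df_2$Freq_sum, names.arg = df_2$Var1_intervals) is probably a better fit for the plot than hist(), which expects raw observations rather than pre-binned counts.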