R - group data by a vector of conditions and sum it up


I am not sure how to group data by a column using several conditions. I want a flexible slider that sets the number of groups, and R should then calculate the sum of the rows in each group so they can be displayed in a barplot. Here is what I did; I hope someone has an idea:

I have a set of data like this:

print(mydata)
Deviation   Fuel
-98893        10
-98032        20
-93958        30
-68953        40
-67829        50

etc, etc, etc..

In the end the data will be displayed in a barplot. Now I want to group the data by "Deviation". With library(shiny) I made a slider to define the number of groups.

> input
[1] 4

So "input" says that mydata has to be split into 4 groups. Next I need the total range of "Deviation", from which I can calculate the width of each group:

ABWGES <- -min(mydata$Deviation)+max(mydata$Deviation)
STEP <- round(ABWGES/input)
print(STEP)
[1] 24923
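As a rough sketch of this step, the break points can also be generated directly with seq(), assuming equal-width groups are what is wanted. Note that with the rounded STEP above, min + 4 * 24923 comes to 799, just below the maximum of 800, so the largest value could fall outside the last group; asking seq() for n_groups + 1 points avoids that. The names deviation, n_groups, breaks and labels are only illustrative; n_groups stands in for the slider value.

```r
# Deviation values copied from the question's table
deviation <- c(-98893, -98032, -93958, -68953, -67829, -63738, -60031,
               -55063, -54864, -53042, -52000, -51000, 500, 800)
n_groups <- 4  # hypothetical stand-in for the slider value

# n_groups intervals need n_groups + 1 equally spaced break points,
# running exactly from the minimum to the maximum
breaks <- seq(min(deviation), max(deviation), length.out = n_groups + 1)

# Readable group names built from the break vector, one per interval
labels <- paste(round(head(breaks, -1)), "to", round(tail(breaks, -1)))
print(labels)
```

The labels vector could later be passed to cut() so that the groups are named after their ranges.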

I tried many different ways to do the next few steps, but could not get the desired result. If I set the number of groups to 4, R needs to create 4 groups. Below I put a name for each group in brackets to show my result, but ideally the group names would come from the vector that is generated for the breaks:

(1)-98893 to -73970
(2)-73969 to -49046
(3)-49045 to -24122
(4)-24121 to 802

Deviation   Fuel    group
    -98893        10    1
    -98032        20    1
    -93958        30    1
    -68953        40    2
    -67829        50    2
    -63738        60    2
    -60031        70    2
    -55063        80    2
    -54864        90    2
    -53042       100    2
    -52000       110    2
    -51000       120    2
       500       130    4
       800       140    4
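A group column like the one in this table can be produced with base R's cut(). This is only a minimal sketch: it assumes equal-width intervals between the minimum and maximum, and the variable names n_groups and breaks are made up for illustration.

```r
# Data copied from the question's table
mydata <- data.frame(
  Deviation = c(-98893, -98032, -93958, -68953, -67829, -63738, -60031,
                -55063, -54864, -53042, -52000, -51000, 500, 800),
  Fuel = seq(10, 140, by = 10)
)
n_groups <- 4  # hypothetical stand-in for the slider value

# Equally spaced break points covering the full range of Deviation
breaks <- seq(min(mydata$Deviation), max(mydata$Deviation),
              length.out = n_groups + 1)

# labels = FALSE returns integer group codes instead of a factor;
# include.lowest = TRUE keeps the minimum value inside the first interval
mydata$group <- cut(mydata$Deviation, breaks = breaks,
                    labels = FALSE, include.lowest = TRUE)
print(mydata)
```

Leaving out labels = FALSE makes cut() return a factor whose levels are named after the intervals, and that factor still lists intervals that contain no rows.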

As you can see, one group is missing because no value meets the requirement for group 3. So my barplot will only have 3 bars, which is annoying. It should be this instead:

Deviation   Fuel    group
    -98893        10    1
    -98032        20    1
    -93958        30    1
    -68953        40    2
    -67829        50    2
    -63738        60    2
    -60031        70    2
    -55063        80    2
    -54864        90    2
    -53042       100    2
    -52000       110    2
    -51000       120    2
         0         0    3
       500       130    4
       800       140    4

I want to perform this procedure on a lot of data, so does anyone have an idea of how to do it? P.S. I tried to use setDT for cutting the data into groups.


Solution

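  • We can use complete from tidyr to add the missing group: full_seq lists every group number between the smallest and the largest, and fill sets Deviation and Fuel to 0 in the added row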

    library(tidyverse)
    df1 %>% 
       complete(group = full_seq(group, 1), fill = list(Deviation = 0, Fuel = 0))  
    # A tibble: 15 x 3
    #   group Deviation  Fuel
    #   <dbl>     <dbl> <dbl>
    # 1     1    -98893    10
    # 2     1    -98032    20
    # 3     1    -93958    30
    # 4     2    -68953    40
    # 5     2    -67829    50
    # 6     2    -63738    60
    # 7     2    -60031    70
    # 8     2    -55063    80
    # 9     2    -54864    90
    #10     2    -53042   100
    #11     2    -52000   110
    #12     2    -51000   120
    #13     3         0     0
    #14     4       500   130
    #15     4       800   140
    

    data

    df1 <- structure(list(Deviation = c(-98893L, -98032L, -93958L, -68953L, 
    -67829L, -63738L, -60031L, -55063L, -54864L, -53042L, -52000L, 
     -51000L, 500L, 800L), Fuel = c(10L, 20L, 30L, 40L, 50L, 60L, 
    70L, 80L, 90L, 100L, 110L, 120L, 130L, 140L), group = c(1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L)), .Names = c("Deviation", 
    "Fuel", "group"), class = "data.frame", row.names = c(NA, -14L
     ))
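If only the per-group sums for the barplot are needed, a base R alternative may be enough. The sketch below is not part of the answer above; it assumes the number of groups is known (n_groups is a made-up name for the slider value) and turns group into a factor with all levels declared, so that empty groups are kept.

```r
# Same data as df1 above, rebuilt here so the example is self-contained
df1 <- data.frame(
  Deviation = c(-98893, -98032, -93958, -68953, -67829, -63738, -60031,
                -55063, -54864, -53042, -52000, -51000, 500, 800),
  Fuel = seq(10, 140, by = 10),
  group = c(1, 1, 1, rep(2, 9), 4, 4)
)
n_groups <- 4  # hypothetical stand-in for the slider value

# Declaring all levels up front keeps the empty group 3
df1$group <- factor(df1$group, levels = seq_len(n_groups))

# xtabs() sums Fuel per level and reports 0 for levels without rows
fuel_sum <- xtabs(Fuel ~ group, data = df1)
print(fuel_sum)

# One bar per group, including a zero-height bar for group 3
barplot(fuel_sum, xlab = "group", ylab = "sum of Fuel")
```

Unlike the complete() approach, this does not add a placeholder row to the data; it only ensures that the summary has an entry for every group.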