I don't know how to group data by a column into a variable number of ranges. I want a slider that sets the number of groups, and R should then calculate the sum of the rows in each group so that I can display the sums in a barplot. Here is what I did; I hope someone has an idea:
I have a set of data like this:
print(mydata)
Deviation Fuel
-98893 10
-98032 20
-93958 30
-68953 40
-67829 50
etc, etc, etc..
In the end the data will be displayed in a barplot. Now I want to group the data by "Deviation". With library(shiny) I made a slider that defines the number of groups.
> input
[1] 4
So "input" says that mydata has to be split into 4 groups. Next I need the total range of "Deviation", and from that I can calculate the width of each group:
ABWGES <- max(mydata$Deviation) - min(mydata$Deviation)  # total range of Deviation
STEP <- round(ABWGES / input)                            # width of one group
print(STEP)
[1] 24923
I tried a lot of different ways to do the next few steps, but I was unable to get the result I want. If I set the number of groups to 4, R needs to create 4 groups. Below I put a number in brackets in front of each group to show my result, but ideally the groups would be named from the vector that is generated for the breaks:
(1)-98893 to -73970
(2)-73969 to -49046
(3)-49045 to -24122
(4)-24121 to 802
Deviation Fuel group
-98893 10 1
-98032 20 1
-93958 30 1
-68953 40 2
-67829 50 2
-63738 60 2
-60031 70 2
-55063 80 2
-54864 90 2
-53042 100 2
-52000 110 2
-51000 120 2
500 130 4
800 140 4
As you can see, one group is missing, because no value falls into the range of group 3. So my barplot will only have 3 bars, which is not what I want. It should look like this instead:
Deviation Fuel group
-98893 10 1
-98032 20 1
-93958 30 1
-68953 40 2
-67829 50 2
-63738 60 2
-60031 70 2
-55063 80 2
-54864 90 2
-53042 100 2
-52000 110 2
-51000 120 2
0 0 3
500 130 4
800 140 4
I want to perform this procedure on a lot of data, so does anyone have an idea of how to do it? P.S. I tried to use setDT for cutting the data into groups.
We can use complete() from tidyr to add a row for the missing group:
library(tidyverse)
df1 %>%
complete(group = full_seq(group, 1), fill = list(Deviation = 0, Fuel = 0))
# A tibble: 15 x 3
# group Deviation Fuel
# <dbl> <dbl> <dbl>
# 1 1 -98893 10
# 2 1 -98032 20
# 3 1 -93958 30
# 4 2 -68953 40
# 5 2 -67829 50
# 6 2 -63738 60
# 7 2 -60031 70
# 8 2 -55063 80
# 9 2 -54864 90
#10 2 -53042 100
#11 2 -52000 110
#12 2 -51000 120
#13 3 0 0
#14 4 500 130
#15 4 800 140
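To get from the completed table to the bars, the Fuel values still have to be summed per group. The following is a minimal sketch of that step, not part of the original answer. It assumes the slider value is available as n_groups (a hypothetical name) and uses a small stand-in table df_demo instead of df1. It also passes seq_len(n_groups) to complete() rather than full_seq(), because full_seq() only fills gaps between the smallest and largest group that actually occur, so an empty first or last group would still be dropped.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df1: groups 1, 2 and 4 are present, group 3 is missing
df_demo <- data.frame(
  Deviation = c(-98893L, -68953L, 500L, 800L),
  Fuel      = c(10L, 40L, 130L, 140L),
  group     = c(1L, 2L, 4L, 4L)
)

n_groups <- 4L  # assumed to come from the shiny slider

sums <- df_demo %>%
  # add a zero row for every group number from 1 to n_groups that has no data
  complete(group = seq_len(n_groups), fill = list(Deviation = 0L, Fuel = 0L)) %>%
  group_by(group) %>%
  summarise(Fuel = sum(Fuel))

# one bar per group, including the empty one
barplot(sums$Fuel, names.arg = sums$group)
```

With this, the barplot always has n_groups bars, and an empty group shows up as a bar of height 0.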
# Data used above (df1):
df1 <- structure(list(Deviation = c(-98893L, -98032L, -93958L, -68953L,
-67829L, -63738L, -60031L, -55063L, -54864L, -53042L, -52000L,
-51000L, 500L, 800L), Fuel = c(10L, 20L, 30L, 40L, 50L, 60L,
70L, 80L, 90L, 100L, 110L, 120L, 130L, 140L), group = c(1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L)), .Names = c("Deviation",
"Fuel", "group"), class = "data.frame", row.names = c(NA, -14L
))
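As an alternative that starts from the raw Deviation values, base R's cut() may also be worth a look, since it returns a factor that keeps empty levels and can take its labels from the break vector, which is what the question asks for. This is only a sketch under assumptions: n_groups stands for the slider value, the breaks are equally spaced over the observed range, and the label format is made up for illustration. It is a different technique from the complete() approach above.

```r
# Data as in the question, without a precomputed group column
mydata <- data.frame(
  Deviation = c(-98893, -98032, -93958, -68953, -67829, -63738, -60031,
                -55063, -54864, -53042, -52000, -51000, 500, 800),
  Fuel = seq(10, 140, by = 10)
)

n_groups <- 4  # assumed slider value

# n_groups + 1 equally spaced break points spanning the range of Deviation
brks <- seq(min(mydata$Deviation), max(mydata$Deviation),
            length.out = n_groups + 1)

# Labels built from the break vector, e.g. "-98893 to -73970" for the first group
labs <- paste(round(head(brks, -1)), "to", round(tail(brks, -1)))

# cut() returns a factor with all n_groups levels, including empty ones;
# include.lowest = TRUE keeps the minimum value inside the first group
mydata$group <- cut(mydata$Deviation, breaks = brks, labels = labs,
                    include.lowest = TRUE)

# Sum Fuel per level; tapply() gives NA for empty levels, so set those to 0
fuel_sums <- tapply(mydata$Fuel, mydata$group, sum)
fuel_sums[is.na(fuel_sums)] <- 0

barplot(fuel_sums, las = 2)
```

Because the grouping is stored as factor levels rather than as rows, no dummy row with Deviation 0 and Fuel 0 is needed to keep the empty group in the plot.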