Was hoping someone can assist.
I am trying to add another column: 6th Worst
. What I want to do is for it to produce the 6th worst y
result based on a specified criteria: Date
.
Here is an example of my df:
Key Date y x1 x2 x3
1 1/10/2018 12:00:00 AM 2 3 2 5
1 1/11/2018 12:00:00 AM 3 5 7 2
1 1/12/2018 12:00:00 AM 5 7 4 7
1 1/13/2018 12:00:00 AM 7 2 7 6
2 1/10/2018 12:00:00 AM 2 6 3 8
2 1/11/2018 12:00:00 AM 3 7 7 3
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
3 1/10/2018 12:00:00 AM 2 3 2 5
3 1/11/2018 12:00:00 AM 3 5 7 2
3 1/12/2018 12:00:00 AM 5 7 4 7
3 1/13/2018 12:00:00 AM 7 2 7 6
3 1/10/2018 12:00:00 AM 2 6 3 8
3 1/11/2018 12:00:00 AM 3 7 7 3
3 1/12/2018 12:00:00 AM 3 2 3 4
3 1/13/2018 12:00:00 AM 7 6 2 7
4 1/10/2018 12:00:00 AM 2 3 2 5
4 1/11/2018 12:00:00 AM 3 5 7 2
4 1/12/2018 12:00:00 AM 5 7 4 7
4 1/13/2018 12:00:00 AM 7 2 7 6
4 1/10/2018 12:00:00 AM 2 6 3 8
4 1/11/2018 12:00:00 AM 3 7 7 3
5 1/12/2018 12:00:00 AM 3 2 3 4
5 1/13/2018 12:00:00 AM 7 6 2 7
5 1/10/2018 12:00:00 AM 2 3 2 5
5 1/11/2018 12:00:00 AM 3 5 7 2
5 1/12/2018 12:00:00 AM 5 7 4 7
5 1/13/2018 12:00:00 AM 7 2 7 6
6 1/10/2018 12:00:00 AM 2 6 3 8
6 1/11/2018 12:00:00 AM 3 7 7 3
6 1/12/2018 12:00:00 AM 3 2 3 4
6 1/13/2018 12:00:00 AM 7 6 2 7
So for 1/10/2018 the 3. Hence, the Data set would look like this:
Key Date y x1 x2 x3 6th worst
1 1/10/2018 12:00:00 AM 2 3 2 5 3
1 1/11/2018 12:00:00 AM 3 5 7 2 ... (would have values)
1 1/12/2018 12:00:00 AM 5 7 4 7 ... (would have values)
1 1/13/2018 12:00:00 AM 7 2 7 6 ... (would have values)
2 1/10/2018 12:00:00 AM 2 6 3 8 3
2 1/11/2018 12:00:00 AM 3 7 7 3 etc.
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
3 1/10/2018 12:00:00 AM 2 3 2 5
3 1/11/2018 12:00:00 AM 3 5 7 2
3 1/12/2018 12:00:00 AM 5 7 4 7
3 1/13/2018 12:00:00 AM 7 2 7 6
3 1/10/2018 12:00:00 AM 2 6 3 8
3 1/11/2018 12:00:00 AM 3 7 7 3
3 1/12/2018 12:00:00 AM 3 2 3 4
3 1/13/2018 12:00:00 AM 7 6 2 7
4 1/10/2018 12:00:00 AM 2 3 2 5
4 1/11/2018 12:00:00 AM 3 5 7 2
4 1/12/2018 12:00:00 AM 5 7 4 7
4 1/13/2018 12:00:00 AM 7 2 7 6
4 1/10/2018 12:00:00 AM 2 6 3 8
4 1/11/2018 12:00:00 AM 3 7 7 3
5 1/12/2018 12:00:00 AM 3 2 3 4
5 1/13/2018 12:00:00 AM 7 6 2 7
5 1/10/2018 12:00:00 AM 2 3 2 5
5 1/11/2018 12:00:00 AM 3 5 7 2
5 1/12/2018 12:00:00 AM 5 7 4 7
5 1/13/2018 12:00:00 AM 7 2 7 6
6 1/10/2018 12:00:00 AM 2 6 3 8
6 1/11/2018 12:00:00 AM 3 7 7 3
6 1/12/2018 12:00:00 AM 3 2 3 4
6 1/13/2018 12:00:00 AM 7 6 2 7
Here is what I have so far:
#to obtain the 6th worst value from the data set
n=length(df$y)
df$`6th Worst`= df$`6th Worst`= "-"
df[1,3] = round(-sort(subset(df,c(unique(Date), "y")), partial=n-5)[n-5], digits = 2)
I get the following error:
Error in subset.data.frame(reg_predict, unique(reg_predict2$Date)) :
'subset' must be logical
Edit: question differs from the duplicate flagged question in several respects. Particularly in the fact that I need a conditional 6th worst scenario and not just a worst/best scenario.
An option with dplyr
and sort
can be as:
Note: One can convert the Date
column to POSIXct
format before grouping but I didn't notice any advantage as such.
library(dplyr)
df %>% group_by(Date) %>%
mutate(Worst6th = sort(y)[6])
# A tibble: 32 x 7
# Groups: Date [4]
Key Date y x1 x2 x3 Worst6th
<int> <chr> <int> <int> <int> <int> <int>
1 1 1/10/2018 12:00:00 AM 2 3 2 5 2
2 1 1/11/2018 12:00:00 AM 3 5 7 2 3
3 1 1/12/2018 12:00:00 AM 5 7 4 7 5
4 1 1/13/2018 12:00:00 AM 7 2 7 6 7
5 2 1/10/2018 12:00:00 AM 2 6 3 8 2
6 2 1/11/2018 12:00:00 AM 3 7 7 3 3
7 2 1/12/2018 12:00:00 AM 3 2 3 4 5
8 2 1/13/2018 12:00:00 AM 7 6 2 7 7
9 3 1/10/2018 12:00:00 AM 2 3 2 5 2
10 3 1/11/2018 12:00:00 AM 3 5 7 2 3
# ... with 22 more rows
Data:
df <- read.table(text="
Key Date y x1 x2 x3
1 '1/10/2018 12:00:00 AM' 2 3 2 5
1 '1/11/2018 12:00:00 AM' 3 5 7 2
1 '1/12/2018 12:00:00 AM' 5 7 4 7
1 '1/13/2018 12:00:00 AM' 7 2 7 6
2 '1/10/2018 12:00:00 AM' 2 6 3 8
2 '1/11/2018 12:00:00 AM' 3 7 7 3
2 '1/12/2018 12:00:00 AM' 3 2 3 4
2 '1/13/2018 12:00:00 AM' 7 6 2 7
3 '1/10/2018 12:00:00 AM' 2 3 2 5
3 '1/11/2018 12:00:00 AM' 3 5 7 2
3 '1/12/2018 12:00:00 AM' 5 7 4 7
3 '1/13/2018 12:00:00 AM' 7 2 7 6
3 '1/10/2018 12:00:00 AM' 2 6 3 8
3 '1/11/2018 12:00:00 AM' 3 7 7 3
3 '1/12/2018 12:00:00 AM' 3 2 3 4
3 '1/13/2018 12:00:00 AM' 7 6 2 7
4 '1/10/2018 12:00:00 AM' 2 3 2 5
4 '1/11/2018 12:00:00 AM' 3 5 7 2
4 '1/12/2018 12:00:00 AM' 5 7 4 7
4 '1/13/2018 12:00:00 AM' 7 2 7 6
4 '1/10/2018 12:00:00 AM' 2 6 3 8
4 '1/11/2018 12:00:00 AM' 3 7 7 3
5 '1/12/2018 12:00:00 AM' 3 2 3 4
5 '1/13/2018 12:00:00 AM' 7 6 2 7
5 '1/10/2018 12:00:00 AM' 2 3 2 5
5 '1/11/2018 12:00:00 AM' 3 5 7 2
5 '1/12/2018 12:00:00 AM' 5 7 4 7
5 '1/13/2018 12:00:00 AM' 7 2 7 6
6 '1/10/2018 12:00:00 AM' 2 6 3 8
6 '1/11/2018 12:00:00 AM' 3 7 7 3
6 '1/12/2018 12:00:00 AM' 3 2 3 4
6 '1/13/2018 12:00:00 AM' 7 6 2 7",
header = TRUE, stringsAsFactors = FALSE)