Search code examples
rsortingfilterrstudiocriteria

How to produce 6th worst value based on 2 criteria and insert results into a separate column?


Was hoping someone can assist.

I am trying to add another column: 6th Worst. What I want to do is for it to produce the 6th worst y result based on a specified criteria: Date.

Here is an example of my df:

Key     Date                     y   x1   x2   x3
   1    1/10/2018 12:00:00 AM    2   3    2    5
   1    1/11/2018 12:00:00 AM    3   5    7    2
   1    1/12/2018 12:00:00 AM    5   7    4    7 
   1    1/13/2018 12:00:00 AM    7   2    7    6
   2    1/10/2018 12:00:00 AM    2   6    3    8
   2    1/11/2018 12:00:00 AM    3   7    7    3
   2    1/12/2018 12:00:00 AM    3   2    3    4
   2    1/13/2018 12:00:00 AM    7   6    2    7
   3    1/10/2018 12:00:00 AM    2   3    2    5
   3    1/11/2018 12:00:00 AM    3   5    7    2
   3    1/12/2018 12:00:00 AM    5   7    4    7 
   3    1/13/2018 12:00:00 AM    7   2    7    6
   3    1/10/2018 12:00:00 AM    2   6    3    8
   3    1/11/2018 12:00:00 AM    3   7    7    3
   3    1/12/2018 12:00:00 AM    3   2    3    4
   3    1/13/2018 12:00:00 AM    7   6    2    7
   4    1/10/2018 12:00:00 AM    2   3    2    5
   4    1/11/2018 12:00:00 AM    3   5    7    2
   4    1/12/2018 12:00:00 AM    5   7    4    7 
   4    1/13/2018 12:00:00 AM    7   2    7    6
   4    1/10/2018 12:00:00 AM    2   6    3    8
   4    1/11/2018 12:00:00 AM    3   7    7    3
   5    1/12/2018 12:00:00 AM    3   2    3    4
   5    1/13/2018 12:00:00 AM    7   6    2    7
   5    1/10/2018 12:00:00 AM    2   3    2    5
   5    1/11/2018 12:00:00 AM    3   5    7    2
   5    1/12/2018 12:00:00 AM    5   7    4    7 
   5    1/13/2018 12:00:00 AM    7   2    7    6
   6    1/10/2018 12:00:00 AM    2   6    3    8
   6    1/11/2018 12:00:00 AM    3   7    7    3
   6    1/12/2018 12:00:00 AM    3   2    3    4
   6    1/13/2018 12:00:00 AM    7   6    2    7

So for 1/10/2018 the 3. Hence, the Data set would look like this:

 Key        Date                     y   x1   x2   x3 6th worst   
       1    1/10/2018 12:00:00 AM    2   3    2    5  3
       1    1/11/2018 12:00:00 AM    3   5    7    2  ... (would have values)
       1    1/12/2018 12:00:00 AM    5   7    4    7  ... (would have values)
       1    1/13/2018 12:00:00 AM    7   2    7    6  ... (would have values)
       2    1/10/2018 12:00:00 AM    2   6    3    8  3
       2    1/11/2018 12:00:00 AM    3   7    7    3  etc.
       2    1/12/2018 12:00:00 AM    3   2    3    4
       2    1/13/2018 12:00:00 AM    7   6    2    7
       3    1/10/2018 12:00:00 AM    2   3    2    5
       3    1/11/2018 12:00:00 AM    3   5    7    2
       3    1/12/2018 12:00:00 AM    5   7    4    7 
       3    1/13/2018 12:00:00 AM    7   2    7    6
       3    1/10/2018 12:00:00 AM    2   6    3    8
       3    1/11/2018 12:00:00 AM    3   7    7    3
       3    1/12/2018 12:00:00 AM    3   2    3    4
       3    1/13/2018 12:00:00 AM    7   6    2    7
       4    1/10/2018 12:00:00 AM    2   3    2    5
       4    1/11/2018 12:00:00 AM    3   5    7    2
       4    1/12/2018 12:00:00 AM    5   7    4    7 
       4    1/13/2018 12:00:00 AM    7   2    7    6
       4    1/10/2018 12:00:00 AM    2   6    3    8
       4    1/11/2018 12:00:00 AM    3   7    7    3
       5    1/12/2018 12:00:00 AM    3   2    3    4
       5    1/13/2018 12:00:00 AM    7   6    2    7
       5    1/10/2018 12:00:00 AM    2   3    2    5
       5    1/11/2018 12:00:00 AM    3   5    7    2
       5    1/12/2018 12:00:00 AM    5   7    4    7 
       5    1/13/2018 12:00:00 AM    7   2    7    6
       6    1/10/2018 12:00:00 AM    2   6    3    8
       6    1/11/2018 12:00:00 AM    3   7    7    3
       6    1/12/2018 12:00:00 AM    3   2    3    4
       6    1/13/2018 12:00:00 AM    7   6    2    7

Here is what I have so far:

#to obtain the 6th worst value from the data set

n=length(df$y)

df$`6th Worst`= df$`6th Worst`= "-"

df[1,3] = round(-sort(subset(df,c(unique(Date), "y")), partial=n-5)[n-5], digits = 2)

I get the following error:

    Error in subset.data.frame(reg_predict, unique(reg_predict2$Date)) : 
  'subset' must be logical

Edit: question differs from the duplicate flagged question in several respects. Particularly in the fact that I need a conditional 6th worst scenario and not just a worst/best scenario.


Solution

  • An option with dplyr and sort can be as:

    Note: One can convert the Date column to POSIXct format before grouping but I didn't notice any advantage as such.

    library(dplyr)
    
    df %>% group_by(Date) %>% 
      mutate(Worst6th = sort(y)[6])
    
    # A tibble: 32 x 7
    # Groups: Date [4]
        Key Date                      y    x1    x2    x3 Worst6th
      <int> <chr>                 <int> <int> <int> <int>    <int>
    1     1 1/10/2018 12:00:00 AM     2     3     2     5        2
    2     1 1/11/2018 12:00:00 AM     3     5     7     2        3
    3     1 1/12/2018 12:00:00 AM     5     7     4     7        5
    4     1 1/13/2018 12:00:00 AM     7     2     7     6        7
    5     2 1/10/2018 12:00:00 AM     2     6     3     8        2
    6     2 1/11/2018 12:00:00 AM     3     7     7     3        3
    7     2 1/12/2018 12:00:00 AM     3     2     3     4        5
    8     2 1/13/2018 12:00:00 AM     7     6     2     7        7
    9     3 1/10/2018 12:00:00 AM     2     3     2     5        2
    10     3 1/11/2018 12:00:00 AM     3     5     7     2        3
    # ... with 22 more rows      
    

    Data:

    df <- read.table(text="
    Key     Date                     y   x1   x2   x3
    1    '1/10/2018 12:00:00 AM'    2   3    2    5
    1    '1/11/2018 12:00:00 AM'    3   5    7    2
    1    '1/12/2018 12:00:00 AM'    5   7    4    7 
    1    '1/13/2018 12:00:00 AM'    7   2    7    6
    2    '1/10/2018 12:00:00 AM'    2   6    3    8
    2    '1/11/2018 12:00:00 AM'    3   7    7    3
    2    '1/12/2018 12:00:00 AM'    3   2    3    4
    2    '1/13/2018 12:00:00 AM'    7   6    2    7
    3    '1/10/2018 12:00:00 AM'    2   3    2    5
    3    '1/11/2018 12:00:00 AM'    3   5    7    2
    3    '1/12/2018 12:00:00 AM'    5   7    4    7 
    3    '1/13/2018 12:00:00 AM'    7   2    7    6
    3    '1/10/2018 12:00:00 AM'    2   6    3    8
    3    '1/11/2018 12:00:00 AM'    3   7    7    3
    3    '1/12/2018 12:00:00 AM'    3   2    3    4
    3    '1/13/2018 12:00:00 AM'    7   6    2    7
    4    '1/10/2018 12:00:00 AM'    2   3    2    5
    4    '1/11/2018 12:00:00 AM'    3   5    7    2
    4    '1/12/2018 12:00:00 AM'    5   7    4    7 
    4    '1/13/2018 12:00:00 AM'    7   2    7    6
    4    '1/10/2018 12:00:00 AM'    2   6    3    8
    4    '1/11/2018 12:00:00 AM'    3   7    7    3
    5    '1/12/2018 12:00:00 AM'    3   2    3    4
    5    '1/13/2018 12:00:00 AM'    7   6    2    7
    5    '1/10/2018 12:00:00 AM'    2   3    2    5
    5    '1/11/2018 12:00:00 AM'    3   5    7    2
    5    '1/12/2018 12:00:00 AM'    5   7    4    7 
    5    '1/13/2018 12:00:00 AM'    7   2    7    6
    6    '1/10/2018 12:00:00 AM'    2   6    3    8
    6    '1/11/2018 12:00:00 AM'    3   7    7    3
    6    '1/12/2018 12:00:00 AM'    3   2    3    4
    6    '1/13/2018 12:00:00 AM'    7   6    2    7",
    header = TRUE, stringsAsFactors = FALSE)