
for loop to determine the top 10 percent of values in an interval


I essentially have two columns (vectors) with speed and accel in a data.frame as such:

    speed     acceleration
1   3.2694444 2.6539535522
2   3.3388889 2.5096979141
3   3.3888889 2.2722134590
4   3.4388889 1.9815256596
5   3.5000000 1.6777544022
6   3.5555556 1.3933215141
7   3.6055556 1.1439051628
8   3.6527778 0.9334115982
9   3.6722222 0.7561602592

For each speed interval on the x axis (for example 3.2-3.4, 3.4-3.6, and so on), I need to find the top 10% of the values on the y axis (acceleration). Can you please show me what a for loop would look like in this situation?


Solution

  • As @alistaire already pointed out, you have provided a very limited amount of data. So we first have to simulate a bit more data on which we can test our code.

    set.seed(1)
    
    # your data
    speed <- c(3.2694444, 3.3388889, 3.3388889, 3.4388889, 3.5,
               3.5555556, 3.6055556, 3.6527778, 3.6722222)
    acceleration <- c(2.6539535522, 2.5096979141, 2.2722134590,
                      1.9815256596, 1.6777544022, 1.3933215141,
                      1.1439051628, 0.9334115982, 0.7561602592)
    df <- data.frame(speed, acceleration)
    
    # expand data.frame and add a little bit of noise to all values
    # to make them 'unique'
    df <- as.data.frame(do.call(
      rbind,
      replicate(15L, apply(df, 2, \(x) (x + runif(length(x), -1e-1, 1e-1) )),
                simplify = FALSE)
    ))
    

    The function create_intervals, as the name suggests, creates user-defined intervals. The rest of the code does the 'heavy lifting' and stores the desired result in out.

    If you would like to have intervals of speed with equal widths, simply specify the number of groups (n_groups) you would like to have and leave the rest of the arguments (i.e. lwr, upr, and interval_span) unspecified.

    # Cut speed into user-defined intervals
    create_intervals <- \(n_groups = NULL, lwr = NULL, upr = NULL, interval_span = NULL) {
      if (!is.null(lwr) && !is.null(upr) && !is.null(interval_span) && is.null(n_groups)) {
        speed_low <- subset(df, speed < lwr, select = speed)
        first_interval <- with(speed_low, c(min(speed), lwr))
        middle_intervals <- seq(lwr + interval_span, upr - interval_span, interval_span)
        speed_upp <- subset(df, speed > upr, select = speed)
        last_interval <- with(speed_upp, c(upr, max(speed)))
        intervals <- c(first_interval, middle_intervals, last_interval)
      } else {
        step <- with(df, (max(speed) - min(speed)) / n_groups)
        intervals <- array(0L, dim = n_groups)
        for(i in seq_len(n_groups)) {
          intervals[i] <- min(df$speed) + i * step
        }
      }
      return(intervals)
    }
    
    # three equally spaced boundaries; min(speed) is not among them,
    # so the code below covers only the upper two of the three intervals
    my_intervals <- create_intervals(n_groups = 3L)
    
    # Within each speed interval, keep the rows whose acceleration is
    # greater than or equal to the 90th percentile of that interval
    out <- lapply(1:(length(my_intervals)-1L), \(i) {
      x <- subset(df, speed >= my_intervals[i] & speed <= my_intervals[i+1L])
      x[x$acceleration >= quantile(x$acceleration, 0.9), ]
    })
    
    # function to round values to two decimal places
    r <- \(x) format(round(x, 2), nsmall = 2L)
    
    # assign names to each element of out
    for(i in seq_along(out)) {
      names(out)[i] <- paste0(r(my_intervals[i]), '-', r(my_intervals[i+1L]))
    }
    

    Output 1

    > out
    $`3.38-3.57`
           speed acceleration
    11  3.394378     2.583636
    21  3.383631     2.267659
    57  3.434123     2.300234
    83  3.394886     2.580924
    101 3.395459     2.460971
    
    $`3.57-3.76`
          speed acceleration
    6  3.635234     1.447290
    41 3.572868     1.618293
    51 3.615017     1.420020
    95 3.575412     1.763215
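
    Since the question asks for an explicit for loop, the lapply() step can also be written as one. The following is only a minimal sketch and not part of the code that produced the output above: it assumes the nine rows from the question, fixed break points at 3.2, 3.4, 3.6, and 3.8, and half-open intervals. The names breaks and top10 are illustrative.

    ```r
    # the nine rows from the question
    df <- data.frame(
      speed = c(3.2694444, 3.3388889, 3.3888889, 3.4388889, 3.5,
                3.5555556, 3.6055556, 3.6527778, 3.6722222),
      acceleration = c(2.6539535522, 2.5096979141, 2.2722134590,
                       1.9815256596, 1.6777544022, 1.3933215141,
                       1.1439051628, 0.9334115982, 0.7561602592)
    )

    # assumed break points: 3.2, 3.4, 3.6, 3.8
    breaks <- seq(3.2, 3.8, by = 0.2)

    # pre-allocate one list element per interval
    top10 <- vector("list", length(breaks) - 1L)

    for (i in seq_len(length(breaks) - 1L)) {
      # rows whose speed falls into [breaks[i], breaks[i + 1])
      in_bin <- df$speed >= breaks[i] & df$speed < breaks[i + 1L]
      x <- df[in_bin, ]
      # keep rows at or above the 90th percentile of acceleration in this bin
      top10[[i]] <- x[x$acceleration >= quantile(x$acceleration, 0.9), ]
      names(top10)[i] <- sprintf("%.1f-%.1f", breaks[i], breaks[i + 1L])
    }

    top10
    ```

    With only three rows per interval, each element of top10 holds a single row (rows 1, 4, and 7 of the question's data). On a larger data set the same loop returns roughly the top tenth of each interval.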
    

    We could also compute the desired values of speed based on intervals that make more 'sense' than just equally spaced speed intervals, e.g. [min(speed), 3.3), [3.3, 3.45), [3.45, 3.6), and [3.6, max(speed)).

    This can be accomplished by leaving n_groups unspecified and instead specifying lwr, upr, and an interval_span that fits them. For instance, an interval span of 0.15 fits a lower limit of 3.3 and an upper limit of 3.6, because it splits that range into two equal parts.

    # custom boundaries based on a lower limit and upper limit
    my_intervals <- create_intervals(lwr = 3.3, upr = 3.6, interval_span = 0.15)
    # then re-run the lapply() and naming steps above to refresh out
    

    Output 2

    > out
    $`3.18-3.30`
          speed acceleration
    37 3.238781     2.696456
    82 3.258691     2.722076
    
    $`3.30-3.45`
          speed acceleration
    11 3.394378     2.583636
    19 3.328292     2.711825
    73 3.315306     2.644580
    83 3.394886     2.580924
    
    $`3.45-3.60`
          speed acceleration
    4  3.520530     2.018930
    40 3.517329     2.032943
    58 3.485247     2.079893
    67 3.458031     2.078545
    
    $`3.60-3.76`
          speed acceleration
    6  3.635234     1.447290
    34 3.688131     1.218969
    51 3.615017     1.420020
    78 3.628465     1.348873
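
    Half-open intervals such as [3.3, 3.45) can also be built with base R's cut() and split(), without a helper function. This is a hedged alternative sketch rather than the code that produced the output above: it assumes the nine rows from the question and the custom boundaries 3.3, 3.45, and 3.6. The names bin and out_cut are illustrative.

    ```r
    # the nine rows from the question
    df <- data.frame(
      speed = c(3.2694444, 3.3388889, 3.3888889, 3.4388889, 3.5,
                3.5555556, 3.6055556, 3.6527778, 3.6722222),
      acceleration = c(2.6539535522, 2.5096979141, 2.2722134590,
                       1.9815256596, 1.6777544022, 1.3933215141,
                       1.1439051628, 0.9334115982, 0.7561602592)
    )

    # boundaries: min(speed), 3.3, 3.45, 3.6, max(speed)
    breaks <- c(min(df$speed), 3.3, 3.45, 3.6, max(df$speed))

    # right = FALSE gives left-closed bins [a, b);
    # include.lowest = TRUE closes the last bin so max(speed) is kept
    df$bin <- cut(df$speed, breaks = breaks, right = FALSE, include.lowest = TRUE)

    # split by bin, then keep the top 10% of acceleration within each bin
    out_cut <- lapply(split(df, df$bin), function(x) {
      x[x$acceleration >= quantile(x$acceleration, 0.9), ]
    })

    out_cut
    ```

    On these nine rows every bin returns exactly one row (rows 1, 2, 5, and 7). The list names are the interval labels generated by cut().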
    

    Note: use function(x) instead of \(x) if your version of R is older than 4.1.0.