Tags: r, for-loop, while-loop, break

For/while loops until sequence is maintained


I have a data frame that looks like the following:

input <- structure(list(rank = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 
7L, 7L, 8L, 8L, 9L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 13L, 14L, 
14L, 15L, 16L, 17L, 18L, 19L), sequence = c("HRIGRGGRYGRKGVAI", 
"TQIDELPVDFAAYLGE", "AFSIGLLQRLDFRHNL", "QNDKIAPQDQDSFLDD", "SMHAEMPKSDRERVMN", 
"AQSVIFANTRRKVDWI", "PGRVSDVIKRGALRTE", "AEVISRIGEFLSNSSK", "GGDIIAQAQSGTGKTG", 
"TFVGGTRVQDDLRKLQ", "QGLVLSPTRELALQTA", "DWIAEKLNQSNHTVSS", "NIVINFDLPTNKENYL", 
"AGVIVAVGTPGRVSDV", "SDRERVMNTFRSGSSR", "GFEKPSSIQQRAIAPF", "SGTGKTGAFSIGLLQR", 
"LDTLMDLYETVSIAQS", "VRPIPSFDDMPLHQNL", "MPEEVLELTKKFMRDP", "QQRAIAPFTRGGDIIA", 
"LHEIEAHYHTQIDELP", "LVARGIDVHHVNIVIN", "ANTRRKVDWIAEKLNQ", "VLVLDEADEMLSQGFA", 
"RGALRTESLRVLVLDE", "PQDQDSFLDDQPGVRP", "YGRKGVAINFVTEKDV", "SSKFCETFVGGTRVQD", 
"RVLVTTDLVARGIDVH"), start_position = c(353L, 388L, 79L, 3L, 
296L, 268L, 155L, 111L, 63L, 130L, 96L, 281L, 337L, 146L, 304L, 
45L, 72L, 255L, 22L, 212L, 53L, 379L, 326L, 274L, 174L, 164L, 
9L, 361L, 124L, 319L), score = c(0.92, 0.89, 0.87, 0.87, 0.86, 
0.86, 0.85, 0.85, 0.84, 0.84, 0.79, 0.79, 0.78, 0.78, 0.77, 0.76, 
0.75, 0.75, 0.75, 0.75, 0.74, 0.74, 0.73, 0.72, 0.72, 0.71, 0.68, 
0.67, 0.65, 0.63)), .Names = c("rank", "sequence", "start_position", 
"score"), row.names = c(NA, -30L), class = c("tbl_df", "tbl", 
"data.frame"))

What I want to do is the following. Looking at input$rank, I want to add up the scores under input$score for as long as the sequence under input$rank keeps going, that is, until rank drops back to a lower value and a new sequence starts.

As an example, consider the first sequence, which spans rows 1:36 (the 37th value under input$rank is a 1; note that there are repeated values under input$rank). It would have a sum of 26.76, which I obtained with sum(input$score[1:36]).

I thought about using break or next inside a for or while loop, although I am not that familiar with those statements.


Solution

  • Hopefully this is closer to what you're looking for. I take the differences between consecutive values of the rank vector, test them for values less than zero (a drop in rank marks the start of a new sequence), and take a cumulative sum of the resulting logical vector. The result is then used as the grouping variable in a call to aggregate().

    set.seed(1)
    rank <- c(1, 2, 3, 5, 5, 1, 2, 2, 3, 1, 2, 4, 4, 5)
    score <- round(runif(length(rank)), 2)
    input <- data.frame(rank, score)
    # diff() < 0 flags each drop in rank; the leading -1 makes the first row open group 1
    input <- cbind(group=cumsum(c(-1, diff(input$rank)) < 0), input)
    input
    #    group rank score
    # 1      1    1  0.27
    # 2      1    2  0.37
    # 3      1    3  0.57
    # 4      1    5  0.91
    # 5      1    5  0.20
    # 6      2    1  0.90
    # 7      2    2  0.94
    # 8      2    2  0.66
    # 9      2    3  0.63
    # 10     3    1  0.06
    # 11     3    2  0.21
    # 12     3    4  0.18
    # 13     3    4  0.69
    # 14     3    5  0.38
    
    # sum the scores within each group
    aggregate(score ~ group, data=input, sum)
    #   group score
    # 1     1  2.32
    # 2     2  3.13
    # 3     3  1.52
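
Since the question asks about loops, here is a minimal sketch of the same idea written as an explicit for loop, mainly for comparison with the vectorized version above. It reuses the toy rank and score vectors from this answer; run_sums is a hypothetical helper name, not part of any package. No break or next is needed: the loop closes the running sum whenever rank drops and then starts a new one.

```r
# Same toy data as in the answer above.
set.seed(1)
rank <- c(1, 2, 3, 5, 5, 1, 2, 2, 3, 1, 2, 4, 4, 5)
score <- round(runif(length(rank)), 2)

# run_sums() is a hypothetical helper: it sums score over each run of
# non-decreasing rank and starts a new run whenever rank drops.
run_sums <- function(rank, score) {
  sums <- numeric(0)   # one total per completed run
  current <- 0         # running total of the run in progress
  for (i in seq_along(rank)) {
    if (i > 1 && rank[i] < rank[i - 1]) {
      # rank dropped: store the finished run and reset the total
      sums <- c(sums, current)
      current <- 0
    }
    current <- current + score[i]
  }
  # store the last run, which is never followed by a drop
  if (length(rank) > 0) sums <- c(sums, current)
  sums
}

loop_sums <- run_sums(rank, score)
loop_sums
# [1] 2.32 3.13 1.52
```

With the data frame from the question, the call would be run_sums(input$rank, input$score). The totals match the aggregate() output; the vectorized version is usually preferable in R because it avoids growing a vector inside a loop.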