
Speed up or replace a loop over millions of rows: checking a date against multiple date ranges

Good evening everyone. I have 6 million rows of data, and each row has up to four date ranges, each with its own ID.

z <- structure(list(
  date = structure(c(11866, 16190, 14729, 11718), class = "Date"),
  beg1 = structure(c(12264, 12264, 13970, 12264), class = "Date"),
  end1 = structure(c(17621, 14760, 14760, 13298), class = "Date"),
  ID1 = c(1003587, 1000396, 1010743, 1002113),
  beg2 = structure(c(NA, 14790, 14790, 13299), class = "Date"),
  end2 = structure(c(NA, 17621, 15217, 13969), class = "Date"),
  ID2 = c(NA, 1024488, 1027877, 1002824),
  beg3 = structure(c(NA, NA, 15218, 13970), class = "Date"),
  end3 = structure(c(NA, NA, 17621, 14760), class = "Date"),
  ID3 = c(NA, NA, 1031361, 1002113),
  beg4 = structure(c(NA, NA, NA, 14790), class = "Date"),
  end4 = structure(c(NA, NA, NA, 17621), class = "Date"),
  ID4 = c(NA, NA, NA, 1021290),
  realID = c(NA, NA, NA, NA)),
  row.names = c(267365L, 193587L, 5294385L, 2039421L),
  class = "data.frame")

I tried to assign a suitable ID to each row by checking which date range its date falls into, using a loop:

for(i in 1:nrow(z)){tryCatch({print(i)

The code works, but the problem is that I have too much data and the loop is inefficient. It may take almost a full day to finish.

Does anyone know how I can improve or replace this code? Thank you so much.


  • Since R is a vectorized language, the best way to speed up this code is to operate on entire columns at once, as opposed to looping through each element.
    A simple solution is to use a series of ifelse statements, one per date range.

    z$realID <- ifelse(!is.na(z$beg1) & z$date > z$beg1 & z$date < z$end1, z$ID1, z$realID)
    z$realID <- ifelse(!is.na(z$beg2) & z$date > z$beg2 & z$date < z$end2, z$ID2, z$realID)
    z$realID <- ifelse(!is.na(z$beg3) & z$date > z$beg3 & z$date < z$end3, z$ID3, z$realID)
    z$realID <- ifelse(!is.na(z$beg4) & z$date > z$beg4 & z$date < z$end4, z$ID4, z$realID)

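    When the condition is TRUE, realID is updated; otherwise it keeps its previous value. Because the statements run in order, a later matching range overwrites an earlier one. The comparisons are strict, so a date equal to beg or end does not match; use >= and <= if the boundaries should count.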
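
The same idea can also be written as a short loop over the range number instead of over rows, so the work inside each pass stays vectorized. This is only a minimal sketch, not a tested solution for the full 6 million rows: the helper names `d` and `assign_real_id` are made up for illustration, and the toy data frame reuses just the first two range sets from the question's sample.

```r
# Hypothetical helper: turn day counts into Date values
d <- function(x) as.Date(x, origin = "1970-01-01")

# Toy data: the first two range sets from the question's sample
z <- data.frame(
  date = d(c(11866, 16190, 14729, 11718)),
  beg1 = d(c(12264, 12264, 13970, 12264)),
  end1 = d(c(17621, 14760, 14760, 13298)),
  ID1  = c(1003587, 1000396, 1010743, 1002113),
  beg2 = d(c(NA, 14790, 14790, 13299)),
  end2 = d(c(NA, 17621, 15217, 13969)),
  ID2  = c(NA, 1024488, 1027877, 1002824),
  realID = NA_real_
)

# Hypothetical function: one vectorized pass per range set (begK/endK/IDK)
assign_real_id <- function(df, n_ranges) {
  for (k in seq_len(n_ranges)) {
    beg <- df[[paste0("beg", k)]]
    end <- df[[paste0("end", k)]]
    id  <- df[[paste0("ID", k)]]
    # which() drops NA comparisons, so rows with a missing range are skipped
    hit <- which(df$date > beg & df$date < end)
    df$realID[hit] <- id[hit]
  }
  df
}

z <- assign_real_id(z, 2)
z$realID
```

With the full data the call would be `assign_real_id(z, 4)`. The loop runs only once per range set, not once per row, and like the ifelse version it lets a later matching range overwrite an earlier one.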