r for-loop vectorization nested-loops apply

How to match row and row + 1 using apply in R


I am trying to use the apply function to replace an inefficient nested for loop that will not finish on a large dataset.

    unique <- cbind.data.frame(c(1,2,3))
    colnames(unique) <- "note"

    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")

    uniqueRow <- nrow(unique)
    ptSeenSubRow <- nrow(ptSeenSub)

    for (note in 1:uniqueRow)
    {
       for (row in 1:ptSeenSubRow)
       {
         if (ptSeenSub$PARENT_EVENT_ID[row] == unique$note[note])
         {
           # the first matching row is the attending, the row after it is the resident
           unique$attending_name[note] <- ptSeenSub$USER_NAME[row]
           unique$resident_name[note] <- ptSeenSub$USER_NAME[row + 1]
           break
         } 
       }
     }

I would like the results to be similar to this dataframe:

    results <- rbind.data.frame(c(1, "a", "b"), c(2, "a", "d"), c(3,"e", "f"))
    colnames(results) <- c("note", "attending_name", "resident_name")

The loop has to run over millions of rows and does not finish. How can I vectorize this so that it completes on large data sets? Any advice is greatly appreciated.


Solution

  • It sounds like you are trying to reshape the data into wide format. I find that dplyr and tidyr provide nice tools to accomplish this.

    Define the data:

    library(tidyr)
    library(dplyr)
    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")
    

    Reshape:

    result <- ptSeenSub %>%
      group_by(PARENT_EVENT_ID) %>%
      mutate(k = row_number()) %>%
      spread(k, USER_NAME)
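
    The reshape step above relies on spread(), which tidyr has marked as superseded since version 1.0.0 in favour of pivot_wider(). A minimal sketch of the same idea with pivot_wider() might look like the following. It assumes tidyr 1.0.0 or later; the object name result_wide and the "user_" column prefix are made up for illustration and are not part of the original answer.

    ```r
    library(dplyr)
    library(tidyr)

    # Same toy data as in the question, built with explicit column names.
    ptSeenSub <- data.frame(
      PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
      USER_NAME       = c("a", "b", "a", "d", "e", "f"),
      stringsAsFactors = FALSE
    )

    result_wide <- ptSeenSub %>%
      group_by(PARENT_EVENT_ID) %>%
      mutate(k = row_number()) %>%   # position of each row within its event
      ungroup() %>%
      # one column per position; "user_" is a hypothetical prefix
      pivot_wider(names_from = k, values_from = USER_NAME, names_prefix = "user_")
    ```

    With this data the result should have one row per PARENT_EVENT_ID and the columns user_1 and user_2, which can be renamed in the same way as the spread() output.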
    

    You can then change names if you wish:

    names(result) <- c("note", "attending_name", "resident_name")
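
    If you would rather avoid extra packages, the "row and row + 1" idea from the question can also be vectorized in base R. This is only a sketch under two assumptions that the toy data satisfies but real data may not: rows for the same PARENT_EVENT_ID are adjacent, and every event has at least two rows. If either fails, first + 1 will pick up a row from a different event (or NA at the end). The name first is a helper introduced here for illustration.

    ```r
    ptSeenSub <- data.frame(
      PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
      USER_NAME       = c("a", "b", "a", "d", "e", "f"),
      stringsAsFactors = FALSE
    )

    # Index of the first row for each event ID.
    first <- which(!duplicated(ptSeenSub$PARENT_EVENT_ID))

    results <- data.frame(
      note           = ptSeenSub$PARENT_EVENT_ID[first],
      attending_name = ptSeenSub$USER_NAME[first],      # first matching row
      resident_name  = ptSeenSub$USER_NAME[first + 1],  # the row right after it
      stringsAsFactors = FALSE
    )
    ```

    Because duplicated() and vector indexing each make a single pass over the column, this avoids the nested loop entirely and should scale to millions of rows far better than the original code.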