r for-loop vectorization nested-loops apply

How to match row and row + 1 using apply in R

I am trying to use the apply function to replace an inefficient nested for loop that will not finish on a large dataset.

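    # Lookup table with one row per note
    unique <- cbind.data.frame(c(1,2,3))
    colnames(unique) <- "note"

    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")

    # Create the output columns up front so element-wise assignment works
    unique$attending_name <- NA_character_
    unique$resident_name <- NA_character_

    uniqueRow <- nrow(unique)
    ptSeenSubRow <- nrow(ptSeenSub)

    for (note in 1:uniqueRow)
    {
       for (row in 1:ptSeenSubRow)
       {
         if (ptSeenSub$PARENT_EVENT_ID[row] == unique$note[note])
         {
           # The first matching row is the attending, the row after it is the resident
           unique$attending_name[note] <- ptSeenSub$USER_NAME[row]
           unique$resident_name[note] <- ptSeenSub$USER_NAME[row + 1]
           break
         }
       }
     }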

I would like the results to be similar to this dataframe:

    results <- rbind.data.frame(c(1, "a", "b"), c(2, "a", "d"), c(3,"e", "f"))
    colnames(results) <- c("note", "attending_name", "resident_name")

The loop has to run over millions of rows and never finishes. How can I vectorize this so that it completes on large data sets? Any advice is greatly appreciated.

Solution

It sounds like you are trying to reshape the data into wide format, with one row per PARENT_EVENT_ID and one column per USER_NAME. I find that dplyr and tidyr provide nice tools to accomplish this.

define data

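    library(tidyr)
    library(dplyr)
    ptSeenSub <- rbind.data.frame(c(1,"a"), c(1,"b"), c(2,"a"), c(2,"d"), c(3,"e"), c(3,"f"))
    # Name the columns so the pipeline can refer to them
    colnames(ptSeenSub) <- c("PARENT_EVENT_ID", "USER_NAME")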

reshape

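    result <- ptSeenSub %>%
      group_by(PARENT_EVENT_ID) %>%
      # k numbers the rows within each PARENT_EVENT_ID: 1, 2, ...
      mutate(k = row_number()) %>%
      # one column per value of k, filled with the matching USER_NAME
      spread(k, USER_NAME)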
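In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same reshape with the newer function, assuming that version of tidyr is installed and that the data has the two columns named in the question (the data frame is rebuilt here with data.frame() only to keep the example self-contained):

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, built with explicit column names
ptSeenSub <- data.frame(
  PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
  USER_NAME = c("a", "b", "a", "d", "e", "f"),
  stringsAsFactors = FALSE
)

result <- ptSeenSub %>%
  group_by(PARENT_EVENT_ID) %>%
  mutate(k = row_number()) %>%   # position of each row within its group
  ungroup() %>%
  pivot_wider(names_from = k, values_from = USER_NAME)
```

As with spread(), the new columns are named after the values of k, so here they are `1` and `2`.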

You can then change names if you wish:

    names(result) <- c("note", "attending_name", "resident_name")
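If you would rather avoid extra packages, the "row and row + 1" idea from the question can also be vectorized directly in base R. This is only a sketch under a strong assumption that is not stated in the question: every PARENT_EVENT_ID has exactly two rows, they are adjacent, and the attending comes first. The variable name `first` is just illustrative.

```r
# Same example data as in the question, built with explicit column names
ptSeenSub <- data.frame(
  PARENT_EVENT_ID = c(1, 1, 2, 2, 3, 3),
  USER_NAME = c("a", "b", "a", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Index of the first row of each PARENT_EVENT_ID
# (assumption: the two rows of each note are adjacent, attending first)
first <- which(!duplicated(ptSeenSub$PARENT_EVENT_ID))

result <- data.frame(
  note = ptSeenSub$PARENT_EVENT_ID[first],
  attending_name = ptSeenSub$USER_NAME[first],     # row
  resident_name = ptSeenSub$USER_NAME[first + 1],  # row + 1
  stringsAsFactors = FALSE
)
```

This does a single pass with duplicated() and two vector lookups instead of the nested loop, but it silently gives wrong pairs if the assumption about the row layout does not hold, so the reshape above is the safer choice when groups can vary in size.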