Tags: dplyr, merge, aggregate

Combine two data frames by inserting a row at every 3rd position


I have the following two data frames:

df <- data.frame(x = rep(1:10, each = 2))
df2 <- data.frame(x = rep(30, times = 10))

I am looking to combine the two data frames so that one row of df2 is inserted into every third spot on df. So the final data frame should look like:

df3 <- data.frame(x = c(1,1,30,2,2,30,3,3,30,4,4,30,5,5,30,6,6,30,7,7,30,8,8,30,9,9,30,10,10,30))

This could be done by making a grouping variable for each data frame. Like:

df$group <- rep(1:10, each = 2)

df2$group <- 1:10

And then merging based on the group, but the order of my rows in df is important and cannot be switched around.

Any help would be appreciated


Solution

  • We can do this by appending the two data frames and sorting on a custom sort key.

    Something like the following:

    library(dplyr)

    # build both inputs as real data frames (mutate() does not work on a matrix)
    df <- data.frame(x = rep(1:10, each = 2))
    df2 <- data.frame(x = rep(30, times = 10))
    
    # current row orders
    df = df %>%
      mutate(row_number = 1:nrow(df),
             source = "df")
    df2 = df2 %>%
      mutate(row_number = 1:nrow(df2),
             source = "df2")
    
    # prep custom sort keys for df2
    df2 = df2 %>% mutate(row_number = 2 * row_number + 0.5)
    
    # append and sort
    df3 = rbind(df, df2) %>%
      arrange(row_number)
    
    # remove extra columns
    df3 = df3 %>%
      select(-row_number, -source)
    

    Most of the cleverness happens in the custom sort key for df2: row_number = 2 * row_number + 0.5. Row i of df2 gets the key 2i + 0.5, so it lands between rows 2i and 2i + 1 of df.

    • Adding 0.5 ensures that the df2 rows fall between df rows.
    • Multiplying by 2 means there are two rows from df between consecutive rows of df2.
    • Recording the current row numbers first means the original order within df and within df2 is preserved.
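
The same sort-key idea can be sketched in base R and generalized to "insert one row after every n rows". This is only an illustration: the helper name insert_every and its n argument are hypothetical and not part of the original answer, and it assumes df has exactly n times as many rows as df2.

```r
# Hypothetical helper (not from the original answer): insert one row of
# `extra` after every `n` rows of `base`, using fractional sort keys.
insert_every <- function(base, extra, n = 2) {
  # Assumption: the sizes line up exactly (n base rows per extra row).
  stopifnot(nrow(base) == n * nrow(extra))

  base_key  <- seq_len(nrow(base))              # 1, 2, 3, ...
  extra_key <- n * seq_len(nrow(extra)) + 0.5   # n + 0.5, 2n + 0.5, ...

  # Stack the rows, then reorder them by the combined keys.
  combined <- rbind(base, extra)
  out <- combined[order(c(base_key, extra_key)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

df  <- data.frame(x = rep(1:10, each = 2))
df2 <- data.frame(x = rep(30, times = 10))

df3 <- insert_every(df, df2, n = 2)
head(df3$x, 6)
# 1 1 30 2 2 30
```

Because every key is unique, the ordering has no ties, so rows from df and df2 keep their original relative order without any extra tie-breaking column.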