A way to use rep() with data.frames or vectors?


I have a list of data frames that looks like this:

dflist:
[[12]]
label         site
   <chr>        <int>
 [1] NODE_0000138    12
 [2] NODE_0000222    12
 [3] NODE_0000205    12
 [4] NODE_0000241    12
 [5] 061D03KR01      12
 [6] 061D03KR03      12

[[15]]
label         site
  <chr>        <int>
[1] NODE_0000203    15
[2] 061D03OR17      15
[3] 061D03OR19      15
[4] 061D03UR28      15

[[18]]
label         site
   <chr>        <int>
 [1] NODE_0000181    18
 [2] NODE_0000226    18
 [3] 061D03KR11      18
 [4] 061D03OR02      18
 [5] 061D03OR32      18

I also have a data.frame with information:

df
  from to
1   12 18
2   12 35
3   15 18

I have been trying to create a new data.frame in which the values are repeated according to the rows of df.

The first row of df is

from to
1   12 18

So I would like to obtain all possible combinations of the dflist entries for sites 12 and 18:

NODE_0000138    12    NODE_0000181    18
NODE_0000138    12    NODE_0000226    18
NODE_0000138    12    061D03KR11      18
NODE_0000138    12    061D03OR02      18
NODE_0000138    12    061D03OR32      18

NODE_0000222    12    NODE_0000181    18
NODE_0000222    12    NODE_0000226    18
NODE_0000222    12    061D03KR11      18
NODE_0000222    12    061D03OR02      18
NODE_0000222    12    061D03OR32      18

NODE_0000205    12    NODE_0000181    18
NODE_0000205    12    NODE_0000226    18
NODE_0000205    12    061D03KR11      18
NODE_0000205    12    061D03OR02      18
NODE_0000205    12    061D03OR32      18

... ...

And so on, for all of the rows of df. I thought about a for loop, but it gets messy. Any other ideas?

Thanks!!!

I have also been playing with rep() and rep_along(), but I can't get them to repeat the number of times I need.

I have also tried c(rep(data.frame(c(dflist[[1]][1],dflist[[1]][2])), nrow(dflist[[1]]))) just to see if I could get it repeated, but something is missing.
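The rep() attempt above misses one detail: a data.frame is a list of columns, so rep() on it repeats the columns and returns a plain list, not extra rows. A minimal base R sketch of the fix is to apply rep() to the row indices instead and subset with them. It assumes, based on the printed [[12]], [[15]] and [[18]] indices, that dflist[[k]] holds the rows for site k; the labels are hypothetical stand-ins, and cross_pair() and result are illustrative names. Every site referenced in df must have an entry in dflist, so a row such as 12 to 35 would need to be filtered out first.

```r
# Hypothetical stand-ins: dflist[[k]] is assumed to hold the rows for site k
dflist <- list()
dflist[[12]] <- data.frame(label = c("a", "b", "c"), site = 12)
dflist[[15]] <- data.frame(label = c("g", "h", "i", "j"), site = 15)
dflist[[18]] <- data.frame(label = c("k", "l", "m"), site = 18)
df <- data.frame(from = c(12, 15), to = c(18, 18))

# rep() on a data.frame repeats its columns (it is a list), not its rows
length(rep(dflist[[12]], 2))  # 4: label, site, label, site

# Cross one "from" table with one "to" table by repeating row indices
cross_pair <- function(a, b) {
  ia <- rep(seq_len(nrow(a)), each = nrow(b))   # e.g. 1 1 1 2 2 2 3 3 3
  ib <- rep(seq_len(nrow(b)), times = nrow(a))  # e.g. 1 2 3 1 2 3 1 2 3
  data.frame(label.x = a$label[ia], from = a$site[ia],
             label.y = b$label[ib], to = b$site[ib])
}

# Build one block per row of df and stack the blocks into one data.frame
result <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  cross_pair(dflist[[df$from[i]]], dflist[[df$to[i]]])
}))
result
```

Each block has nrow(a) * nrow(b) rows, so this gives 3 x 3 = 9 rows for 12-18 plus 4 x 3 = 12 rows for 15-18.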


Solution

  • This sounds like it could be solved with joins against a single, flattened data frame.

    Given (an example shorter than yours, so the full output can be shown):

    dflist <- list(data.frame(label = letters[1:3], site = 12),
         data.frame(label = letters[7:10], site = 15),
         data.frame(label = letters[11:13], site = 18))
    
    df <- data.frame(from = c(12, 15),
                     to = c(18, 18))
    

    We could make a flat version of the list:

    library(dplyr)  # join_by() needs dplyr >= 1.1.0
    dfs <- dflist |>
      bind_rows() 
    

    And then join that table twice, matching site once against from and once against to:

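    # A site can match several rows on both sides, so declare the
    # many-to-many relationship to avoid dplyr's join warning
    df |>
      left_join(dfs, join_by(from == site), relationship = "many-to-many") |>
      left_join(dfs, join_by(to == site), relationship = "many-to-many")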
    

    The output shows a row for every combination of 12-18 and then every combination of 15-18. By default, label.x holds the labels corresponding to from, and label.y holds the labels corresponding to to. Note that we get 9 rows for 12-18, since my example has 3 x 3 combinations, and 12 rows for 15-18, since there are 4 x 3 combinations.

       from to label.x label.y
    1    12 18       a       k
    2    12 18       a       l
    3    12 18       a       m
    4    12 18       b       k
    5    12 18       b       l
    6    12 18       b       m
    7    12 18       c       k
    8    12 18       c       l
    9    12 18       c       m
    10   15 18       g       k
    11   15 18       g       l
    12   15 18       g       m
    13   15 18       h       k
    14   15 18       h       l
    15   15 18       h       m
    16   15 18       i       k
    17   15 18       i       l
    18   15 18       i       m
    19   15 18       j       k
    20   15 18       j       l
    21   15 18       j       m
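If dplyr is not available, the same two-step join can be sketched in base R with merge(), swapped in here for left_join(). This is an illustrative alternative rather than part of the answer above. all.x = TRUE mirrors the left join, and merge() adds the .x/.y suffixes to the shared label column. Because merge() sorts its output by the join key, the last step restores a from/label ordering so the result should line up with the table above. The example data is rebuilt so the block runs on its own, and res is an illustrative name.

```r
# Same small example data as in the answer, rebuilt so this block is self-contained
dflist <- list(data.frame(label = letters[1:3], site = 12),
               data.frame(label = letters[7:10], site = 15),
               data.frame(label = letters[11:13], site = 18))
df <- data.frame(from = c(12, 15), to = c(18, 18))

# Flatten the list into one table, as bind_rows() does
dfs <- do.call(rbind, dflist)

# Join once on from and once on to; all.x = TRUE keeps unmatched df rows (left join)
res <- merge(df, dfs, by.x = "from", by.y = "site", all.x = TRUE)
res <- merge(res, dfs, by.x = "to", by.y = "site", all.x = TRUE)

# merge() sorts by the key column, so reorder rows and columns for readability
res <- res[order(res$from, res$label.x, res$label.y),
           c("from", "to", "label.x", "label.y")]
rownames(res) <- NULL
res
```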