Search code examples
rmergecbind

Binding dataframes of different length (no cbind, no merge)


I am trying to display multiple dataframes next to each other to compare certain entries. However, they have a different number of rows and I want each data frame to be in the exact same order. I tried to use cbind which did not work because of the different number of rows. I used merge to bind two dfs together and then merge them again, however they change order when I do that and it seems inefficient to merge two dfs when I have more than 5 in total.

Examp:

df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df 
  v x
1 1 E
2 2 B
3 3 D
4 4 C
5 5 A

df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
df2
   m n
1  7 G
2  8 I
3  9 F
4 10 H

Then I ordered df2

df2 <- df2[order(df2$m, decreasing = TRUE),]
df2
   m n
4 10 F
3  9 I
2  8 H
1  7 G

Expected output:

  v x m n
1 1 E 10 F
2 2 B 9 I
3 3 D 8 H
4 4 C 7 G
5 5 A NA NA

As I said, I have more than two dfs and the order of the dfs should be remained. Any help will be greatly appreciated!


Solution

  • Edit: In case there are multiple df. Do this

    • Create a list of all dfs except one say first one
    • use purrr::reduce to join all these together
    • pass first df in .init argument.
    df2 <- data.frame(m=7:10, n=sample(LETTERS[6:9],4))
    df <-  data.frame(v=1:5, x=sample(LETTERS[1:5],5))
    df3 <- data.frame(bb = 101:110, cc = sample(letters, 10))
    
    
    reduce(list(df2, df3), .init = df %>% mutate(id = row_number()) , ~full_join(.x, .y %>% mutate(id = row_number()), by = "id" )) %>%
      select(-id)
    
        v    x  m    n  bb cc
    1   1    A 10    I 101  u
    2   2    C  9    H 102  v
    3   3    D  8    G 103  n
    4   4    E  7    F 104  w
    5   5    B NA <NA> 105  s
    6  NA <NA> NA <NA> 106  y
    7  NA <NA> NA <NA> 107  g
    8  NA <NA> NA <NA> 108  i
    9  NA <NA> NA <NA> 109  p
    10 NA <NA> NA <NA> 110  h
    

    Earlier Answer: Create a dummy column id in both dfs and use full_join

    full_join(df %>% mutate(id = row_number()), df2 %>% mutate(id = row_number()), by = "id") %>%
      select(-id)
    
      v x  m    n
    1 1 A 10    I
    2 2 C  9    H
    3 3 D  8    G
    4 4 E  7    F
    5 5 B NA <NA>
    

    Results are different from as expected becuase of different random number seed


    Or in BaseR

    merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = T)
    
      id v x  m    n
    1  1 1 A 10    I
    2  2 2 C  9    H
    3  3 3 D  8    G
    4  4 4 E  7    F
    5  5 5 B NA <NA>
    

    Remove extra column simply by subsetting []

    merge(transform(df, id = seq_len(nrow(df))), transform(df2, id = seq_len(nrow(df2))), all = T)[-1]
    
      v x  m    n
    1 1 A 10    I
    2 2 C  9    H
    3 3 D  8    G
    4 4 E  7    F
    5 5 B NA <NA>