Search code examples
rdataframemergecbind

Binding two data frames and removing same row numbers in R


I have two data frames of equal number of rows (639) but differing column lengths (DF1: 5, DF2: 2500), the rows in DF1 correspond to the rows in DF2.

Some rows in DF2 will be removed due to several NAs, but I have no information on which ones are removed. cbind() does not allow me to bind the DFs together due to the now differing row lengths. However, I also need the rows to correspond, so if row 47 is removed in DF2, it must also be removed in DF1 upon merge. My assumption is that there can be some workaround with row.names but I am not sure how to execute it. Help would be appreciated. Examples of DFs below:

DF1:

    pp trialNo  item trialTarget trial
1 pp01       1  M012      script     1
2 pp01       2 BS016      script     2
3 pp01       3  M007      script     3
4 pp01       4 BS010      script     4
5 pp01       5  M006      script     5
6 pp01       6 BS018      script     6 

DF2:

    V1  V2  V3  V4  V5  V6
1: 764 764 763 763 762 763
2: 714 714 711 708 705 704
3: 872 871 869 867 867 871
4: 730 728 727 724 722 719
5: 789 786 788 790 792 790
6: 922 923 928 933 938 938

And assuming row 3 in DF2 is removed, I would expect this after binding:

    pp trialNo  item trialTarget trial  V1  V2  V3  V4  V5  V6
1 pp01       1  M012      script     1 764 764 763 763 762 763
2 pp01       2 BS016      script     2 714 714 711 708 705 704
4 pp01       4 BS010      script     4 730 728 727 724 722 719
5 pp01       5  M006      script     5 789 786 788 790 792 790
6 pp01       6 BS018      script     6 922 923 928 933 938 938

Thanks in advance.


Solution

  • You could create a row index in each of the dataframe.

    df1$row <- 1:nrow(df1)
    df2$row <- 1:nrow(df2)
    

    Then remove 3rd row in df2.

    df2 <- df2[-3, ]
    

    You can then merge by row column both the dataframes.

    merge(df1, df2, by = 'row')
    
    #  row   pp trialNo  item trialTarget trial  V1  V2  V3  V4  V5  V6
    #1   1 pp01       1  M012      script     1 764 764 763 763 762 763
    #2   2 pp01       2 BS016      script     2 714 714 711 708 705 704
    #3   4 pp01       4 BS010      script     4 730 728 727 724 722 719
    #4   5 pp01       5  M006      script     5 789 786 788 790 792 790
    #5   6 pp01       6 BS018      script     6 922 923 928 933 938 938
    

    data

    df1 <- structure(list(pp = c("pp01", "pp01", "pp01", "pp01", "pp01", 
    "pp01"), trialNo = 1:6, item = c("M012", "BS016", "M007", "BS010", 
    "M006", "BS018"), trialTarget = c("script", "script", "script", 
    "script", "script", "script"), trial = 1:6, row = 1:6), row.names = c(NA, 
    -6L), class = "data.frame")
    
    df2 <- structure(list(V1 = c(764L, 714L, 872L, 730L, 789L, 922L), V2 = c(764L, 
    714L, 871L, 728L, 786L, 923L), V3 = c(763L, 711L, 869L, 727L, 
    788L, 928L), V4 = c(763L, 708L, 867L, 724L, 790L, 933L), V5 = c(762L, 
    705L, 867L, 722L, 792L, 938L), V6 = c(763L, 704L, 871L, 719L, 
    790L, 938L)), class = "data.frame", row.names = c(NA, -6L))