Search code examples
rdplyrsubset

Match two data based on the row name


I have two data frames that I want to match the row name with,

the first consist of

"Pop1_lepro001" "Pop1_lepro002" "Pop1_lepro004" "Pop1_lepro005" "Pop1_lepro006"
"Pop1_lepro007" "Pop1_lepro008" "Pop1_lepro009" "Pop1_lepro010" "Pop1_lepro011"
"Pop1_lepro012" "Pop1_lepro013" "Pop1_lepro014" "Pop1_lepro015" "Pop1_lepro016"...

it has 258 data

and second

"Pop1_lepro001" "Pop1_lepro002" "Pop1_lepro004" "Pop1_lepro005" "Pop1_lepro006"
"Pop1_lepro007" "Pop1_lepro008" "Pop1_lepro009" "Pop1_lepro010" "Pop1_lepro011"
"Pop1_lepro012" "Pop1_lepro013" "Pop1_lepro014" "Pop1_lepro015" "Pop1_lepro016"...

it has 268 data

I would like to exclude the unused line in 2nd data based on the first data,

I am currently beginning of this and need idea to solve this,

kindly appreciate your help.

Thank


Solution

  • This is a two-step: first is to bring the row names into the frame itself, which enables the second step: merge/join the data (see How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?) for much more information on the merge/join mechanics).

    Demonstration using subsets of mtcars:

    base R

    df1$rowname <- rownames(df1)
    df2$rowname <- rownames(df2)
    merge(df1, df2, by = "rowname", all = TRUE)
    #             rowname  mpg cyl disp  hp drat    wt  qsec vs
    # 1        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1
    # 2        Duster 360   NA  NA   NA  NA 3.21 3.570 15.84  0
    # 3    Hornet 4 Drive 21.4   6  258 110   NA    NA    NA NA
    # 4 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0
    # 5         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0
    # 6     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0
    # 7         Merc 240D   NA  NA   NA  NA 3.69 3.190 20.00  1
    # 8           Valiant 18.1   6  225 105 2.76 3.460 20.22  1
    

    dplyr

    (Starting with fresh versions of df1 and df2, no $rowname column.)

    library(dplyr)
    full_join(
      rownames_to_column(df1), 
      rownames_to_column(df2),
      by = "rowname")
    #             rowname  mpg cyl disp  hp drat    wt  qsec vs
    # 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0
    # 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0
    # 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1
    # 4    Hornet 4 Drive 21.4   6  258 110   NA    NA    NA NA
    # 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0
    # 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1
    # 7        Duster 360   NA  NA   NA  NA 3.21 3.570 15.84  0
    # 8         Merc 240D   NA  NA   NA  NA 3.69 3.190 20.00  1
    

    Sample data:

    df1 <- mtcars[1:6, 1:4]
    df2 <- mtcars[c(1:3, 5:8), 5:8]
    df1
    #                    mpg cyl disp  hp
    # Mazda RX4         21.0   6  160 110
    # Mazda RX4 Wag     21.0   6  160 110
    # Datsun 710        22.8   4  108  93
    # Hornet 4 Drive    21.4   6  258 110
    # Hornet Sportabout 18.7   8  360 175
    # Valiant           18.1   6  225 105
    df2
    #                   drat    wt  qsec vs
    # Mazda RX4         3.90 2.620 16.46  0
    # Mazda RX4 Wag     3.90 2.875 17.02  0
    # Datsun 710        3.85 2.320 18.61  1
    # Hornet Sportabout 3.15 3.440 17.02  0
    # Valiant           2.76 3.460 20.22  1
    # Duster 360        3.21 3.570 15.84  0
    # Merc 240D         3.69 3.190 20.00  1