Search code examples
rstring-matchingdata-wrangling

Manipulating columns of a dataframe based on a name match in a separate dataframe


I am looking to find a way to use a data frame that has two columns (the first one being the name, and the other being numerical values), and if a name matches with that of a column below (DF1), then to keep that column.

For example, I have a data frame called List1:

Group Number
group1 0.5
group3 0.6
group4 0.8
group5 0.9

I want to compare the strings in the first column of List1, and if one of the entries in that column matches with a column name in DF1 (exception is I want to keep the first two columns of DF1 regardless), and if there is a name match between List1$Group and a column name, then to keep it in DF1.

I have this DF1:

Name Key group1 group2 group3
XAS /// HUA test1234 10 10 8
MPA1 /// AAS2 test4553 8 7 4
MPAS test3341 5 5 5
SSPA1 test2142 5 6 8
MAS61A test4722 6 7 4

So the goal is to get this new data frame after applying the code (group 2 is dropped because it is not in List1):

Name Key group1 group3
XAS /// HUA test1234 10 8
MPA1 /// AAS2 test4553 8 4
MPAS test3341 5 5
SSPA1 test2142 5 8
MAS61A test4722 6 4

How can this be done for large data sets? Any thoughts are greatly appreciated!

# DF1
Name <- c("XAS /// HUA", "MPA1 /// AAS2", "MPAS", "SSPA1", "MAS61A")
Key <- c("test1234", "test4553", "test3341", "test2142", "test4722")
group1 <- c(10, 8, 5, 5, 6)
group2 <- c(10, 7, 5, 6, 7)
group3 <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Key, group1, group2, group3)

Group <- c("group1", "group3", "group4", "group5")
Number <- c(0.5, 0.6, 0.8, 0.9)
List1 <- data.frame(Group, Number)

Solution

  • You can use the %in% operator to compare two sets of strings.

    Solution 1

    Since you want to keep the first two columns, we need to set the logical values to TRUE (i.e. c(TRUE, TRUE)) so that they will be kept.

    DF1[, c(TRUE, TRUE, colnames(DF1)[-c(1:2)] %in% List1$Group)]
    

    Solution 2

    Or cbind the first two columns after matching for the group columns.

    cbind(DF1[,1:2], DF1[,colnames(DF1) %in% List1$Group])
    

    Output

               Name      Key group1 group3
    1   XAS /// HUA test1234     10      8
    2 MPA1 /// AAS2 test4553      8      4
    3          MPAS test3341      5      5
    4         SSPA1 test2142      5      8
    5        MAS61A test4722      6      4