I am looking to find a way to use a data frame that has two columns (the first one being the name, and the other being numerical values), and if a name matches with that of a column below (DF1), then to keep that column.
For example, I have a data frame called List1:
Group | Number |
---|---|
group1 | 0.5 |
group3 | 0.6 |
group4 | 0.8 |
group5 | 0.9 |
I want to compare the strings in the first column of List1, and if one of the entries in that column matches with a column name in DF1 (exception is I want to keep the first two columns of DF1 regardless), and if there is a name match between List1$Group and a column name, then to keep it in DF1.
I have this DF1:
Name | Key | group1 | group2 | group3 |
---|---|---|---|---|
XAS /// HUA | test1234 | 10 | 10 | 8 |
MPA1 /// AAS2 | test4553 | 8 | 7 | 4 |
MPAS | test3341 | 5 | 5 | 5 |
SSPA1 | test2142 | 5 | 6 | 8 |
MAS61A | test4722 | 6 | 7 | 4 |
So the goal is to get this new data frame after applying the code (group 2 is dropped because it is not in List1):
Name | Key | group1 | group3 |
---|---|---|---|
XAS /// HUA | test1234 | 10 | 8 |
MPA1 /// AAS2 | test4553 | 8 | 4 |
MPAS | test3341 | 5 | 5 |
SSPA1 | test2142 | 5 | 8 |
MAS61A | test4722 | 6 | 4 |
How can this be done for large data sets? Any thoughts are greatly appreciated!
# DF1
Name <- c("XAS /// HUA", "MPA1 /// AAS2", "MPAS", "SSPA1", "MAS61A")
Key <- c("test1234", "test4553", "test3341", "test2142", "test4722")
group1 <- c(10, 8, 5, 5, 6)
group2 <- c(10, 7, 5, 6, 7)
group3 <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Key, group1, group2, group3)
Group <- c("group1", "group3", "group4", "group5")
Number <- c(0.5, 0.6, 0.8, 0.9)
List1 <- data.frame(Group, Number)
You can use the %in%
operator to compare two sets of strings.
Since you want to keep the first two columns, we need to set the logical values to TRUE
(i.e. c(TRUE, TRUE)
) so that they will be kept.
DF1[, c(TRUE, TRUE, colnames(DF1)[-c(1:2)] %in% List1$Group)]
Or cbind
the first two columns after matching for the group
columns.
cbind(DF1[,1:2], DF1[,colnames(DF1) %in% List1$Group])
Name Key group1 group3
1 XAS /// HUA test1234 10 8
2 MPA1 /// AAS2 test4553 8 4
3 MPAS test3341 5 5
4 SSPA1 test2142 5 8
5 MAS61A test4722 6 4