Search code examples
rdataframestring-matchingsubtraction

Subtracting values from 2 data frames by rows based on a shared string name in a column


I want to subtract from two different dataframes. I have a data frame with information in it, and another that has a "reference point".

I have one data frame, DF1, that has this information.

Name Key BCD12A ACA1 DEX2
X_1_group1 test1234 10 10 8
X_2_group1 test4553 8 7 4
X_2_group2 test3341 5 5 5
X_2_group1 test2142 5 6 8
X_1_group2 test4722 6 7 4

Using DF1, I want to subtract the values in a specific row to the row of DF2 (below), but based on a numerical match from the first column.

For example, based on DF2 below, I have a string in DF2$Name as X_1_group1. In DF1, the first row has that same string in the first column. Since the name matches, I will subtract the values in that row of DF1 from that row in DF2. Note that there may be more than one string in DF1 with the same name, so I want to be able to match the names that way.

Name Key BCD12A ACA1 DEX2
X_1_group1 X_1_group1_A 2 2 1
X_2_group1 X_1_group1_B 1 3 1
X_1_group2 X_1_group1_C 2 2 3
X_2_group2 X_1_group1_D 2 1 2

DF3 (Which is the goal) - The second column, through different will stay the same that matches DF1.

Name Key BCD12A ACA1 DEX2
X_1_group1 test1234 8 8 7
X_2_group1 test4553 7 4 3
X_2_group2 test3341 3 4 3
X_2_group1 test2142 4 3 7
X_1_group2 test4722 4 5 1

Any help on name matching with two data frames for a subtraction would be much appreciated. I also want to generalize this that there will be a lot more columns than in this example, so anything that would consider that portion would help a lot too! Thanks for any help!

# DF1
Name <- c("X_1_group1", "X_2_group1", "X_2_group2", "X_2_group1", "X1_group2")
Key <- c("test1234", "test4553", "test3341", "test2142", "test4722")
BCD12A <- c(10, 8, 5, 5, 6)
ACA1 <- c(10, 7, 5, 6, 7)
DEX2 <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Key, BCD12A, ACA1, DEX2)

# DF2
Name <- c("X_1_group1", "X_2_group1", "X_1_group2", "X_2_group2")
Key <- c("X_1_group1_A", "X_1_group1_B", "X_1_group1_C", "X_1_group1_D")
BCD12A <- c(2, 1, 2, 2)
ACA1 <- c(2, 3, 2, 1)
DEX2 <- c(1, 1, 3, 2)
DF2 <- data.frame(Name, Key, BCD12A, ACA1, DEX2)

Solution

  • We may use a join by 'Key' and subtract the corresponding numeric columns. With powerjoin, we have the option to get the difference (-) where there are conflicting names (common names)

    library(powerjoin)
    library(dplyr)
    DF1 %>%
        power_left_join(DF2 %>% select(-Key), by = "Name", conflict = `-`)
    

    -output

            Name      Key BCD12A ACA1 DEX2
    1 X_1_group1 test1234      8    8    7
    2 X_2_group1 test4553      7    4    3
    3 X_2_group2 test3341      3    4    3
    4 X_2_group1 test2142      4    3    7
    5 X_1_group2 test4722      4    5    1
    

    Or using data.table join

    library(data.table)
    nm1 <- names(DF1)[3:5]
    nm2 <- paste0("i.", nm1)
    setDT(DF1)[DF2, (nm1) := Map(`-`, mget(nm1), mget(nm2)), on = .(Name)]