Updated question to include extra columns that pass from df1 to output, as well as add that the main df has 7 million rows.
I have two dataframes, similar to Pandas SO Question
And I need to multiply them where the names match. I'm not sure how to accomplish this cleanly. is there an apply function that does this easily?
DF1: (there are 16 columns of extra data, 7 million rows long)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 100
bbf ad3 sample2 200
dd a sample2 300
33 3rf sample3 25
ddd dd sample3 50
dd dd sample3 40
DF2:
Name Value
sample1 1
sample2 0.5
sample3 2
OUTPUT: (With the additional 16 columns that aren't shown)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 50
bbf ad3 sample2 100
dd a sample2 150
33 3rf sample3 50
ddd dd sample3 100
dd dd sample3 80
We can join both the dataframes using a left_join
or an inner_join
and then multiply the corresponding Value
columns with one another. Using dplyr
that can be done as :
library(dplyr)
inner_join(df1, df2, by = 'Name') %>%
mutate(Value = Value.x * Value.y) %>%
select(names(df1))
# Name Value
#1 sample1 50
#2 sample1 100
#3 sample1 75
#4 sample2 50
#5 sample2 100
#6 sample2 150
#7 sample3 50
#8 sample3 100
#9 sample3 80
data
df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("sample1", "sample2", "sample3"), class = "factor"),
Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L
)), class = "data.frame", row.names = c(NA, -9L))
df2 <- structure(list(Name = structure(1:3, .Label = c("sample1", "sample2",
"sample3"), class = "factor"), Value = c(1, 0.5, 2)), class = "data.frame",
row.names = c(NA, -3L))