I have 2 data frames (DF1
& DF2
) and 1 would like to join them together by a unique value called "acc_num"
. In DF2
, payment was made twice by acc_num A
and thrice by B
. Data frames are as follows.
DF1:
acc_num total_use sales
A 433 145
A NA 2
A NA 18
B 149 32
DF2:
acc payment
A 150
A 98
B 44
B 15
B 10
My desired output is:
acc_num total_use sales payment
A 433 145 150
A NA 2 98
A NA 18 NA
B 149 32 44
B NA NA 15
B NA NA 10
I've tried full_join
and merge
but the output was not as desired. I couldn't work this out as I'm still a beginner in R, and haven't found the solution to this.
Example of the code I used was
test_full_join <- DF1 %>% full_join(DF2, by = c("acc_num" = "acc"))
The displayed output was:
acc_num total_use sales payment
A 433 145 150
A 433 145 98
A NA 2 150
A NA 2 98
A NA 18 150
A NA 18 98
B 149 32 44
B 149 32 15
B 149 32 10
This is contrary to my desired output as at the end,
my concern is to get the total sum of total_use
, sales
and payment
.
This output will definitely give me wrong interpretation
for data visualization later on.
We may need to do a join by row_number()
based on 'acc_num'
library(dplyr)
df1 %>%
group_by(acc_num) %>%
mutate(grpind = row_number()) %>%
full_join(df2 %>%
group_by(acc_num = acc) %>%
mutate(grpind = row_number())) %>%
select(acc_num, total_use, sales, payment)
# A tibble: 6 x 4
# Groups: acc_num [2]
# acc_num total_use sales payment
# <chr> <int> <int> <int>
#1 A 433 145 150
#2 A NA 2 98
#3 A NA 18 NA
#4 B 149 32 44
#5 B NA NA 15
#6 B NA NA 10
df1 <- structure(list(acc_num = c("A", "A", "A", "B"), total_use = c(433L,
NA, NA, 149L), sales = c(145L, 2L, 18L, 32L)), class = "data.frame",
row.names = c(NA,
-4L))
df2 <- structure(list(acc = c("A", "A", "B", "B", "B"), payment = c(150L,
98L, 44L, 15L, 10L)), class = "data.frame", row.names = c(NA,
-5L))