I have the following datasets DF1 and DF2. I am looking to assign values from Col3 from DF2 in the order it's in to rows in DF1. Please note that from As and Cs get unique values from Col3 but all Bs get one value, nothing specific, but the first one that is available, in this example it is 11 as the As exhausted values 1-9. The values from Col3 are to be assigned to DF1 in the order they are in(ascending) for as long as there are rows left in DF1 that need a value. (Please see desired output).
DF1
Col1 | Col2 |
---|---|
99 | A1 |
99 | A2 |
99 | A3 |
99 | A4 |
99 | A5 |
99 | B1 |
99 | B2 |
98 | B3 |
98 | B4 |
99 | C1 |
99 | C2 |
DF2
Col3 |
---|
1 |
3 |
5 |
7 |
9 |
11 |
13 |
15 |
17 |
19 |
21 |
23 |
Desired Output
Col1 | Col2 | Col3 |
---|---|---|
99 | A1 | 1 |
99 | A2 | 3 |
99 | A3 | 5 |
99 | A4 | 7 |
99 | A5 | 9 |
99 | B1 | 11 |
99 | B2 | 11 |
98 | B3 | 13 |
98 | B4 | 13 |
99 | C1 | 15 |
99 | C2 | 17 |
This is part of a very large data set and I usually do this in excel manually. I am running into a wall in R and could really use the help. Any input would be greatly appreciated. Thank you!
Using dplyr,
library(dplyr)
DF1 %>%
mutate(
ltr1 = substr(Col2, 1, 1),
rn = cumsum(ltr1 != "B" | (lag(ltr1, default=ltr1[1]) != "B" | Col1 != lag(Col1, default=Col1[1]))),
Col3 = DF2$Col3[rn]
) %>%
select(-ltr1, -rn)
# Col1 Col2 Col3
# 1 99 A1 1
# 2 99 A2 3
# 3 99 A3 5
# 4 99 A4 7
# 5 99 A5 9
# 6 99 B1 11
# 7 99 B2 11
# 8 **98** B3 13
# 9 **98** B4 13
# 10 99 C1 15
# 11 99 C2 17
The reason I went with literal row-number extraction from DF2
is that we have no other "join" criteria.
Walk-through:
cumsum(.)
is an incrementer of sorts: when the condition is true, then rn
is incremented. This should sequence by 1 for every row that does not contain a "B*"
in Col2
.ltr1 != "B" | lag(..) != "B"
ensures that we won't allow cumsum
to increment when in a "B"
row or if the previous row is a "B"
row.Col1 != lag(Col1)
adds that a change in Col1
results in incrementing the row number (rn
) used in DF2
.rn
as a name is completely arbitrary, meant to be a throw-away name that is removed later)