Search code examples
rdata-manipulation

How do I assign values from one data frame to another based on row value using R?


I have the following datasets DF1 and DF2. I am looking to assign values from Col3 from DF2 in the order it's in to rows in DF1. Please note that from As and Cs get unique values from Col3 but all Bs get one value, nothing specific, but the first one that is available, in this example it is 11 as the As exhausted values 1-9. The values from Col3 are to be assigned to DF1 in the order they are in(ascending) for as long as there are rows left in DF1 that need a value. (Please see desired output).

DF1

Col1 Col2
99 A1
99 A2
99 A3
99 A4
99 A5
99 B1
99 B2
98 B3
98 B4
99 C1
99 C2

DF2

Col3
1
3
5
7
9
11
13
15
17
19
21
23

Desired Output

Col1 Col2 Col3
99 A1 1
99 A2 3
99 A3 5
99 A4 7
99 A5 9
99 B1 11
99 B2 11
98 B3 13
98 B4 13
99 C1 15
99 C2 17

This is part of a very large data set and I usually do this in excel manually. I am running into a wall in R and could really use the help. Any input would be greatly appreciated. Thank you!


Solution

  • Using dplyr,

    library(dplyr)
    DF1 %>%
      mutate(
        ltr1 = substr(Col2, 1, 1),
        rn = cumsum(ltr1 != "B" | (lag(ltr1, default=ltr1[1]) != "B" | Col1 != lag(Col1, default=Col1[1]))),
        Col3 = DF2$Col3[rn]
      ) %>%
      select(-ltr1, -rn)
    #      Col1 Col2 Col3
    # 1      99   A1    1
    # 2      99   A2    3
    # 3      99   A3    5
    # 4      99   A4    7
    # 5      99   A5    9
    # 6      99   B1   11
    # 7      99   B2   11
    # 8  **98**   B3   13
    # 9  **98**   B4   13
    # 10     99   C1   15
    # 11     99   C2   17
    

    The reason I went with literal row-number extraction from DF2 is that we have no other "join" criteria.

    Walk-through:

    • cumsum(.) is an incrementer of sorts: when the condition is true, then rn is incremented. This should sequence by 1 for every row that does not contain a "B*" in Col2.
    • ltr1 != "B" | lag(..) != "B" ensures that we won't allow cumsum to increment when in a "B" row or if the previous row is a "B" row.
    • added Col1 != lag(Col1) adds that a change in Col1 results in incrementing the row number (rn) used in DF2.
    • (the use of rn as a name is completely arbitrary, meant to be a throw-away name that is removed later)