
How to use parent item usage and a component list to find component usage with usage date intact in R?


I have a data frame with item SKUs and associated quantities. Some of these SKUs are Parent SKUs that represent several component SKUs that I care about.

I need a solution that creates a data frame with the components and their associated quantities as separate line items with the date from the original parent SKU usage. My current solution collapses all parent SKU usage into one line and loses date information. Thank you for any help!

df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"), Qty = c(2,1,1,1,2), Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))

df2 <- data.frame(Parent_SKU = c("def", "def", "mno"), Component = c("abc","jkl","abc"), Component_Qty = c(1,3,1))

Data Frame 1

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |

Data Frame 2

| Parent_SKU | Component | Component_Qty |
|  --------  | --------- | ------------- |
|    def     |    abc    |       1       |
|    def     |    jkl    |       3       |
|    mno     |    abc    |       1       |

Data Frame 3 (what I want)

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |
|   abc    |     1    |    1-1   |
|   jkl    |     3    |    1-1   |
|   abc    |     2    |    1-2   |

My initial attempt sums all parent SKUs into one line each, multiplies through to find component usage, and appends it to the main data frame. This solution is inelegant, loses date information, and collapses all parent SKUs into one line (I also care about the number of rows for an item).

library(tidyverse)

#create data frame with Parent_SKU usage
df4 <- df1 %>% filter(SKU %in% unique(df2$Parent_SKU)) %>% group_by(SKU) %>% summarize(Qty = sum(Qty))

#Rename column for joining
df4 <- df4 %>% rename("Parent_SKU"="SKU")

#Create new df with Parent_SKU Qty associated with components
df5 <- full_join(df4,df2)

#Turn Qty column into qty of component use
df5$Qty <- df5$Qty*df5$Component_Qty

#Rename component column for joining
df5 <- df5 %>% rename("SKU"="Component")

#Append component usage together with original data frame & get rid of non useful columns
df3 <- dplyr::bind_rows(df5, df1)
df3 <- df3[-c(1,4)]

Result:

|   SKU    |   Qty    |   Date   |
| -------- | -------- | -------- |
|   abc    |     2    |    1-1   |
|   def    |     1    |    1-1   |
|   ghi    |     1    |    1-2   |
|   abc    |     1    |    1-2   |
|   mno    |     2    |    1-2   |
|   abc    |     1    |    NA    |
|   jkl    |     3    |    NA    |
|   abc    |     2    |    NA    |

Solution

  • This builds on @Mark's solution, but also multiplies each component quantity by the parent quantity to get the component usage.

    library(dplyr)

    # Join each parent usage row to its components, scale the component
    # quantity by the parent quantity, and keep only SKU, Qty, and Date
    df3 <- inner_join(df1, df2, by = c("SKU" = "Parent_SKU")) %>%
        mutate(Qty = Qty * Component_Qty) %>%
        select(SKU = Component, Qty, Date)

    # Append the component rows to the end of the original data
    bind_rows(df1, df3)
    

    Output:

      SKU Qty Date
    1 abc   2  1-1
    2 def   1  1-1
    3 ghi   1  1-2
    4 abc   1  1-2
    5 mno   2  1-2
    6 abc   1  1-1
    7 jkl   3  1-1
    8 abc   2  1-2
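
If you would rather avoid the dplyr dependency, the same join-then-multiply idea can be sketched in base R with `merge()`. This is only an illustration built on the example data from the question; the intermediate names `joined`, `components`, and `result` are made up here. `merge()` sorts by the join key by default, so the order of the appended component rows may differ from the dplyr version.

```r
# Example data from the question
df1 <- data.frame(SKU = c("abc", "def", "ghi", "abc", "mno"),
                  Qty = c(2, 1, 1, 1, 2),
                  Date = c("1-1", "1-1", "1-2", "1-2", "1-2"))
df2 <- data.frame(Parent_SKU = c("def", "def", "mno"),
                  Component = c("abc", "jkl", "abc"),
                  Component_Qty = c(1, 3, 1))

# Inner join: one row per (parent usage row, component) pair,
# so each component row carries the Date of its parent usage
joined <- merge(df1, df2, by.x = "SKU", by.y = "Parent_SKU")

# Component usage = parent quantity * quantity per parent
components <- data.frame(SKU = joined$Component,
                         Qty = joined$Qty * joined$Component_Qty,
                         Date = joined$Date)

# Append the component rows to the original usage rows
result <- rbind(df1, components)
result
```

Because nothing is summarized before the join, every parent usage row keeps its own date and produces its own component rows, which is the behavior the question asks for.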