Merge and Left Join keeps duplicating line items in R

I am struggling with this issue that I am unable to fix! I have to data-frames ("Master" and " "Hours"). The 'Master' df has many columns but the ones in particular are as follows below:

Master

StoreNumber ... MON TUE WED THU FRI SAT SUN 
     1           0   0   0   0   0   0   0
     2           0   0   0   0   0   0   0
     3           0   0   0   0   0   0   0
     ...

NB: The Master df has many columns in between StoreNumber as the days of the week and holds a lot of data (about 3000 stores)

Hours

BranchNumber   Day   TimeDiff
     1         MON   7.50
     1         TUE   6.00
     1         WED   8.50
     1         THU   2.00
     1         FRI   1.00
     1         SAT   2.50
     3         MON   7.50
     3         TUE   6.00
     3         WED   8.50
     3         THU   2.00
     3         FRI   1.00
     3         SAT   2.50
     3         SUN   5.00
    ...

So the idea is that I am trying to match the 'Hours' BrandNumber with the 'Master' StoreNumber. Once there is a match then it matched the Day column from the 'Hours' table with the Days of the week in the 'Masters' Table...It will do this for each row and then populate days of the week with the corresponding value in 'TimeDiff' column...if the store and branch number has no match (Like StoreNumber 2) then it should skip that row and move onto the next. Another condition, like BranchNumber '1' there is no data for SUNDAY so in the 'Master' table the SUNDAY cell should be left as 0...this should work for any day of the week.

The output should be the 'Master' Table but complete with all the days of the week data from the 'Hours' Table. In this example, it should look like:

    StoreNumber ... MON    TUE   WED   THU   FRI   SAT   SUN 
         1          7.50  6.00  8.50  2.00  1.00  2.50    0
         2           0      0     0     0     0     0     0
         3          7.50  6.00  8.50  2.00  1.00  2.50  5.00
     ...

The code I have tried is semi-working but I am not sure if its the correct approach. The biggest problem I am getting is that its duplicating the rows expect from the first row. For instance, the output looks more like this.

StoreNumber
    1
    2
    2
    3
    3
    4
    4
    5
    5 
    5

all are duplicating and some tripling and every 87 columns are identical...however the days of the week of the duplicated row are all 0's.

 merged <- Master %>% select(-c("MON","TUE","WED","THU","FRI","SAT","SUN")) %>%
 left_join(
 Hours %>% pivot_wider(names_from = Day, values_from = TimeDiff),
 by = c('StoreNumber' = 'BranchNumber'))

 merged <- merged %>% replace(is.na(.),0)

Sorry for the long question, this issue has been bugging me a while so any help/advice will be grateful

Solution

If I understand correctly, the Master table has many columns and only the columns MON to SUN need to be updated.

Here are two approaches which uses data.table's ability to update in a join. Only the relevant columns are modified by reference, i.e., without copying the whole data object. It avoids to reshape (or pivot) the Master table forth and back.

Variant 1

library(data.table)
days <- names(Master)[which(names(Master) == "MON") + (0:6)]
setDT(Master)[, (days) := lapply(.SD, as.double), .SDcols = days]
for (d in days) {
  Master[Hours, on =.(StoreNumber = BranchNumber), (d) := TimeDiff[d == Day], by = .EACHI]
}
Master[]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5   0
2:           2        b 0.0   0 0.0   0   0 0.0   0
3:           3        c 7.5   6 8.5   2   1 2.5   5

Explanation

days contains the names of the columns.
days <- names(Master)[which(names(Master) == "MON") + (0:6)] is equivalent to
days <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
data.table requires consistent data types when parts of a column are updated. The day columns in Master are initialized to integer zero but TimeDiff in Hours is numeric. Therefore, the day columns in Master are coerced to double before updating.
The for loop iterates over each day column and performs the update join for this column. For each match (by = .EACHI), the Timediff for the relevant day is picked.

In order to verify that Master has not been copied we can call

data.table::address(Master)

before and after the operation: The address of Master has not changed.

Variant 2

This approach is a bit leaner. It also uses an update join but it is different to variant 1 as it reshapes (or pivots) Hours from long to wide format and removes the days columns from Master instead of coercing a bunch of integer zeroes to type numeric:

    library(data.table)
    days <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
    Hours_wide <- dcast(setDT(Hours)[, Day := ordered(Day, levels = days)], BranchNumber ~ Day)
    setDT(Master)[, (days) := NULL][
      Hours_wide, on = .(StoreNumber = BranchNumber), (days) := mget(paste0("i.", days))]
    Master[]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5  NA
2:           2        b  NA  NA  NA  NA  NA  NA  NA
3:           3        c 7.5   6 8.5   2   1 2.5   5

Note that missing elements are now initialized to / indicated by NA which is much easier to detect, IMHO. If required, the NAs can be turned into another numeric value by

Master[, (days) := lapply(.SD, nafill, fill = 0), .SDcols = days][]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5   0
2:           2        b 0.0   0 0.0   0   0 0.0   0
3:           3        c 7.5   6 8.5   2   1 2.5   5

This approach uses mget(paste0("i.", days)) to pick the days columns from Hours. If there are columns with the same name in both data.tables in a join, we can distinguish the columns by prepending the column names by a x. and i., resp. Thus, x.MON refers to the MON column from the first data.table which is Master in this case and i.MON refers to the MON column from the second data.table which is Hours_wide. mget() takes the column names as character strings and returns a list of the values of the respective columns.

Variant 2 - Edit 1

Above code can be simplified by

setDT(Master)[, (days) := NULL][
  Hours_wide, on = .(StoreNumber = BranchNumber), (days) := mget(days)][]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5  NA
2:           2        b  NA  NA  NA  NA  NA  NA  NA
3:           3        c 7.5   6 8.5   2   1 2.5   5

Because setDT(Master)[, (days) := NULL] already has removed the columns MON to SUN from Master there is no ambiguity on column names. Thus, the column names MON to SUN can be used without prepending them by i. as the only columns named MON to SUN are in Hours_wide.

Variant 2 - Edit 2

With development version 1.14.1 as of 2021-05-10, a new interface for programming on data.table has been added (see item 10 in NEWS and the new vignette programming on data.table). Instead of get()/ mget() the new env argument is recommended:

library(data.table) # development version 1.14.1 used
days <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
Hours_wide <- dcast(setDT(Hours)[, Day := ordered(Day, levels = days)], BranchNumber ~ Day)
setDT(Master)[, (days) := NULL][
  Hours_wide, on = .(StoreNumber = BranchNumber), (days) := s,
  env = list(s = as.list(days))][]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5  NA
2:           2        b  NA  NA  NA  NA  NA  NA  NA
3:           3        c 7.5   6 8.5   2   1 2.5   5

Variant 3: `env` parameter and `fcoalesce()`

OP's expected result shows 0 instead of NA. With the Variants 2 above, this was was achieved by a separate update step using nafill().

This separate update step can be avoided by using the fcoalesce() function in the update join:

library(data.table) # development version 1.14.1 used
days <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
Hours_wide <- dcast(setDT(Hours)[, Day := ordered(Day, levels = days)], BranchNumber ~ Day)
setDT(Master)[, (days) := lapply(.SD, as.double), .SDcols = days][
  Hours_wide, on = .(StoreNumber = BranchNumber), (days) := lapply(s, fcoalesce, 0), 
  env = list(s = as.list(paste0("i.", days)))][]

   StoreNumber OtherCol MON TUE WED THU FRI SAT SUN
1:           1        a 7.5   6 8.5   2   1 2.5   0
2:           2        b 0.0   0 0.0   0   0 0.0   0
3:           3        c 7.5   6 8.5   2   1 2.5   5

Data

library(data.table)

Master <- fread("
StoreNumber OtherCol MON TUE WED THU FRI SAT SUN 
     1         a      0   0   0   0   0   0   0
     2         b      0   0   0   0   0   0   0
     3         c      0   0   0   0   0   0   0
", data.table = FALSE)
 
Hours <- fread("
BranchNumber   Day   TimeDiff
     1         MON   7.50
     1         TUE   6.00
     1         WED   8.50
     1         THU   2.00
     1         FRI   1.00
     1         SAT   2.50
     3         MON   7.50
     3         TUE   6.00
     3         WED   8.50
     3         THU   2.00
     3         FRI   1.00
     3         SAT   2.50
     3         SUN   5.00
", data.table = FALSE)