Search code examples
rdataframemergeunique

How to avoid problems with more rows of data when using merge() in R?


I have gone through numerous posts for this problem and I haven't been able to produce the data frame that I want.

I have two data frames that I would like to merge. However, more rows of data were produced after using the merge function.

Ultimately there should be 6 rows (for this example), but all of commands are giving 36 rows. Is it because there might be duplicate since I am using 2 columns for the merge function?

These are my data and here's what I have already tried.

a <- structure(list(month = c(1L, 1L, 1L, 1L, 1L, 1L), site = c("Port", 
"Port", "Port", "Port", "Port", "Port"), max = c(17.1530908785179, 
17.6490466820266, 19.8794824562496, 16.6000416246619, 15.8144630183894, 
14.4950690162599)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

b <- structure(list(month = c(1, 1, 1, 1, 1, 1), site = c("Port", 
"Port", "Port", "Port", "Port", "Port"), slope = c(0.189564181246092, 
0.142842264473357, 0.135918209518515, 0.152899782597735, 0.223283613118016, 
0.177886719032959)), row.names = c(NA, 6L), class = "data.frame")

What I've tried:

merge(a, b, by=c("month", "site"))

merge(a, b, by=c("month", "site"), all=TRUE)

unique(a) %>%
merge(b, by=c("month", "site"), all =TRUE)

left_join(a, b, by=c("month", "site"))

right_join(a, b, by=c("month", "site"))

I am not sure what I am missing. Any pointers on where the problem is and how to fix it would be really helpful. Thank you.


Solution

  • The problem is, that you merge by month and site which are "1" or "port" for each entry in the data frames. The merge command now takes the first entry of the data frame b and checks if there are any matches for month and site in the data frame a. Because every entry in the data frame a is a match (again every entry of site and month is the same) it merges the first entry of the data frame b to all the entries in data frame a. It does this 6 times with every entry in the data frame b. Hence you have a data frame with 36 entries.

    If you just want to slap the data frames together I would use cbind:

    cbind(a,b[, 3])