Search code examples
rdataframemergeinner-join

Why is my merge returning more rows than I expected?


First, I'm very sorry for posting such a question. However, I'm a little bit confused about getting unexpected output. I want to merge two data frames.

x <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                vr1=c(1,2,3,4,5,6,7,8,9))
y <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                vr2=c(1,2,3,4,5,6,7,8,9))
z <- merge(x, y, by="id")

I'm expecting the output would be the following:

  id vr1 vr2
1  1   1   1
2  1   2   2
3  1   3   3
4  2   4   4
5  2   5   5
6  2   6   6
7  3   7   7
8  3   8   8
9  3   9   9

However, I got the following. What is my mistake?

   id vr1 vr2
1   1   1   1
2   1   1   2
3   1   1   3
4   1   2   1
5   1   2   2
6   1   2   3
7   1   3   1
8   1   3   2
9   1   3   3
10  2   4   4
11  2   4   5
12  2   4   6
13  2   5   4
14  2   5   5
15  2   5   6
16  2   6   4
17  2   6   5
18  2   6   6
19  3   7   7
20  3   7   8
21  3   7   9
22  3   8   7
23  3   8   8
24  3   8   9
25  3   9   7
26  3   9   8
27  3   9   9

Solution

  • Each id of 1 in x matches each id 1 in y so the merge generates 9 rows for id 1.

    If there are the same number of rows as in the question and the id values correspond as in the question the cbind(x, y) will give the desired result shown.

    cbind(x, y)
    ##   id vr1 id vr2
    ## 1  1   1  1   1
    ## 2  1   2  1   2
    ## 3  1   3  1   3
    ## 4  2   4  2   4
    ## 5  2   5  2   5
    ## 6  2   6  2   6
    ## 7  3   7  3   7
    ## 8  3   8  3   8
    ## 9  3   9  3   9
    

    Another possibility is that you want to match the ith id 1 in x to just the ith id 1 in y. (For other examples you may need to use the all= argument of merge but for this example all values of that argument give the same result.) In this case

    x2 <- transform(x, seq = ave(id, id, FUN = seq))
    y2 <- transform(y, seq = ave(id, id, FUN = seq))
    
    merge(x2, y2, by = c("id", "seq"))
    ##   id seq vr1 vr2
    ## 1  1   1   1   1
    ## 2  1   2   2   2
    ## 3  1   3   3   3
    ## 4  2   1   4   4
    ## 5  2   2   5   5
    ## 6  2   3   6   6
    ## 7  3   1   7   7
    ## 8  3   2   8   8
    ## 9  3   3   9   9