Search code examples
stata

Merge databases in Stata and create new vars based on identity and value of merged data


I have two databases, DB1 and DB2, that I would like to merge, but I am having difficulties. I would like help in determining what Stata calls what I am trying to do.

DB1 has about 1000 observations and looks like:

   +----------+
   | date   b |
   |----------|
1. | 1      7 |
2. | 2      6 |
3. | 3      7 |
   +----------+

DB2 consists of 65 IDs each with about 1000 observations. It looks something like:

   +--------------+
   | date  id   b |
   |--------------|
1. | 1     1    4 |
2. | 2     1    4 |
3. | 3     1    5 |
4. | 1     2    9 |
5. | 2     2    8 |
6. | 3     2    7 |
7. | 1     3    1 |
8. | 2     3    2 |
9. | 3     3    1 |  
   +--------------+

I would like to merge DB2 with DB1 so that the ultimate database looks like:

   +------------------------------+
   | date   b  id1b id2b  id3b ...|
   |------------------------------|
1. | 1      7    4    9     1  ...|
2. | 2      6    4    8     2  ...|
3. | 3      7    5    7     1  ...|
   +------------------------------+

I have been reading about the merge command but that alone will not create my ultimate database.

Can you direct me materials that will help me with this? What do you call what I am trying to do? I feel like I need to command Stata to generate new variables.


Solution

  • @William Lisowski is right. This gets you what you ask for, short of a easy rename. Whether it is the best structure for your analyses is unclear: most work with similar data would be easier with a further reshape long.

    clear 
    input date b 
    1 7 
    2 6 
    3 7 
    end 
    save DB1 
    
    clear 
    input date id b 
    1 1 4 
    2 1 4 
    3 1 5 
    1 2 9 
    2 2 8 
    3 2 7 
    1 3 1 
    2 3 2 
    3 3 1 
    end 
    reshape wide b, j(id) i(date) 
    merge 1:1 date using DB1 
    

    Indeed, I would much more usually do something like this to get a long structure directly:

    clear 
    input date b 
    1 7 
    2 6 
    3 7 
    end 
    rename b B 
    save DB1 , replace 
    clear 
    input date id b 
    1 1 4 
    2 1 4 
    3 1 5 
    1 2 9 
    2 2 8 
    3 2 7 
    1 3 1 
    2 3 2 
    3 3 1 
    end 
    merge m:1 date using DB1