Search code examples
mergepanelstata

How can I merge two data sets with ID variation in stata


I have following two data sets.

The first one from children looks like this.

ID year Q1 Q2 Q3 Q4 ....

101 2014 1 2 2 2

101 2016 1 2 2 2

101 2017 1 2 2 2

101 2018 1 2 2 2

401 2014 1 2 2 2

401 2015 1 2 3 3

401 2016 1 2 2 2

401 2017 1 2 1 1

401 2018 1 2 2 2

402 2014 1 1 0 3

402 2015 1 1 2 2

402 2016 1 1 2 2

402 2017 1 1 3 3

402 2018 1 1 2 3

Here's the second one from their parents.

ID year Q101 Q102

1 2014 1 3

1 2015 1 3

1 2016 1 3

1 2017 1 2

1 2018 1 2

2 2014 2 .

2 2015 1 2

2 2016 . .

2 2017 1 3

2 2018 2 .

4 2014 1 3

4 2015 1 3

4 2016 1 3

4 2017 1 3

4 2018 1 3

So the parent data ID can be matched to the children data ID deleted last two digits. It seems that parent ID 4 has two children.

I tried merge 1:m ID using kids data as the master data set. but it didn't work.

Thank you.


Solution

  • Getting good answers is made more likely by (a) attempting code and showing what you tried and (b) giving data in the form of code anybody using Stata can run. The code here follows from editing your post and is close to what you could get directly by using dataex as explained in the Stata tag wiki or indeed at help dataex in an up-to-date Stata or one in which you installed dataex from SSC.

    clear 
    input ID year Q1 Q2 Q3 Q4 
    101 2014 1 2 2 2
    101 2016 1 2 2 2
    101 2017 1 2 2 2
    101 2018 1 2 2 2
    401 2014 1 2 2 2
    401 2015 1 2 3 3
    401 2016 1 2 2 2
    401 2017 1 2 1 1
    401 2018 1 2 2 2
    402 2014 1 1 0 3
    402 2015 1 1 2 2
    402 2016 1 1 2 2
    402 2017 1 1 3 3
    402 2018 1 1 2 3
    end 
    
    gen IDP = floor(ID/100)
    save children 
    
    clear 
    input ID year Q101 Q102
    1 2014 1 3
    1 2015 1 3
    1 2016 1 3
    1 2017 1 2
    1 2018 1 2
    2 2014 2 .
    2 2015 1 2
    2 2016 . .
    2 2017 1 3
    2 2018 2 .
    4 2014 1 3
    4 2015 1 3
    4 2016 1 3
    4 2017 1 3
    4 2018 1 3
    end 
    
    rename ID IDP 
    
    merge 1:m IDP year using children 
    
    list 
    
         +----------------------------------------------------------------------+
         | IDP   year   Q101   Q102    ID   Q1   Q2   Q3   Q4            _merge |
         |----------------------------------------------------------------------|
      1. |   1   2014      1      3   101    1    2    2    2       matched (3) |
      2. |   1   2015      1      3     .    .    .    .    .   master only (1) |
      3. |   1   2016      1      3   101    1    2    2    2       matched (3) |
      4. |   1   2017      1      2   101    1    2    2    2       matched (3) |
      5. |   1   2018      1      2   101    1    2    2    2       matched (3) |
         |----------------------------------------------------------------------|
      6. |   2   2014      2      .     .    .    .    .    .   master only (1) |
      7. |   2   2015      1      2     .    .    .    .    .   master only (1) |
      8. |   2   2016      .      .     .    .    .    .    .   master only (1) |
      9. |   2   2017      1      3     .    .    .    .    .   master only (1) |
     10. |   2   2018      2      .     .    .    .    .    .   master only (1) |
         |----------------------------------------------------------------------|
     11. |   4   2014      1      3   401    1    2    2    2       matched (3) |
     12. |   4   2015      1      3   401    1    2    3    3       matched (3) |
     13. |   4   2016      1      3   402    1    1    2    2       matched (3) |
     14. |   4   2017      1      3   401    1    2    1    1       matched (3) |
     15. |   4   2018      1      3   402    1    1    2    3       matched (3) |
         |----------------------------------------------------------------------|
     16. |   4   2014      1      3   402    1    1    0    3       matched (3) |
     17. |   4   2015      1      3   402    1    1    2    2       matched (3) |
     18. |   4   2016      1      3   401    1    2    2    2       matched (3) |
     19. |   4   2017      1      3   402    1    1    3    3       matched (3) |
     20. |   4   2018      1      3   401    1    2    2    2       matched (3) |
         +----------------------------------------------------------------------+
    

    As far as the merge is concerned the essentials are identifiers with the same name(s) in both datasets and the correct pattern for merging. The parent identifier is only implied by the children dataset.