Search code examples
countstata

Count occurrences across different columns with the same information


I have a dataset that looks like this

ID S1 S2 S3 year
1  1  2     1999
1  1  2     2000
2  1        1999
2  1        2000
3  1        1999
3  2        2000
4  1  2  3  1999
4  3  1  2  2000
5  1  2     1999
5  1  3     2000

where the first column is the subject ID, S1 to S3 are location identifiers and year refers to either 1999 or 2000. I would like to count all the occurrences in which a subject stays in the same location in both years (using Stata). That is, for subject 1, this value is 2, for subject 2, it is 1, for subject 3 it is 0, for subject 4 it is 3 and for subject 5 it is 1.


Solution

  • I assume your blanks are really numeric missing. This should also work if S1 S2 S3 are in fact string and the blanks are empty strings. The example that 1 2 3 and 3 1 2 are equivalent rules out various simpler solutions of just counting similarities over S1 S2 S3 in a loop.

    clear 
    input ID S1 S2 S3 year
    1  1  2  .  1999
    1  1  2  .  2000
    2  1  .  .  1999
    2  1  .  .  2000
    3  1  .  .  1999
    3  2  .  .  2000
    4  1  2  3  1999
    4  3  1  2  2000
    5  1  2  .  1999
    5  1  3  .  2000
    end 
    
    reshape long S, i(ID year) j(which) 
    bysort ID (S) : gen count = sum(S == S[_n-1] & !missing(S)) 
    by ID: replace count = count[_N] 
    reshape wide S, i(ID year) j(which)  
    
    list, sepby(ID) 
    
         +----------------------------------+
         | ID   year   S1   S2   S3   count |
         |----------------------------------|
      1. |  1   1999    1    2    .       2 |
      2. |  1   2000    1    2    .       2 |
         |----------------------------------|
      3. |  2   1999    1    .    .       1 |
      4. |  2   2000    1    .    .       1 |
         |----------------------------------|
      5. |  3   1999    1    .    .       0 |
      6. |  3   2000    2    .    .       0 |
         |----------------------------------|
      7. |  4   1999    1    2    3       3 |
      8. |  4   2000    3    1    2       3 |
         |----------------------------------|
      9. |  5   1999    1    2    .       1 |
     10. |  5   2000    1    3    .       1 |
         +----------------------------------+