stata distinct-values

Save list of distinct values of a variable in another variable


I have data at the country-year-z level, where z is a categorical variable that can take (say) 10 different values for each country-year. Each combination of country, year, and z is unique in the dataset.

I would like to obtain a dataset at the country-year level, with a new (string) variable containing all distinct values of z.

For instance, let's say I have the following data:

country     year    z
A           2000    1
A           2001    1
A           2001    2
A           2001    4
A           2002    2
A           2002    5
B           2001    7
B           2001    8
B           2002    4
B           2002    5
B           2002    9
B           2003    3
B           2003    4
B           2005    1

I would like to get the following data:

country     year    z_distinct
A           2000    1
A           2001    1 2 4
A           2002    2 5
B           2001    7 8
B           2002    4 5 9
B           2003    3 4
B           2005    1

Solution

  • Here's another way to do it, perhaps more direct. If z is already a string variable, both string() calls should be omitted.

    clear 
    input str1 country year z
    A 2000 1
    A 2001 1
    A 2001 2
    A 2001 4
    A 2002 2
    A 2002 5
    B 2001 7
    B 2001 8
    B 2002 4
    B 2002 5
    B 2002 9
    B 2003 3
    B 2003 4
    B 2005 1
    end 
    
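    * Within each country-year group, sorted by z, start the list with the smallest z
    bysort country year (z) : gen values = string(z[1]) 
    * Append each later z; replace runs observation by observation, so values[_n-1] is already updated
    by country year : replace values = values[_n-1] + " " + string(z) if z != z[_n-1] & _n > 1 
    * The last observation in each group holds the complete list
    by country year : keep if _n == _N 
    drop z 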
    
    list , sepby(country) 
         +-------------------------+
         | country   year   values |
         |-------------------------|
      1. |       A   2000        1 |
      2. |       A   2001    1 2 4 |
      3. |       A   2002      2 5 |
         |-------------------------|
      4. |       B   2001      7 8 |
      5. |       B   2002    4 5 9 |
      6. |       B   2003      3 4 |
      7. |       B   2005        1 |
         +-------------------------+
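
The code above relies on the question's statement that each country-year-z combination is unique. If that is not guaranteed, a repeated largest value would leave the last row of its group holding only the first value, because the replace is skipped on the repeated row. A minimal sketch of one way to guard against that, assuming it is acceptable to discard repeated country-year-z rows (the small dataset here is made up for illustration):

```stata
clear
input str1 country year z
A 2001 1
A 2001 2
A 2001 2
B 2002 4
end

* Assumption: repeated country-year-z rows carry no extra information,
* so keep one copy of each before building the list
duplicates drop country year z, force

* Same running-list idea as above; after dropping repeats every z in a
* group is distinct, so no z != z[_n-1] check is needed
bysort country year (z) : gen values = string(z[1])
by country year : replace values = values[_n-1] + " " + string(z) if _n > 1
by country year : keep if _n == _N
drop z

list, sepby(country)
```

If you would rather have Stata stop with an error than silently drop rows, `isid country year z` before the `bysort` line checks the uniqueness assumption instead.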