I have data at the country
-year
-z
level, where z
is a categorical variable that can take(say) 10 different values (for each country-year). Each combination of country
-year
-z
is unique in the dataset.
I would like to obtain a dataset at the country-year level, with a new (string) variable containing all distinct values of z
.
For instance let's say I have the following data:
country year z
A 2000 1
A 2001 1
A 2001 2
A 2001 4
A 2002 2
A 2002 5
B 2001 7
B 2001 8
B 2002 4
B 2002 5
B 2002 9
B 2003 3
B 2003 4
B 2005 1
I would like to get the following data:
country year z_distinct
A 2000 1
A 2001 1 2
A 2002 2 5
B 2001 7 8
B 2002 4 5 9
B 2003 3 4
B 2003 4
Here's another way to do it, perhaps more direct. If z
is already a string variable the string()
calls should both be omitted.
clear
input str1 country year z
A 2000 1
A 2001 1
A 2001 2
A 2001 4
A 2002 2
A 2002 5
B 2001 7
B 2001 8
B 2002 4
B 2002 5
B 2002 9
B 2003 3
B 2003 4
B 2005 1
end
bysort country year (z) : gen values = string(z[1])
by country year : replace values = values[_n-1] + " " + string(z) if z != z[_n-1] & _n > 1
by country year : keep if _n == _N
drop z
list , sepby(country)
+-------------------------+
| country year values |
|-------------------------|
1. | A 2000 1 |
2. | A 2001 1 2 4 |
3. | A 2002 2 5 |
|-------------------------|
4. | B 2001 7 8 |
5. | B 2002 4 5 9 |
6. | B 2003 3 4 |
7. | B 2005 1 |
+-------------------------+