I have a rather simple task which I am struggling with in Stata.
I have three variables SicTwo1 SicTwo2 SicTwo3
, which are numeric (e.g. "12", "25", and "16")
I now want to concatenate them into a new variable SicAndSicAndSic
, BUT they shall be ordered from lowest to highest value (e.g. "121625"), ideally with a separator (e.g. "12&16&25")
I tried this code:
gen NumSics = 0
replace NumSics = NumSics + 1 if !missing(SicTwo1)
replace NumSics = NumSics + 1 if !missing(SicTwo2) & SicTwo1 != SicTwo2
replace NumSics = NumSics + 1 if !missing(SicTwo3) & SicTwo1 != SicTwo2 & SicTwo1 != SicTwo3 & SicTwo2 != SicTwo3
sort SicTwo1 SicTwo2 SicTwo3
gen SicAndSic1 = string(SicTwo1)
gen SicAndSic2 = string(SicTwo1) + "&" + string(SicTwo2)
gen SicAndSic3 = string(SicTwo1) + "&" + string(SicTwo2) + "&" + string(SicTwo3)
gen SicAndSic = ""
replace SicAndSic = SicAndSic1 if NumSics == 1
replace SicAndSic = SicAndSic2 if NumSics == 2
replace SicAndSic = SicAndSic3 if NumSics == 3
But it does not sort the variables, and just puts them next to each other.
See https://www.stata-journal.com/article.html?article=pr0046 for one way to sort variables within observations (rows).
clear
input SicTwo1 SicTwo2 SicTwo3
12 16 25
23 12 11
99 88 11
end
rowsort SicTwo?, gen(S1 S2 S3)
egen wanted = concat(S?) , p(&)
list
+-------------------------------------------------------+
| SicTwo1 SicTwo2 SicTwo3 S1 S2 S3 wanted |
|-------------------------------------------------------|
1. | 12 16 25 12 16 25 12&16&25 |
2. | 23 12 11 11 12 23 11&12&23 |
3. | 99 88 11 11 88 99 11&88&99 |
+-------------------------------------------------------+
Your code shows a misunderstanding of sort
, which sorts observations by values of variables, but emphatically does not sort within observations -- which is precisely what does rowsort
does, with the proviso that the original variables are unchanged, and the results go in new variables.
Your variables are stated to be numeric, so any missing values will by default be sorted to high. If you want something else, you need to spell out what that is.