I'm doing some basic data processing stuff with Stata. However, I get stuck on how to keep the observations of the other variables matching, after modifying a variable's value label.
For example, say, I have the raw dataset as follows:
var1 var2 var3
US 1000 15
China 500 20
UK 800 10
The var1 is a string variable. I need to convert it into a numeric one. So I typed
encode var1, gen(country) // where the new variable country is numeric
The variables that I want for future work are var2, var3 and country.
When the new variable is generated, it automatically has a value label. To be specific, if I click on US in the column of country, it shows 3 in the cell on top of the column.
var1 var2 var3 country
US 1000 15 US
China 500 20 China
UK 800 10 UK
So far it's good. I mean for each row the observations match.
However, this automatically generated value label is not exactly what I want. What I need is to make 1 denote UK, 2 for US and 3 for China. So I modified it by:
label define country 1 "UK" 2 "US" 3 "China", modify
As shown below, I get the value labels that I want. Like, a click on China gives me 3.
var1 var2 var3 country
US 1000 15 China
China 500 20 UK
UK 800 10 US
But the point is, as you may already notice, the order of the observations for country is arranged alphabetically. It makes the corresponding values of var2 and var3 for each country to be wrong.
That is exactly where I could not get through. Since the columns that I finally need are var2, var3 and country, I'd like to have each country with its corresponding values of var2, var3 and as well the customized value labels.
I tried sort, order and a couple of commands but none gave me the result. I don't know what command I should use or there's something wrong in my earlier work.
Using the label()
option of encode
, is one way:
clear
set more off
input ///
str10 var1 var2 var3
US 1000 15
China 500 20
UK 800 10
end
label define lblcountry 1 "UK" 2 "US" 3 "China"
encode var1, generate(country) label(lblcountry)
list
list, nolabel
See also the user-written command sencode
(super-encode), which can do more: ssc describe sencode
.