Using encode command to convert string variable to numeric


I have a string variable named education that I want coded as 0 = primary, 1 = secondary, 2 = tertiary.

I used the encode command to convert it from string to numeric while keeping the labels, as follows:

label define edulabel 0 "primary" 1 "secondary" 2 "tertiary"
encode education, generate(education_n) label(edulabel) 

But when I actually browse the dataset I can see that the newly generated education_n stores the data as 3 for primary, 4 for secondary and 5 for tertiary.

Why is this the case and how do I fix it to be stored as 0s, 1s and 2s?


Solution

  • This is both hard and easy to answer precisely. It's hard because we can't see your data and you don't give a data example.

    It's easy because there is a generic answer. Stata is seeing something in your data other than the exact text you defined in your labels. Because none of the strings match, encode adds three new labels to edulabel, numbered from 3 upwards since 2 is the highest value already defined.
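
    One way to check this, sketched here on the assumption that education and edulabel are as described in the question, is to list the value label after the encode and to print the distinct strings with their delimiters. Any entries numbered 3 to 5 in the label show the exact text Stata found, including stray blanks or capitals.

    ```stata
    * Show every value-to-text mapping now held in edulabel;
    * entries 3, 4 and 5 are the ones encode added
    label list edulabel

    * Print the distinct values of the string variable inside quotes,
    * which makes leading or trailing blanks visible
    levelsof education

    * List observations where trimming would change the string
    list education if education != trim(education)
    ```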

    Some possibilities are

    1. Leading and/or trailing spaces. Push your string variable through trim() first.

    2. Although your report contradicts this, any use of upper case in your string variable would be enough for Stata to decide that the string values are not an exact match. There is no intelligence in the code to decide what you mean other than what you say.
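
    A minimal sketch of a fix covering both possibilities, assuming the only differences are blanks and letter case. The name education_clean is hypothetical, introduced for illustration. The earlier encode has already extended edulabel, so the label and the old numeric variable are dropped and recreated before encoding again.

    ```stata
    * Remove leading/trailing blanks and force lower case
    * (education_clean is a hypothetical name for the cleaned copy)
    generate education_clean = lower(trim(education))

    * Discard the results of the earlier attempt, if they exist
    capture drop education_n
    capture label drop edulabel

    * Redefine the label and encode the cleaned strings;
    * noextend makes encode stop with an error instead of
    * adding new labels when a string still does not match
    label define edulabel 0 "primary" 1 "secondary" 2 "tertiary"
    encode education_clean, generate(education_n) label(edulabel) noextend

    * Check the stored numeric values without labels
    tabulate education_n, nolabel
    ```

    If encode still stops with an error under noextend, some string differs from the label text in a way that trimming and lower-casing do not fix, and the distinct values of education_clean are worth inspecting directly.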