Search code examples
statacategorical-datarecode

Recoding levels of categorical variable in Stata without ruining data


I have this variable which takes on these values:

     tab expenditure
   
                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                  Afs 2500-5000 |         24        3.84        3.84
                  Afs 5000-7500 |         89       14.24       18.08
                 Afs 7500-10000 |        235       37.60       55.68
I don't know / refuse to answer |          9        1.44       57.12
             Less than Afs 2500 |          5        0.80       57.92
            More than Afs 10000 |        263       42.08      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

I would like to change the ordering, so the categories are not in alphabetical order. I tried using

label define expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", replace

I also tried using

recode expenditure (1 = 5) (2 = 1) (3 = 2) (4 = 3) (5 = 6) (6 = 4)

However, both methods just change the labels, not the underlying data, and now the data is all messed up (note the changes in frequencies, now only 24 observations for the "More than Afs 10000" category instead of 263 as before).

tab expenditure

                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |         89       14.24       14.24
                  Afs 2500-5000 |        235       37.60       51.84
                  Afs 5000-7500 |          9        1.44       53.28
                 Afs 7500-10000 |        263       42.08       95.36
            More than Afs 10000 |         24        3.84       99.20
I don't know / refuse to answer |          5        0.80      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

What's going on? What can I do to change this without affecting my underlying data?


Solution

  • If your expenditure variable were string, you could just use label define and encode like this:

    . clear 
    
    . input str31 expenditure int freq 
    
                             expenditure      freq
      1.                   "Afs 2500-5000"        24        
      2.                   "Afs 5000-7500"        89      
      3.                  "Afs 7500-10000"       235      
      4. "I don't know / refuse to answer"         9        
      5.              "Less than Afs 2500"         5        
      6.             "More than Afs 10000"       263   
      7. end 
    
    . label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"
    
    
    . encode expenditure, gen(expenditure2) label(expenditure)
    
    
    . label var expenditure2 "expenditure"
    
    
    . tab expenditure2 [fw=freq]
    
                        expenditure |      Freq.     Percent        Cum.
    --------------------------------+-----------------------------------
                 Less than Afs 2500 |          5        0.80        0.80
                      Afs 2500-5000 |         24        3.84        4.64
                      Afs 5000-7500 |         89       14.24       18.88
                     Afs 7500-10000 |        235       37.60       56.48
                More than Afs 10000 |        263       42.08       98.56
    I don't know / refuse to answer |          9        1.44      100.00
    --------------------------------+-----------------------------------
                              Total |        625      100.00
    

    It seems, however, that your variable is numeric, so it is as if you did this:

    . clear 
    
    . input byte expenditure int freq 
    
         expend~e      freq
      1. 1                       24        
      2. 2                       89      
      3. 3                      235      
      4. 4                        9        
      5. 5                        5        
      6. 6                      263   
      7. end 
    
    . label def expenditure 5 "Less than Afs 2500" 1 "Afs 2500-5000" 2 "Afs 5000-7500" 3 "Afs 7500-10000" 6 "More than Afs 10000" 4 "I don't know / refuse to answer"
    
    . label val expenditure expenditure 
    

    The problem now is that you need to re-define the value labels as well as apply recode.

    . recode expenditure 5=1 1=2 2=3 3=4 4=6 6=5 5=6 
    (6 changes made to expenditure)
    
    
    . tab expenditure [fw=freq]
    
                        expenditure |      Freq.     Percent        Cum.
    --------------------------------+-----------------------------------
                      Afs 2500-5000 |          5        0.80        0.80
                      Afs 5000-7500 |         24        3.84        4.64
                     Afs 7500-10000 |         89       14.24       18.88
    I don't know / refuse to answer |        235       37.60       56.48
                 Less than Afs 2500 |        263       42.08       98.56
                More than Afs 10000 |          9        1.44      100.00
    --------------------------------+-----------------------------------
                              Total |        625      100.00
    
    . label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", modify
    
    
    . tab expenditure [fw=freq]
    
                        expenditure |      Freq.     Percent        Cum.
    --------------------------------+-----------------------------------
                 Less than Afs 2500 |          5        0.80        0.80
                      Afs 2500-5000 |         24        3.84        4.64
                      Afs 5000-7500 |         89       14.24       18.88
                     Afs 7500-10000 |        235       37.60       56.48
                More than Afs 10000 |        263       42.08       98.56
    I don't know / refuse to answer |          9        1.44      100.00
    --------------------------------+-----------------------------------
                              Total |        625      100.00