stata data-manipulation

Summing two groups under the same categorical variable


I have a string variable called "categories", tabulated below.

As you can see, some observations are listed under "Category 1" and others under "category 1". I want to combine them, so that the total for category 1 is 3,686 + 36 = 3,722.

 categories |      Freq.     Percent        Cum.
------------+-----------------------------------
 Category 1 |      3,686       10.53       10.53
 category 1 |         36        0.10       10.63
category 10 |         54        0.15       10.79
category 11 |      1,122        3.21       13.99
 category 2 |        615        1.76       15.75
 category 3 |     15,333       43.80       59.55
 category 4 |     12,694       36.26       95.81
 category 5 |        234        0.67       96.48
 category 6 |        110        0.31       96.79
 category 7 |        983        2.81       99.60
 category 8 |         35        0.10       99.70
 category 9 |        105        0.30      100.00

Solution

  • From this I guess that your variable (called whatever below) is a string whose values were not entered consistently.

    replace whatever = lower(whatever) 
    

    is one of several ways to map "Category 1" to "category 1". Better yet, work with

    gen betteryet = real(word(whatever, 2)) 
    

    since the word "category" carries no information, and if you have categories 1 to 11 you might as well see them in numeric order.
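
    Putting the two steps together, here is a minimal sketch on a toy dataset. The four rows typed in with input and the variable name catnum are made up for illustration; only lower(), word() and real() come from the answer above.

```stata
clear
* Toy data (hypothetical): mixed capitalisation, as in the question
input str11 categories
"Category 1"
"category 1"
"category 10"
"category 2"
end

* Step 1: map every spelling to lower case so both variants coincide
replace categories = lower(categories)

* Step 2: take the second word and convert it to a number
generate catnum = real(word(categories, 2))

* Sanity check: every value should have converted to a number
assert !missing(catnum)

* A numeric variable tabulates in numeric order (1, 2, 10),
* whereas the string version sorts "category 10" before "category 2"
tabulate catnum
```

    Step 2 on its own would already merge the two spellings, since it ignores the first word entirely. The replace step only matters if you also want to keep the string variable in a consistent form.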