
Stata Data cleaning


I need some help with Stata data transformation.

I have a survey in which a user can answer "no response", which has been coded as the integer 98. The variables are of different data types (some numeric, some string). I need to count the number of "no response" (98) answers for each user and store that count in a separate variable.

Here is a sample of the dataset; NewCreatedColumn is the count I want to create:

UserN  Q1  Q2        Q3         Q4  Q5         Q6          NewCreatedColumn
User1  11  "male"    "12:55pm"  98  "Answer1"  "other"     1
User2  98  "female"  "1:00am"   98  "AnswerX"  "Batman"    2
User3  16  "male"    "1:00am"   34  "other"    "superman"  0
User4  98  "female"  "1:00am"   98  "other"    "Dog"       2
User5  66  "male"    "1:00am"   98  "Life"     "Cat"       1

This would be fairly easy in Python: each user's row in the dataframe can be treated as a list, and you can scan that list for the integer 98.

Is there an equivalent in Stata?


Solution

  • Thanks for the data example, which I have improved below to become reproducible code. See also help dataex within Stata (or search dataex in an ancient Stata).

    * Recreate the sample data: Q1 and Q4 are numeric; Q2, Q3, Q5 and Q6 are strings
    clear
    input str5 UserN   Q1  str7 (Q2       Q3)   Q4 str8 (Q5 Q6)      NewCreatedColumn
    User1    11 "male"   "12:55pm"  98  "Answer1"   "other"     1
    User2    98 "female" "1:00am"   98  "AnswerX"   "Batman"    2
    User3    16 "male"   "1:00am"   34  "other"     "superman"  0
    User4    98 "female" "1:00am"   98  "other"      "Dog"      2
    User5    66 "male"   "1:00am"   98  "Life"       "Cat"      1
    end 
    
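    * Put the names of the numeric Q* variables in r(varlist), then count,
    * for each user, how many of them equal 98
    ds Q* , has(type numeric)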
    egen wanted = anycount(`r(varlist)'), values(98)
    

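To make the two commands above more concrete, here is a minimal sketch on a tiny made-up dataset (the variables a, b and s are hypothetical, not from the survey). ds, has(type numeric) leaves only the numeric variable names in r(varlist), and anycount() then counts, observation by observation, how many of those variables equal 98. The string variable s is never passed to anycount(), so its "98" values are not counted, which is why string variables need separate handling. The sketch starts with clear, so run it on its own rather than in the middle of the survey code.

```stata
* Sketch on a tiny hypothetical dataset: two numeric variables and one string
clear
input a b str3 s
98 98 "98"
98  1 "x"
 2  3 "98"
end

* ds leaves the names of the numeric variables in r(varlist): here "a b"
ds, has(type numeric)
local numvars `r(varlist)'
display "`numvars'"

* anycount() counts, per observation, how many of those variables equal 98;
* the string variable s is not included, so its "98" values are ignored
egen n98 = anycount(`numvars'), values(98)
list, noobs
```

The resulting n98 should be 2, 1 and 0 for the three observations.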
For counting a string value such as "foo" in the string variables, a loop will do it:

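    * Put the names of the string Q* variables in r(varlist), then add 1 to
    * WANTED for each of them that equals "foo"
    ds Q*, has(type string) 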
    gen WANTED = 0 
    quietly foreach v in `r(varlist)' { 
        replace WANTED = WANTED + (`v' == "foo")  
    }
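If "no response" can appear in both numeric and string variables, the two counts can be combined in a single pass over Q*. This is a sketch under an assumption not confirmed by the question: that in string variables "no response" is stored as the text "98" (the sample has no such values). It uses capture confirm numeric variable to choose the right comparison for each variable; nonresp is a hypothetical name. On the sample data it should reproduce NewCreatedColumn. strtrim() needs Stata 14 or later; older versions can use trim() instead.

```stata
* Sketch: count "no response" across mixed-type variables in one loop.
* Assumption: numeric variables code it as 98, string variables as the text "98".
clear
input str5 UserN Q1 str7 Q2 str7 Q3 Q4 str8 Q5 str8 Q6 NewCreatedColumn
User1 11 "male"   "12:55pm" 98 "Answer1" "other"    1
User2 98 "female" "1:00am"  98 "AnswerX" "Batman"   2
User3 16 "male"   "1:00am"  34 "other"   "superman" 0
User4 98 "female" "1:00am"  98 "other"   "Dog"      2
User5 66 "male"   "1:00am"  98 "Life"    "Cat"      1
end

generate nonresp = 0
foreach v of varlist Q* {
    capture confirm numeric variable `v'
    if _rc == 0 {
        * numeric variable: compare with the number 98
        quietly replace nonresp = nonresp + (`v' == 98)
    }
    else {
        * string variable: compare with the text "98", ignoring stray blanks
        quietly replace nonresp = nonresp + (strtrim(`v') == "98")
    }
}

list UserN nonresp NewCreatedColumn, noobs
assert nonresp == NewCreatedColumn
```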