
Stata: test whether a string consists of a single repeated character


I want to automatically test whether each string contains only one type of character, storing the result in a true/false variable "check".

clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

My attempt:

gen check = .
// loop through the observations
local db = _N
forvalues x = 1/`db' {
    display as text "obs `x'"
    // get the first character of the string in observation `x'
    // (explicit subscripting: the -in- qualifier cannot be used with -local-)
    local f = substr(contactno[`x'], 1, 1)
    // assume every character matches until one differs
    local y = 1
    // loop through each character of the string
    local len = strlen(contactno[`x'])
    forvalues i = 1/`len' {
        local j = substr(contactno[`x'], `i', 1)
        // flag a character that does not match the first one
        if "`j'" != "`f'" {
            local y = 0
        }
    }
    // store the result for this observation
    quietly replace check = `y' in `x'
}

Expected result: the first two observations should be true and the third false.
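The character-by-character comparison in the attempt does not strictly need a loop over observations, because replace already acts on every observation at once. A minimal sketch of that idea, assuming contactno is at most 11 characters long (str11) and holds only single-byte (ASCII) characters, since substr() and strlen() count bytes:

```stata
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

* Start by assuming every string is a single repeated character
generate byte check = 1

* Compare positions 2 to 11 with the first character in all observations at once;
* positions past the end of a shorter string are skipped via strlen()
forvalues i = 2/11 {
    quietly replace check = 0 if `i' <= strlen(contactno) & substr(contactno, `i', 1) != substr(contactno, 1, 1)
}

list
```

Only the loop over character positions remains; the loop over observations is gone.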


Solution

  • You can achieve this in one line of code as follows:

    1. Take the first character of contactno.
    2. Find all instances of this character in contactno and replace with an empty string (i.e., "").
    3. Test whether the resulting string is empty.

    gen check = missing(subinstr(contactno, substr(contactno, 1, 1), "", .))
    list
    
         +---------------------+
         |   contactno   check |
         |---------------------|
      1. | aaaaaaaaaaa       1 |
      2. | bbbbbbbbbbb       1 |
      3. | aaaaaaaaaab       0 |
         +---------------------+
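To see the three steps individually, the intermediate results can be stored in variables; first and rest are hypothetical names used only for illustration. This sketch assumes single-byte (ASCII) characters, because substr() and subinstr() operate on bytes:

```stata
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

* Step 1: take the first character of contactno
generate first = substr(contactno, 1, 1)

* Step 2: remove every occurrence of that character (n = . means all occurrences)
generate rest = subinstr(contactno, first, "", .)

* Step 3: the string was one repeated character if nothing is left
generate byte check = missing(rest)

list
```

For the example data, rest is empty in the first two observations and "b" in the third.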
    

    This works because a string consists of a single repeated character exactly when every character equals the first one. If any character differs from the first, it survives the removal step and the result is not empty.
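The same condition can also be tested by rebuilding the string from its first character and comparing it with the original. This sketch relies on Stata's string duplication operator (a string multiplied by a number is repeated that many times) and, like the one-line solution, assumes single-byte characters; check_dup is a hypothetical variable name. The final assert confirms that both approaches agree on the example data:

```stata
clear
input str11 contactno
"aaaaaaaaaaa"
"bbbbbbbbbbb"
"aaaaaaaaaab"
end

* A string is one repeated character exactly when it equals
* its first character repeated strlen() times
generate byte check_dup = contactno == substr(contactno, 1, 1) * strlen(contactno)

* The one-line removal approach, for comparison
generate byte check = missing(subinstr(contactno, substr(contactno, 1, 1), "", .))

* Both methods should give identical results
assert check_dup == check
list
```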