Drop a specific character from string responses


I have a string variable in which some of the responses have an extra character at the beginning. The extra character is the same in every case. The variable holds ICD codes. For example, instead of G23 I have DG23.

Is there a way in Stata to remove the excess D character?

My data looks like this

ID diag
1 DZ456
2 DG32
3 DY258
4 DD35
5 DS321
6 DD21
7 DA123

Solution

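  • For the basics of working with strings, see help string functions. In the example below, substr(diag, 2, .) keeps everything from the second character onward, and the if qualifier restricts the change to values that actually start with "D", so a code such as DD35 loses only its first D.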

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte d str5 diag
    1 "DZ456"
    2 "DG32" 
    3 "DY258"
    4 "DD35" 
    5 "DS321"
    6 "DD21" 
    7 "DA123"
    end
    
    replace diag = substr(diag, 2, .) if substr(diag, 1, 1) == "D"
    
    list 
    
         +----------+
         | d   diag |
         |----------|
      1. | 1   Z456 |
      2. | 2    G32 |
      3. | 3   Y258 |
      4. | 4    D35 |
      5. | 5   S321 |
         |----------|
      6. | 6    D21 |
      7. | 7   A123 |
         +----------+
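
  • As an alternative, the same result can probably be had with a regular expression. The following is a minimal sketch, not part of the original answer: it assumes regexr() is available in your Stata version, the new variable name diag2 is hypothetical, and the unprefixed code G23 is added only to show that values without a leading D are left alone. The pattern "^D" is anchored to the start of the string, so at most one leading D is removed.

    clear
    input byte d str5 diag
    1 "DZ456"
    2 "DD35"
    3 "G23"
    end

    * Write the cleaned code to a new variable so the original is kept
    generate diag2 = regexr(diag, "^D", "")

    * Cross-check against the substr() approach; stops with an error if they differ
    assert diag2 == cond(substr(diag, 1, 1) == "D", substr(diag, 2, .), diag)

    list

    Writing to a new variable rather than using replace makes it easy to compare the cleaned codes against the originals before dropping anything.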