Search code examples
stata

Replacing values in specific variable for specific rows


I have a data frame in Stata that has an ID variable with some rows missing id values. I want to systematically assign values to these rows with missing values by assigning ids that increase in value. For example, this is my data frame now:

id name city
Amy Seattle
James Oakland
Fred New York
003 John Denver
004 David Miami
005 Mary Chicago

I would like code that would allow me to fill in the id rows that are blank with numbers that increase in ascending order so that I have a data frame that looks like this:

id name city
001 Amy Seattle
002 James Oakland
003 Fred New York
004 John Denver
005 David Miami
006 Mary Chicago

Is there a way to do this?


Solution

  • This works with your data example:

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 id str8(name city)
    ""    "Amy"   "Seattle" 
    ""    "James" "Oakland" 
    ""    "Fred"  "New York"
    "003" "John"  "Denver"  
    "004" "David" "Miami"   
    "005" "Mary"  "Chicago" 
    end
    
    gen str3 newid = strofreal(_n, "%03.0f")
    
    list, sep(0)
    
         +--------------------------------+
         |  id    name       city   newid |
         |--------------------------------|
      1. |         Amy    Seattle     001 |
      2. |       James    Oakland     002 |
      3. |        Fred   New York     003 |
      4. | 003    John     Denver     004 |
      5. | 004   David      Miami     005 |
      6. | 005    Mary    Chicago     006 |
         +--------------------------------+
    
    

    I have deliberately not overwritten your identifier id, which should not be done unless you are absolutely sure that it is what you want.

    Also, you showed a blank in your question for the identifier in the first three observations ("rows" is spreadsheet-speak!), and leading zeros otherwise, which together strongly imply that it is a string variable. Stata tends be to more flexible when an identifier is numeric, easy enough in this case using destring.