I have a data frame in Stata that has an ID variable with some rows missing id values. I want to systematically assign values to these rows with missing values by assigning ids that increase in value. For example, this is my data frame now:
id | name | city |
---|---|---|
Amy | Seattle | |
James | Oakland | |
Fred | New York | |
003 | John | Denver |
004 | David | Miami |
005 | Mary | Chicago |
I would like code that would allow me to fill in the id rows that are blank with numbers that increase in ascending order so that I have a data frame that looks like this:
id | name | city |
---|---|---|
001 | Amy | Seattle |
002 | James | Oakland |
003 | Fred | New York |
004 | John | Denver |
005 | David | Miami |
006 | Mary | Chicago |
Is there a way to do this?
This works with your data example:
* Example generated by -dataex-. For more info, type help dataex
clear
input str5 id str8(name city)
"" "Amy" "Seattle"
"" "James" "Oakland"
"" "Fred" "New York"
"003" "John" "Denver"
"004" "David" "Miami"
"005" "Mary" "Chicago"
end
gen str3 newid = strofreal(_n, "%03.0f")
list, sep(0)
+--------------------------------+
| id name city newid |
|--------------------------------|
1. | Amy Seattle 001 |
2. | James Oakland 002 |
3. | Fred New York 003 |
4. | 003 John Denver 004 |
5. | 004 David Miami 005 |
6. | 005 Mary Chicago 006 |
+--------------------------------+
I have deliberately not overwritten your identifier id
, which should not be done unless you are absolutely sure that it is what you want.
Also, you showed a blank in your question for the identifier in the first three observations ("rows" is spreadsheet-speak!), and leading zeros otherwise, which together strongly imply that it is a string variable. Stata tends be to more flexible when an identifier is numeric, easy enough in this case using destring
.