Search code examples
if-statementrowsstata

How to add a row where there is a disruption in series of numbers in Stata


I'm attempting to format a table of 40 different age-race-sex strata to be inputted into R-INLA and noticed that it's important to include all strata (even if they are not present in a county). These should be zeros. However, at this point my table only contains records for strata that are not empty. I can identify places where strata are missing for each county by looking at my strata variable and finding the breaks in the series 1 through 40 (marked with a red x in the image below).

RINLA

In these places (marked by the red x) I need to add the missing rows and fill in the corresponding county code, strata code, population=0, and the correct corresponding race, sex, age code for the strata.

If I can figure out a way to add an empty row in the spaces with the red Xs from the image, and correctly assign the strata code (and county code) to these empty/missing rows, I am able to populate the rest of the values with the code below:

recode race = 1 & sex= 1 & age =4 if strata = 4 ...etc

I'm wondering if there is a way to add the missing rows using an if statement that considers the fact that there are supposed to be forty strata for each county code. It would be ideal if this could populate the correct county code and strata code as well!

Dataex sample data:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float OID str5 fips_statecounty double population byte(race sex age) float strata
  1 ""       672 1 1 1  1
  2 ""      1048 1 1 2  2
  3 ""       883 1 1 3  3
  4 ""      1129 1 1 4  4
  5 ""       574 1 2 1  5
  6 ""       986 1 2 2  6
  7 ""       899 1 2 3  7
  8 ""      1820 1 2 4  8
  9 ""        96 2 1 1  9
 10 ""       142 2 1 2 10
 11 ""        81 2 1 3 11
 12 ""        99 2 1 4 12
 13 ""        71 2 2 1 13
 14 ""       125 2 2 2 14
 15 ""       103 2 2 3 15
 16 ""       162 2 2 4 16
 17 ""        31 3 1 1 17
 18 ""        32 3 1 2 18
 19 ""        18 3 1 3 19
 20 ""        31 3 1 4 20
 21 ""        22 3 2 1 21
 22 ""        28 3 2 2 22
 23 ""        28 3 2 3 23
 24 ""        44 3 2 4 24
 25 ""        20 4 1 1 25
 26 ""        24 4 1 2 26
 27 ""        21 4 1 3 27
 28 ""        43 4 1 4 28
 29 ""        19 4 2 1 29
 30 ""        26 4 2 2 30
 31 ""        24 4 2 3 31
 32 ""        58 4 2 4 32
 33 ""         6 5 1 1 33
 34 ""        11 5 1 2 34
 35 ""        13 5 1 3 35
 36 ""         7 5 1 4 36
 37 ""         7 5 2 1 37
 38 ""         9 5 2 2 38
 39 ""        10 5 2 3 39
 40 ""        11 5 2 4 40
 41 "01001"  239 1 1 1  1
 42 "01001"  464 1 1 2  2
 43 "01001"  314 1 1 3  3
 44 "01001"  232 1 1 4  4
 45 "01001"  284 1 2 1  5
 46 "01001"  580 1 2 2  6
 47 "01001"  392 1 2 3  7
 48 "01001"  440 1 2 4  8
 49 "01001"   41 2 1 1  9
 50 "01001"   38 2 1 2 10
 51 "01001"   23 2 1 3 11
 52 "01001"   26 2 1 4 12
 53 "01001"   34 2 2 1 13
 54 "01001"   52 2 2 2 14
 55 "01001"   40 2 2 3 15
 56 "01001"   50 2 2 4 16
 57 "01001"    4 3 1 1 17
 58 "01001"    2 3 1 2 18
 59 "01001"    3 3 1 3 19
 60 "01001"    6 3 2 1 21
 61 "01001"    4 3 2 2 22
 62 "01001"    6 3 2 3 23
 63 "01001"    4 3 2 4 24
 64 "01001"    1 4 1 4 28
 65 "01003" 1424 1 1 1  1
 66 "01003" 2415 1 1 2  2
 67 "01003" 1680 1 1 3  3
 68 "01003" 1823 1 1 4  4
 69 "01003" 1545 1 2 1  5
 70 "01003" 2592 1 2 2  6
 71 "01003" 1916 1 2 3  7
 72 "01003" 2527 1 2 4  8
 73 "01003"   68 2 1 1  9
 74 "01003"   82 2 1 2 10
 75 "01003"   52 2 1 3 11
 76 "01003"   54 2 1 4 12
 77 "01003"   72 2 2 1 13
 78 "01003"  129 2 2 2 14
 79 "01003"   81 2 2 3 15
 80 "01003"  106 2 2 4 16
 81 "01003"   10 3 1 1 17
 82 "01003"   14 3 1 2 18
 83 "01003"    8 3 1 3 19
 84 "01003"    4 3 1 4 20
 85 "01003"    8 3 2 1 21
 86 "01003"   14 3 2 2 22
 87 "01003"   17 3 2 3 23
 88 "01003"   10 3 2 4 24
 89 "01003"    4 4 1 1 25
 90 "01003"    1 4 1 3 27
 91 "01003"    2 4 1 4 28
 92 "01003"    2 4 2 1 29
 93 "01003"    3 4 2 2 30
 94 "01003"    4 4 2 3 31
 95 "01003"   10 4 2 4 32
 96 "01003"    5 5 1 1 33
 97 "01003"    4 5 1 2 34
 98 "01003"    3 5 1 3 35
 99 "01003"    5 5 1 4 36
100 "01003"    5 5 2 2 38
end
label values race race
label values sex sex

Solution

  • My answer to your previous question

    Nested for-loop: error variable already defined

    detailed how to create a minimal dataset with all strata present. Therefore you should just merge that with your main dataset and replace missings on the absent strata with whatever your other software expects, zeros it seems.

    The complication most obvious at this point is you need to factor in a county variable. I can't see any information on how many counties you have in your dataset, which may affect what is practical. You should be able to break down the preparation into: first, prepare a minimal county dataset with identifiers only; then merge that with a complete strata dataset.