Create an indicator flag when condition has been met


I would like to find a way to create an indicator flag across rows such that once a criterion has been met, the flag persists across all cases within a group.

In the sample data below, I have a variable _p that defines statistical significance of the comparison of values in _mar across levels of _m. I also have a grouping variable _g that indicates the comparisons are made within a group.

The variables _f_s and _f_n represent the end result that I would like to have.

clear

input _mar _m  _p  _g  _f_s  _f_n 
2.99    0   0.00000    0   1  0       
3.03    1   0.00000    0   1  0       
3.05    2   0.00000    0   1  1       
3.06    3   0.22179    0   0  1       
3.07    4   0.18044    0   0  1       
3.07    5   0.58009    0   0  1       
3.06    6   0.40620    0   0  1       
3.06    7   0.47257    0   0  1       
3.06    8   0.91196    0   0  1       
3.05    9   0.68560    0   0  1       
2.65    0   0.00000    1   1  0       
2.70    1   0.00000    1   1  0       
2.73    2   0.00103    1   1  0       
2.75    3   0.00944    1   1  1       
2.75    4   0.64713    1   0  1       
2.76    5   0.55476    1   0  1       
2.77    6   0.32807    1   0  1       
2.78    7   0.03271    1   0  1       
2.78    8   0.00219    1   0  1       
2.79    9   0.57361    1   0  1              
end    

I would like to use the flag to indicate in a graph where statistical significance "stops" and to ignore any later comparison values.

Below you can also find the code that I have attempted up to this point:

Snippet 1 - graph works, lines are structured as desired

 snapshot save, label("import")
 snapshot list 

 twoway ///
 (line _mar _m if _g == 0 & _f_s==1, lcolor(orange) lpattern(solid)) ///
 (line _mar _m if _g == 0 & _f_n==1, lcolor(orange) lpattern(dash )) ///
 (scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
  ///
 (line _mar _m if _g == 1 & _f_s==1, lcolor(blue*2) lpattern(solid)) ///
 (line _mar _m if _g == 1 & _f_n==1, lcolor(blue*2) lpattern(dash )) ///
 (scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
, legend(off)   ///
xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///      
xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
name(good)

Snippet 2 - graph does not work, lines are not structured as desired

snapshot restore 1 

sort _g _m
gen x_f_s = (_p <= .05) 
replace x_f_s = 0 if x_f_s ==1 & x_f_s[_n-1]==0 & x_f_s[_n+1]==0
replace x_f_s = 1 if _m == 0
gen x_f_n = x_f_s == 0
replace x_f_n = 1 if x_f_s ==1 & x_f_s[_n+1]==0

/*****  the created flags are not correct  *****/
list, sepby(_g)

 twoway ///
 (line _mar _m if _g == 0 & x_f_s==1, lcolor(orange) lpattern(solid)) ///
 (line _mar _m if _g == 0 & x_f_n==1, lcolor(orange) lpattern(dash )) ///
 (scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
  ///
 (line _mar _m if _g == 1 & x_f_s==1, lcolor(blue*2) lpattern(solid)) ///
 (line _mar _m if _g == 1 & x_f_n==1, lcolor(blue*2) lpattern(dash )) ///
 (scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
, legend(off)   ///
xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///      
xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
name(not_good)

The variables that I have tried to calculate are named x_f_s and x_f_n.

The flags work when no later comparisons happen to be significant. However, when there is a significant comparison after the initial "stop", the plotting does not work.

There should also be a second flag that indicates where "non-significance" starts. This would carry forward in a similar way to the first flag.

I am using solid and dashed lines to indicate where significance exists, and then stops.

Ultimately, I would like to create flags within groups for plotting purposes.


Solution

  • This is how I would do it. As a first pass, flag each observation on its own p-value:

    bysort _g (_m): generate x_f_s = (_p <= .05) 
    bysort _g (_m): generate x_f_n = x_f_s == 0
    
    list, sepby(_g)
    
         +-------------------------------------------------------+
         | _mar   _m       _p   _g   _f_s   _f_n   x_f_s   x_f_n |
         |-------------------------------------------------------|
      1. | 2.99    0        0    0      1      0       1       0 |
      2. | 3.03    1        0    0      1      0       1       0 |
      3. | 3.05    2        0    0      1      1       1       0 |
      4. | 3.06    3   .22179    0      0      1       0       1 |
      5. | 3.07    4   .18044    0      0      1       0       1 |
      6. | 3.07    5   .58009    0      0      1       0       1 |
      7. | 3.06    6    .4062    0      0      1       0       1 |
      8. | 3.06    7   .47257    0      0      1       0       1 |
      9. | 3.06    8   .91196    0      0      1       0       1 |
     10. | 3.05    9    .6856    0      0      1       0       1 |
         |-------------------------------------------------------|
     11. | 2.65    0        0    1      1      0       1       0 |
     12. |  2.7    1        0    1      1      0       1       0 |
     13. | 2.73    2   .00103    1      1      0       1       0 |
     14. | 2.75    3   .00944    1      1      1       1       0 |
     15. | 2.75    4   .64713    1      0      1       0       1 |
     16. | 2.76    5   .55476    1      0      1       0       1 |
     17. | 2.77    6   .32807    1      0      1       0       1 |
     18. | 2.78    7   .03271    1      0      1       1       0 |
     19. | 2.78    8   .00219    1      0      1       1       0 |
     20. | 2.79    9   .57361    1      0      1       0       1 |
         +-------------------------------------------------------+
    

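    The flags above still mark observations 18 and 19 as significant, even though significance has already stopped in that group. The first rule (once the flag drops to 0 within a group, it stays 0) can be automated with a loop that resets any flagged observation following an unflagged one until none remain: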

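    * Remove the first-pass flags if they are still in memory
    capture drop x_f_s
    capture drop x_f_n

    bysort _g (_m): generate x_f_s = (_p <= .05) 
    
    * tag holds the flag as it stood at the start of each pass
    clonevar tag = x_f_s
    
    local i 1
    while `i'== 1 {
        * The assertion fails while a flagged observation still follows an unflagged one
        capture noisily {
            bysort _g (_m): assert x_f_s == 0  if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
        }
        if _rc {
            * Reset those observations, refresh tag, and make another pass
            bysort _g (_m): replace x_f_s = 0 if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
            drop tag
            clonevar tag = x_f_s                            
        }
        else local i 0
    }
    
    drop tag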
    

    Which produces the desired output for x_f_s:

    list
    
         +-----------------------------------------------+
         | _mar   _m       _p   _g   _f_s   _f_n   x_f_s |
         |-----------------------------------------------|
      1. | 2.99    0        0    0      1      0       1 |
      2. | 3.03    1        0    0      1      0       1 |
      3. | 3.05    2        0    0      1      1       1 |
      4. | 3.06    3   .22179    0      0      1       0 |
      5. | 3.07    4   .18044    0      0      1       0 |
         |-----------------------------------------------|
      6. | 3.07    5   .58009    0      0      1       0 |
      7. | 3.06    6    .4062    0      0      1       0 |
      8. | 3.06    7   .47257    0      0      1       0 |
      9. | 3.06    8   .91196    0      0      1       0 |
     10. | 3.05    9    .6856    0      0      1       0 |
         |-----------------------------------------------|
     11. | 2.65    0        0    1      1      0       1 |
     12. |  2.7    1        0    1      1      0       1 |
     13. | 2.73    2   .00103    1      1      0       1 |
     14. | 2.75    3   .00944    1      1      1       1 |
     15. | 2.75    4   .64713    1      0      1       0 |
         |-----------------------------------------------|
     16. | 2.76    5   .55476    1      0      1       0 |
     17. | 2.77    6   .32807    1      0      1       0 |
     18. | 2.78    7   .03271    1      0      1       0 |
     19. | 2.78    8   .00219    1      0      1       0 |
     20. | 2.79    9   .57361    1      0      1       0 |
         +-----------------------------------------------+
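
    The loop can probably be avoided. A minimal sketch, assuming the question's data are in memory and that a missing p-value should count as non-significant: within each group, a running count of non-significant comparisons stays at zero only until the first one appears. The name x_f_s2 is hypothetical, chosen here so that x_f_s is left untouched.

    ```stata
    * sum() is a running sum; under by: it restarts in every group.
    * The flag is 1 while no non-significant comparison has occurred yet,
    * and 0 from the first one onward, even if later p-values are <= .05.
    bysort _g (_m): generate x_f_s2 = sum(_p > .05) == 0
    ```

    On the sample data this should reproduce _f_s, including observations 18 and 19 in the second group.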
    

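    The second rule is more straightforward: start from the complement of x_f_s, then set the flag to 1 on the last significant observation just before the cut-off point, so that both flags equal 1 there, as in _f_n: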

    bysort _g (_m): generate x_f_n = x_f_s == 0
    bysort _g (_m): replace x_f_n = 1 if x_f_s == 1 & x_f_s[_n+1]== 0
    
    list
    
         +-------------------------------------------------------+
         | _mar   _m       _p   _g   _f_s   _f_n   x_f_s   x_f_n |
         |-------------------------------------------------------|
      1. | 2.99    0        0    0      1      0       1       0 |
      2. | 3.03    1        0    0      1      0       1       0 |
      3. | 3.05    2        0    0      1      1       1       1 |
      4. | 3.06    3   .22179    0      0      1       0       1 |
      5. | 3.07    4   .18044    0      0      1       0       1 |
         |-------------------------------------------------------|
      6. | 3.07    5   .58009    0      0      1       0       1 |
      7. | 3.06    6    .4062    0      0      1       0       1 |
      8. | 3.06    7   .47257    0      0      1       0       1 |
      9. | 3.06    8   .91196    0      0      1       0       1 |
     10. | 3.05    9    .6856    0      0      1       0       1 |
         |-------------------------------------------------------|
     11. | 2.65    0        0    1      1      0       1       0 |
     12. |  2.7    1        0    1      1      0       1       0 |
     13. | 2.73    2   .00103    1      1      0       1       0 |
     14. | 2.75    3   .00944    1      1      1       1       1 |
     15. | 2.75    4   .64713    1      0      1       0       1 |
         |-------------------------------------------------------|
     16. | 2.76    5   .55476    1      0      1       0       1 |
     17. | 2.77    6   .32807    1      0      1       0       1 |
     18. | 2.78    7   .03271    1      0      1       0       1 |
     19. | 2.78    8   .00219    1      0      1       0       1 |
     20. | 2.79    9   .57361    1      0      1       0       1 |
         +-------------------------------------------------------+
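
    Since the question supplies the target flags _f_s and _f_n, the result can be checked against them directly. A minimal sketch, assuming x_f_s and x_f_n were created as above and the original variables are still in memory:

    ```stata
    * assert stops with an error if any observation violates the condition,
    * so no output here means the computed flags match the targets.
    assert x_f_s == _f_s
    assert x_f_n == _f_n
    ```

    Once both assertions pass, x_f_s and x_f_n can stand in for _f_s and _f_n in the twoway command from Snippet 1.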