I would like to create an indicator flag across rows such that, once a criterion has been met, the flag persists for all subsequent cases within a group.
In the sample data below, `_p` holds the p-value that determines the statistical significance of the comparison of the values in `_mar` across levels of `_m`. The grouping variable `_g` indicates that the comparisons are made within groups.
The variables `_f_s` and `_f_n` represent the end result that I would like to have.
```stata
clear
input _mar _m _p _g _f_s _f_n
2.99 0 0.00000 0 1 0
3.03 1 0.00000 0 1 0
3.05 2 0.00000 0 1 1
3.06 3 0.22179 0 0 1
3.07 4 0.18044 0 0 1
3.07 5 0.58009 0 0 1
3.06 6 0.40620 0 0 1
3.06 7 0.47257 0 0 1
3.06 8 0.91196 0 0 1
3.05 9 0.68560 0 0 1
2.65 0 0.00000 1 1 0
2.70 1 0.00000 1 1 0
2.73 2 0.00103 1 1 0
2.75 3 0.00944 1 1 1
2.75 4 0.64713 1 0 1
2.76 5 0.55476 1 0 1
2.77 6 0.32807 1 0 1
2.78 7 0.03271 1 0 1
2.78 8 0.00219 1 0 1
2.79 9 0.57361 1 0 1
end
```
I would like to use the flags to show in a graph where statistical significance "stops", ignoring all later comparisons, even significant ones.
Below is the code I have tried so far:
Snippet 1 (graph works; lines are structured as desired):
```stata
snapshot save, label("import")
snapshot list
twoway ///
    (line _mar _m if _g == 0 & _f_s==1, lcolor(orange) lpattern(solid)) ///
    (line _mar _m if _g == 0 & _f_n==1, lcolor(orange) lpattern(dash )) ///
    (scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
    ///
    (line _mar _m if _g == 1 & _f_s==1, lcolor(blue*2) lpattern(solid)) ///
    (line _mar _m if _g == 1 & _f_n==1, lcolor(blue*2) lpattern(dash )) ///
    (scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
    , legend(off) ///
    xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
    ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///
    xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
    name(good)
```
Snippet 2 (graph does not work; lines are not structured as desired):
```stata
snapshot restore 1
sort _g _m
gen x_f_s = (_p <= .05)
replace x_f_s = 0 if x_f_s ==1 & x_f_s[_n-1]==0 & x_f_s[_n+1]==0
replace x_f_s = 1 if _m == 0
gen x_f_n = x_f_s == 0
replace x_f_n = 1 if x_f_s ==1 & x_f_s[_n+1]==0
/***** the created flags are not correct *****/
list, sepby(_g)
twoway ///
    (line _mar _m if _g == 0 & x_f_s==1, lcolor(orange) lpattern(solid)) ///
    (line _mar _m if _g == 0 & x_f_n==1, lcolor(orange) lpattern(dash )) ///
    (scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
    ///
    (line _mar _m if _g == 1 & x_f_s==1, lcolor(blue*2) lpattern(solid)) ///
    (line _mar _m if _g == 1 & x_f_n==1, lcolor(blue*2) lpattern(dash )) ///
    (scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
    , legend(off) ///
    xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
    ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///
    xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
    name(not_good)
```
The variables that I have tried to calculate are `x_f_s` and `x_f_n`.
The flags work when no comparison after the first non-significant one happens to be significant. However, when a comparison is significant again after the initial "stop", the plot comes out wrong.
There should also be a second flag that indicates where non-significance starts; it should carry forward in the same way as the first flag.
I am using solid and dashed lines to show where significance holds and where it stops.
Ultimately, I would like to create these flags within groups for plotting purposes.
I would start by flagging each significant comparison within its group:
```stata
bysort _g (_m): generate x_f_s = (_p <= .05)
bysort _g (_m): generate x_f_n = x_f_s == 0
list, sepby(_g)
```
```
     +-------------------------------------------------------+
     | _mar   _m       _p   _g   _f_s   _f_n   x_f_s   x_f_n |
     |-------------------------------------------------------|
  1. | 2.99    0        0    0      1      0       1       0 |
  2. | 3.03    1        0    0      1      0       1       0 |
  3. | 3.05    2        0    0      1      1       1       0 |
  4. | 3.06    3   .22179    0      0      1       0       1 |
  5. | 3.07    4   .18044    0      0      1       0       1 |
  6. | 3.07    5   .58009    0      0      1       0       1 |
  7. | 3.06    6    .4062    0      0      1       0       1 |
  8. | 3.06    7   .47257    0      0      1       0       1 |
  9. | 3.06    8   .91196    0      0      1       0       1 |
 10. | 3.05    9    .6856    0      0      1       0       1 |
     |-------------------------------------------------------|
 11. | 2.65    0        0    1      1      0       1       0 |
 12. |  2.7    1        0    1      1      0       1       0 |
 13. | 2.73    2   .00103    1      1      0       1       0 |
 14. | 2.75    3   .00944    1      1      1       1       0 |
 15. | 2.75    4   .64713    1      0      1       0       1 |
 16. | 2.76    5   .55476    1      0      1       0       1 |
 17. | 2.77    6   .32807    1      0      1       0       1 |
 18. | 2.78    7   .03271    1      0      1       1       0 |
 19. | 2.78    8   .00219    1      0      1       1       0 |
 20. | 2.79    9   .57361    1      0      1       0       1 |
     +-------------------------------------------------------+
```
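Observations 18 and 19 are where this simple flag goes wrong: they are significant, but they come after the first non-significant comparison in their group. This is how you can automate the first rule (once significance stops, `x_f_s` stays 0 for the rest of the group):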
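```stata
bysort _g (_m): generate x_f_s = (_p <= .05)
clonevar tag = x_f_s

// Repeat until no flagged row directly follows an unflagged row
local i 1
while `i' == 1 {
    capture noisily {
        // Fails if a significant row is still flagged right after an unflagged row
        bysort _g (_m): assert x_f_s == 0 if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
    }
    if _rc {
        // Switch those rows off, then refresh the copy used in the test
        bysort _g (_m): replace x_f_s = 0 if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
        drop tag
        clonevar tag = x_f_s
    }
    else local i 0
}
drop tag
```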
This produces the desired values of `x_f_s`, which now match `_f_s`:
```stata
list
```
```
     +-----------------------------------------------+
     | _mar   _m       _p   _g   _f_s   _f_n   x_f_s |
     |-----------------------------------------------|
  1. | 2.99    0        0    0      1      0       1 |
  2. | 3.03    1        0    0      1      0       1 |
  3. | 3.05    2        0    0      1      1       1 |
  4. | 3.06    3   .22179    0      0      1       0 |
  5. | 3.07    4   .18044    0      0      1       0 |
     |-----------------------------------------------|
  6. | 3.07    5   .58009    0      0      1       0 |
  7. | 3.06    6    .4062    0      0      1       0 |
  8. | 3.06    7   .47257    0      0      1       0 |
  9. | 3.06    8   .91196    0      0      1       0 |
 10. | 3.05    9    .6856    0      0      1       0 |
     |-----------------------------------------------|
 11. | 2.65    0        0    1      1      0       1 |
 12. |  2.7    1        0    1      1      0       1 |
 13. | 2.73    2   .00103    1      1      0       1 |
 14. | 2.75    3   .00944    1      1      1       1 |
 15. | 2.75    4   .64713    1      0      1       0 |
     |-----------------------------------------------|
 16. | 2.76    5   .55476    1      0      1       0 |
 17. | 2.77    6   .32807    1      0      1       0 |
 18. | 2.78    7   .03271    1      0      1       0 |
 19. | 2.78    8   .00219    1      0      1       0 |
 20. | 2.79    9   .57361    1      0      1       0 |
     +-----------------------------------------------+
```
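The loop needs several passes only because its condition reads `tag`, a copy of `x_f_s` that is refreshed between passes. A loop-free alternative is sketched below, assuming two standard Stata behaviours: `replace` works through the observations in order, so `x_f_s1[_n-1]` sees the value just replaced in the previous row, and under `by:` both explicit subscripts and the running sum `sum()` are confined to the current group. The names `x_f_s1` and `x_f_s2` are hypothetical, used only to compare the two variants; the block re-enters the sample data so that it runs on its own.

```stata
clear
input _mar _m _p _g _f_s _f_n
2.99 0 0.00000 0 1 0
3.03 1 0.00000 0 1 0
3.05 2 0.00000 0 1 1
3.06 3 0.22179 0 0 1
3.07 4 0.18044 0 0 1
3.07 5 0.58009 0 0 1
3.06 6 0.40620 0 0 1
3.06 7 0.47257 0 0 1
3.06 8 0.91196 0 0 1
3.05 9 0.68560 0 0 1
2.65 0 0.00000 1 1 0
2.70 1 0.00000 1 1 0
2.73 2 0.00103 1 1 0
2.75 3 0.00944 1 1 1
2.75 4 0.64713 1 0 1
2.76 5 0.55476 1 0 1
2.77 6 0.32807 1 0 1
2.78 7 0.03271 1 0 1
2.78 8 0.00219 1 0 1
2.79 9 0.57361 1 0 1
end

// Variant 1 (hypothetical name x_f_s1): one sequential pass. Once a row is 0,
// the next row in the same group sees a 0 in row _n-1 and is switched off too.
// On the first row of a group, x_f_s1[_n-1] is missing, so it is never replaced.
bysort _g (_m): generate byte x_f_s1 = (_p <= .05)
by _g (_m): replace x_f_s1 = 0 if x_f_s1[_n-1] == 0

// Variant 2 (hypothetical name x_f_s2): running count of non-significant
// comparisons within _g; the flag is 1 only while that count is still zero.
// A missing _p counts as non-significant here, because missing > .05 in Stata.
bysort _g (_m): generate byte x_f_s2 = sum(_p > .05) == 0

list _g _m _p _f_s x_f_s1 x_f_s2, sepby(_g)
```

Both variants switch off observations 18 and 19 in the first pass, because every row after the first non-significant comparison in a group inherits the 0.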
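The second rule is simpler: start from the complement of `x_f_s`, then also flag the last significant comparison in each group (the row just before the cut-off), so that the dashed line begins where the solid line ends: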
```stata
bysort _g (_m): generate x_f_n = x_f_s == 0
bysort _g (_m): replace x_f_n = 1 if x_f_s == 1 & x_f_s[_n+1]== 0
list
```
```
     +-------------------------------------------------------+
     | _mar   _m       _p   _g   _f_s   _f_n   x_f_s   x_f_n |
     |-------------------------------------------------------|
  1. | 2.99    0        0    0      1      0       1       0 |
  2. | 3.03    1        0    0      1      0       1       0 |
  3. | 3.05    2        0    0      1      1       1       1 |
  4. | 3.06    3   .22179    0      0      1       0       1 |
  5. | 3.07    4   .18044    0      0      1       0       1 |
     |-------------------------------------------------------|
  6. | 3.07    5   .58009    0      0      1       0       1 |
  7. | 3.06    6    .4062    0      0      1       0       1 |
  8. | 3.06    7   .47257    0      0      1       0       1 |
  9. | 3.06    8   .91196    0      0      1       0       1 |
 10. | 3.05    9    .6856    0      0      1       0       1 |
     |-------------------------------------------------------|
 11. | 2.65    0        0    1      1      0       1       0 |
 12. |  2.7    1        0    1      1      0       1       0 |
 13. | 2.73    2   .00103    1      1      0       1       0 |
 14. | 2.75    3   .00944    1      1      1       1       1 |
 15. | 2.75    4   .64713    1      0      1       0       1 |
     |-------------------------------------------------------|
 16. | 2.76    5   .55476    1      0      1       0       1 |
 17. | 2.77    6   .32807    1      0      1       0       1 |
 18. | 2.78    7   .03271    1      0      1       0       1 |
 19. | 2.78    8   .00219    1      0      1       0       1 |
 20. | 2.79    9   .57361    1      0      1       0       1 |
     +-------------------------------------------------------+
```
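The two `x_f_n` lines can also be collapsed into a single expression. This is a minimal sketch, assuming the data are sorted by `_g` and `_m` and that `x_f_s` is built with a running sum rather than the loop: a row belongs to the dashed segment if it is not flagged as significant, or if the next row in the same group is not flagged, which picks up the last significant comparison. Under `by:`, `x_f_s[_n+1]` is missing on the last row of each group and `missing == 0` is false, so a group in which every comparison is significant would get no dashed segment. The block re-enters the sample data so that it runs on its own.

```stata
clear
input _mar _m _p _g _f_s _f_n
2.99 0 0.00000 0 1 0
3.03 1 0.00000 0 1 0
3.05 2 0.00000 0 1 1
3.06 3 0.22179 0 0 1
3.07 4 0.18044 0 0 1
3.07 5 0.58009 0 0 1
3.06 6 0.40620 0 0 1
3.06 7 0.47257 0 0 1
3.06 8 0.91196 0 0 1
3.05 9 0.68560 0 0 1
2.65 0 0.00000 1 1 0
2.70 1 0.00000 1 1 0
2.73 2 0.00103 1 1 0
2.75 3 0.00944 1 1 1
2.75 4 0.64713 1 0 1
2.76 5 0.55476 1 0 1
2.77 6 0.32807 1 0 1
2.78 7 0.03271 1 0 1
2.78 8 0.00219 1 0 1
2.79 9 0.57361 1 0 1
end

// First rule: 1 until the first non-significant comparison in the group
bysort _g (_m): generate byte x_f_s = sum(_p > .05) == 0

// Second rule: every unflagged row, plus the last flagged row of the group
// (its successor within the same group is unflagged)
by _g (_m): generate byte x_f_n = x_f_s == 0 | x_f_s[_n+1] == 0

list _g _m _p _f_s x_f_s _f_n x_f_n, sepby(_g)
```

With these two variables in place, the `twoway` call from Snippet 2 can be used unchanged to draw the solid and dashed segments.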