
Analysis by group and extracting value from a variable for printing


I am doing a little by-group processing, running some regressions by school.

What I would like to do is customize my output so I can see which output belongs to which school. However, I can't seem to set up foreach or forvalues in a way that makes this work, although I have tried various versions of both with partial success.

What I am trying to do now is display the value of schoolname on the di line, but I haven't been successful.

Below you can find some dummy data and code:

clear

input str6 studyid str30 schoolname y1gpa hsgpa a b c
VR2330  "Hot Dog University"  3.88696869  3.128562923 212.2027076 198.6369561 201.8520712
VR2330  "Hot Dog University"  3.724999751 4.14927266  200.2249981 197.2148641 190.8007133
VR2330  "Hot Dog University"  2.862368864 2.739375205 177.8104087 178.3566674 200.670764
VR2330  "Hot Dog University"  2.944155173 3.449033253 246.0577836 217.0256571 201.3599989
VR2330  "Hot Dog University"  3.027040023 2.774194849 179.7717585 208.3190507 201.1944748
VR2330  "Hot Dog University"  2.841508367 3.687799575 197.6369809 195.8034033 199.1525982
VR2330  "Hot Dog University"  2.709707669 2.258620921 147.0523958 247.4690088 215.5400833
VR2331  "The Berger Institute"  3.212822292 2.185146375 198.8197157 225.2337787 210.4646972
VR2331  "The Berger Institute"  2.304060034 2.241674897 188.3421993 186.9284032 207.108407
VR2331  "The Berger Institute"  3.339541832 3.106312279 209.7122346 193.2738859 207.9925428
VR2331  "The Berger Institute"  2.499369421 3.664498982 221.7819609 176.6067578 193.6349191
VR2331  "The Berger Institute"  2.58976085  2.604897762 201.4597068 189.7504268 193.0684748
VR2331  "The Berger Institute"  3.077416948 3.084996384 238.0112743 193.6023413 200.5245392
VR2331  "The Berger Institute"  3.595215292 3.47498973  196.0401919 205.2955727 204.7250124
VR2331  "The Berger Institute"  3.24943739  2.771259619 191.88872 179.8274715 210.3563047
VY1444  "Kale University" 3.58066891  3.765540136 185.2309378 198.1122011 196.1956994
VY1444  "Kale University" 2.620232242 3.079285234 163.3202145 195.7290603 205.682183
VY1444  "Kale University" 3.022673799 2.9914787 185.7449451 210.568389  206.960721
VY1444  "Kale University" 2.792861825 2.16564107  180.1691308 211.4182189 188.3452234
VY1444  "Kale University" 2.779154097 3.293620836 219.2595568 200.1849757 210.6425208
VY1444  "Kale University" 4.186316759 3.456717239 228.7297482 194.2097571 205.7079995
VY1444  "Kale University" 4.379739444 2.859316959 213.5641419 199.1315086 208.4406278
VY1444  "Kale University" 1.966028458 2.54365722  220.7757803 195.4262537 228.8124132
VY1444  "Kale University" 2.008067935 2.795116509 199.3403281 200.4161464 188.9522367
VZ4189  "Rice"  3.258253963 3.619015176 181.1053119 222.2819107 210.8807028
VZ4189  "Rice"  3.47515332  2.66431201  195.6496183 174.7512574 200.9326979
VZ4189  "Rice"  3.397466557 3.701428367 176.8322852 170.4327733 197.481968
VZ4189  "Rice"  3.141235215 3.26033076  187.7110626 187.5184942 215.002884
VZ4189  "Rice"  2.532078344 3.642275074 160.3208923 183.584604  194.770921
VZ4189  "Rice"  3.568638147 3.388113378 204.7815867 240.7565031 215.1194944
VZ4189  "Rice"  2.189863527 3.047948811 234.8225538 234.0024598 207.1882718
VZ4189  "Rice"  3.095726852 2.661160872 204.4226312 203.9618803 204.3683427
VZ4189  "Rice"  3.616748385 2.879665788 193.8070183 214.8352585 199.9727215
end

encode studyid, generate(school_id)
encode schoolname, generate(school_name)

sort school_id
egen _school = group(school_id)

tab1 _school schoolname
su _school, meanonly

forvalues _school = 1/`r(max)' { 

 di _n _dup(5)
 di "(start of analysis for `_school' ) "  _dup(60) "-" 
 di "I would like to have the actual ``schoolname`` here"

 regress y1gpa   hsgpa   if school_name == `_school'
 estimates store _mr

 regress y1gpa   hsgpa a b c   if school_name == `_school'
 estimates store _mf

 lrtest _mf _mr
 ftest _mf _mr

 test a b c 

}

Note: This question has also been cross-posted on Statalist


Solution

  • There are some good answers on Statalist (particularly those using levelsof), but here is a solution that keeps your approach with a few small tweaks:

    First, you don't need to generate both school_name and _school: both identify the same four groups (only the order of the numeric codes may differ), so one of them is enough.

    Second, it is safer to store r(max) in a local right after summarize and use that local in your forvalues loop, because r() results are overwritten by the next r-class command. (Or, as pointed out in the comments, use `=r(max)' to evaluate r(max) and insert the result directly.)

    Third, you can use the extended macro functions to get the value label for school_name and display it (see help extended_fcn).

    encode studyid, generate(school_id)
    encode schoolname, generate(school_name)
    sort school_id
    
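    * Highest code assigned by encode; save it before another command overwrites r()
    su school_name, meanonly
    local nschool = r(max)
    
    forvalues s = 1/`nschool' {
    
        * Look up the value label (the school's name) attached to the current code
        local sch : label school_name `s'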
        di _n _dup(5)
        di as result "(start of analysis for `sch' )"  _dup(60) "-" 
    
        regress y1gpa   hsgpa   if school_name == `s'
        estimates store _mr
    
        regress y1gpa   hsgpa a b c   if school_name == `s'
        estimates store _mf
    
        lrtest _mf _mr
        //ftest _mf _mr
    
        test a b c 
    
    }
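
    For comparison, the levelsof route mentioned at the top might look roughly like the sketch below. It is not taken from the Statalist thread. It assumes the original string variable schoolname is still in the dataset and that no school name contains double quotes; the local names schools and sch are only illustrative.

    ```stata
    * Sketch only: loop over the distinct school names themselves
    levelsof schoolname, local(schools)

    foreach sch of local schools {

        di as result _n "(start of analysis for `sch')"

        * Compare on the string variable, so no numeric codes are needed
        regress y1gpa hsgpa if schoolname == "`sch'"
        estimates store _mr

        regress y1gpa hsgpa a b c if schoolname == "`sch'"
        estimates store _mf

        lrtest _mf _mr
        test a b c
    }
    ```

    Because the loop macro already holds the school's name, no value-label lookup is needed in this variant.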