Tags: loops, foreach, stata, local

Understanding the use of local command in Randomization Check


I am trying to run a do-file for an RCT. The authors are using the local command.

They create a local macro, numvar, with the value 1. Inside the loop they set each variable to missing wherever its miss_ indicator equals 1. After that, I cannot understand why they add 1 to numvar. Moreover, why are numvar and numvarse used when storing the mean and SD?

Any explanation would be really helpful.

The code:

*RANDOMIZATION CHECK

local numvar=1

foreach var in T1_Client_Age T1_Client_Married T1_HH_Size T1_Client_Literate T1_Client_Education T1_muslim ///
                    T1_Hindu_SC_Kat T1_rest3    T1_Log_HH_Income T1_Household_Business     ///
                    T1_Client_Wage_Salaried_2 T1_Client_Bus_PR_Employed_2 T1_Client_Housewife_2 {
                    
        replace `var'=. if miss_`var'==1
        
        local numvarse = `numvar' + 1
        
        replace varname="`var'" in `numvar' 

        *MEAN AND SD FOR CONTROL
        quietly: sum `var' if  Treated_All==0
        replace Control=r(mean) in `numvar' 
        replace Control=r(sd) in `numvarse'  
        
        *MEAN AND SD FOR TREATED
        quietly:sum `var' if  Treated_All==1 
        replace Treat=r(mean) in `numvar' 
        replace Treat=r(sd) in `numvarse'  
        
        *MEAN AND SD FOR TREATED WITH FRIEND
        quietly:sum `var' if  Treatment_Peer==1
        replace Treat_Peer=r(mean) in `numvar' 
        replace Treat_Peer=r(sd) in `numvarse'  
        
        *MEAN DIFFERENCES BETWEEN TREATED AND CONTROL
         xi:reg `var' Treated_All i.sewa_center*i.baseline i.t_month  , cluster(t_group) 
        replace Diff_Control_Treat=_b[Treated_All] in `numvar' 
        replace Diff_Control_Treat=_se[Treated_All] in `numvarse'  

        *MEAN DIFFERENCES BETWEEN TREATED ALONE AND WITH FRIEND
        xi:reg `var' Treated_All Treatment_Peer i.sewa_center*i.baseline i.t_month   , cluster(t_group) 
        replace Diff_Alone_Peer=_b[Treatment_Peer] in `numvar' 
        replace Diff_Alone_Peer=_se[Treatment_Peer] in `numvarse'  

        
        local numvar = `numvarse' + 1

}

Solution

  • The principle can be understood by looking at just a few commands.

      local numvar=1
    
      foreach var in ...  {
                    
        
        replace `var'=. if miss_`var'==1
        
        local numvarse = `numvar' + 1
        
        replace varname="`var'" in `numvar' 
    
        *MEAN AND SD FOR CONTROL
        quietly: sum `var' if  Treated_All==0
        replace Control=r(mean) in `numvar' 
        replace Control=r(sd) in `numvarse'  
        
        *MEAN AND SD FOR TREATED
        quietly:sum `var' if  Treated_All==1 
        replace Treat=r(mean) in `numvar' 
        replace Treat=r(sd) in `numvarse'  
        
        *MEAN AND SD FOR TREATED WITH FRIEND
        quietly:sum `var' if  Treatment_Peer==1
        replace Treat_Peer=r(mean) in `numvar' 
        replace Treat_Peer=r(sd) in `numvarse'  
        
        local numvar = `numvarse' + 1   
    
    }
    

    numvar is initialised to 1. Within the loop, numvarse is set to 1 more than numvar.

    The code then puts results for each of a series of variables in observations numvar and numvarse. The name of the variable and its mean for each group go in observation numvar; the corresponding SD goes in observation numvarse.

    The key is to focus on

      replace ... in ...
    

    where the in qualifier restricts replace to that single observation.
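
    A minimal sketch of the in qualifier on its own, using a made-up four-observation dataset (the variable name result is hypothetical):

    ```stata
    clear
    set obs 4
    generate double result = .

    * each replace touches exactly one observation
    replace result = 10 in 1
    replace result = 20 in 2

    * observations 3 and 4 are still missing
    list
    ```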

    At the end of each pass through the loop, numvar is set to numvarse + 1, so it advances by 2 and points to the next unused pair of observations.

    In short, results for the first variable named go in observations 1 and 2; results for the second variable go in observations 3 and 4; and so on.
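
    The same bookkeeping can be tried on the auto dataset shipped with Stata. This is only a sketch: the three variables and the result variable Mean_SD are stand-ins chosen for illustration, not part of the original do-file.

    ```stata
    sysuse auto, clear

    * result variables, written to without regard to the cars in each row
    generate str32 varname = ""
    generate double Mean_SD = .

    local numvar = 1
    foreach var in price mpg weight {
        * row for the SD sits directly below the row for the mean
        local numvarse = `numvar' + 1

        replace varname = "`var'" in `numvar'

        quietly summarize `var'
        replace Mean_SD = r(mean) in `numvar'
        replace Mean_SD = r(sd) in `numvarse'

        * move on to the next free pair of rows
        local numvar = `numvarse' + 1
    }

    * price uses rows 1-2, mpg rows 3-4, weight rows 5-6
    list varname Mean_SD in 1/6
    ```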

    Minimally, this process depends on

    1. There being enough observations for this to be possible. I count 13 variables, so there must be at least 26 observations in the dataset, which seems overwhelmingly likely.

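    2. There being an understanding that the variables being written to are not aligned with any other variables. That is, the value of Control in observation 1 describes the first variable in the list, not whoever happens to be observation 1 in the data.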

    Some might find #2 and even #1 poor ways of working. They would want results to be written to a new dataset or to a new frame.
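
    One way to write results to a new dataset is postfile. The following is a sketch on the auto dataset, not the authors' method; the group variable foreign and the result names mean_0, sd_0, mean_1 and sd_1 are assumptions made for illustration. In Stata 16 or later a frame could serve the same purpose; postfile is used here because it also works in older versions.

    ```stata
    sysuse auto, clear

    tempname handle m0 s0
    tempfile results

    * declare the structure of the results dataset
    postfile `handle' str32 varname double(mean_0 sd_0 mean_1 sd_1) using `results'

    foreach var in price mpg weight {
        * keep the first group's results before the next summarize overwrites r()
        quietly summarize `var' if foreign == 0
        scalar `m0' = r(mean)
        scalar `s0' = r(sd)

        quietly summarize `var' if foreign == 1

        * one row per variable, no row arithmetic needed
        post `handle' ("`var'") (`m0') (`s0') (r(mean)) (r(sd))
    }

    postclose `handle'

    * run the whole block in one go: the temporary file lasts only for the do-file
    use `results', clear
    list
    ```

    Because each post adds a complete row, the results no longer depend on how many observations the original dataset has.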

    The code is not state of the art: using xi with regress has been unnecessary since factor-variable notation arrived in Stata 11 (2009), but it still works for many purposes.
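
    For comparison, a sketch of the same kind of regression in factor-variable notation. Under xi, i.a*i.b requests main effects plus interactions, which factor-variable notation writes as i.a##i.b. The nlsw88 dataset shipped with Stata is used as a stand-in because the asker's data are not available, so the variables here are illustrative only.

    ```stata
    sysuse nlsw88, clear

    * no xi: prefix needed; ## gives main effects and interactions
    regress wage union i.race##i.married i.industry, vce(cluster occupation)

    * coefficient and standard error are retrieved exactly as before
    display _b[union]
    display _se[union]

    * by analogy, the first regression in the question would become
    * regress `var' Treated_All i.sewa_center##i.baseline i.t_month, vce(cluster t_group)
    ```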

    (Detail: The code overwrites values in variable X with missing if separately there is a variable miss_X indicating that variable X is missing. I don't think any remote observers can explain that.)