I am trying to run a do-file for an RCT. The authors are using the local command.
They create a local macro taking the value 1. Based on that, they are creating missing values for the var variable. However, after that, I cannot understand why they are adding 1 to it. Moreover, why are numvar and numvarse being used for the mean and SD?
Any explanation would be really helpful.
The code:
*RANDOMIZATION CHECK
local numvar=1
foreach var in T1_Client_Age T1_Client_Married T1_HH_Size T1_Client_Literate T1_Client_Education T1_muslim ///
T1_Hindu_SC_Kat T1_rest3 T1_Log_HH_Income T1_Household_Business ///
T1_Client_Wage_Salaried_2 T1_Client_Bus_PR_Employed_2 T1_Client_Housewife_2 {
replace `var'=. if miss_`var'==1
local numvarse = `numvar' + 1
replace varname="`var'" in `numvar'
*MEAN AND SD FOR CONTROL
quietly: sum `var' if Treated_All==0
replace Control=r(mean) in `numvar'
replace Control=r(sd) in `numvarse'
*MEAN AND SD FOR TREATED
quietly:sum `var' if Treated_All==1
replace Treat=r(mean) in `numvar'
replace Treat=r(sd) in `numvarse'
*MEAN AND SD FOR TREATED WITH FRIEND
quietly:sum `var' if Treatment_Peer==1
replace Treat_Peer=r(mean) in `numvar'
replace Treat_Peer=r(sd) in `numvarse'
*MEAN DIFFERENCES BETWEEN TREATED AND CONTROL
xi:reg `var' Treated_All i.sewa_center*i.baseline i.t_month , cluster(t_group)
replace Diff_Control_Treat=_b[Treated_All] in `numvar'
replace Diff_Control_Treat=_se[Treated_All] in `numvarse'
*MEAN DIFFERENCES BETWEEN TREATED ALONE AND WITH FRIEND
xi:reg `var' Treated_All Treatment_Peer i.sewa_center*i.baseline i.t_month , cluster(t_group)
replace Diff_Alone_Peer=_b[Treatment_Peer] in `numvar'
replace Diff_Alone_Peer=_se[Treatment_Peer] in `numvarse'
local numvar = `numvarse' + 1
}
The principle can be understood by looking at just a few commands.
local numvar=1
foreach var in ... {
replace `var'=. if miss_`var'==1
local numvarse = `numvar' + 1
replace varname="`var'" in `numvar'
*MEAN AND SD FOR CONTROL
quietly: sum `var' if Treated_All==0
replace Control=r(mean) in `numvar'
replace Control=r(sd) in `numvarse'
*MEAN AND SD FOR TREATED
quietly:sum `var' if Treated_All==1
replace Treat=r(mean) in `numvar'
replace Treat=r(sd) in `numvarse'
*MEAN AND SD FOR TREATED WITH FRIEND
quietly:sum `var' if Treatment_Peer==1
replace Treat_Peer=r(mean) in `numvar'
replace Treat_Peer=r(sd) in `numvarse'
local numvar = `numvarse' + 1
}
numvar is initialised to 1. Within the loop, numvarse is set to 1 more than numvar.
The code then puts results for each of a series of variables in observations numvar and numvarse: the name of the variable and its mean for each group go in observation numvar, and the corresponding SD goes in observation numvarse.
The key is to focus on replace ... in ... where the in qualifier restricts replace to that one observation.
Towards the end of the loop numvar is bumped again, to 1 more than numvarse, so that it points to the next unused observation.
In short, results for the first variable named go in observations 1 and 2; those for the second variable named go in observations 3 and 4; and so on.
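The bookkeeping can be tried on a small scale. What follows is a minimal sketch, not the authors' code: it assumes Stata's bundled auto dataset, and the results variable Domestic and the three variables looped over are chosen purely for illustration.

```stata
* Minimal sketch of the same two-rows-per-variable bookkeeping.
* Assumption: the bundled auto dataset stands in for the RCT data.
sysuse auto, clear

* Holding variables for the results (illustrative names, not from the thread)
generate str32 varname = ""
generate double Domestic = .

local numvar = 1
foreach var in price mpg weight {
    * numvarse is the row just below numvar: it holds the SD
    local numvarse = `numvar' + 1

    replace varname = "`var'" in `numvar'

    quietly summarize `var' if foreign == 0
    replace Domestic = r(mean) in `numvar'
    replace Domestic = r(sd) in `numvarse'

    * move on to the next unused row
    local numvar = `numvarse' + 1
}

* Rows 1-2 hold price, 3-4 mpg, 5-6 weight
list varname Domestic in 1/6, noobs
```

The listing shows each variable name on the row holding its mean, with the SD on the row directly below.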
Minimally, this process depends on
1. There being enough observations for this to be possible. I count 13 variables, so there must be at least 26 observations in the dataset, which seems overwhelmingly likely.
2. There being an understanding that the variables being written to are not aligned with any other variables.
Some might find #2 and even #1 poor ways of working. They would want results to be written to a new dataset or to a new frame.
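As one possible way of writing results to a new dataset, here is a sketch using postfile. It is an alternative to the approach in the do-file, not something the authors did; it again assumes the auto dataset, and the names results and resfile are arbitrary.

```stata
* Sketch: collect means and SDs in a separate results dataset via postfile.
* Assumption: the auto dataset stands in for the RCT data.
sysuse auto, clear

tempname results
tempfile resfile
postfile `results' str32 varname double(mean sd) using `resfile'

foreach var in price mpg weight {
    quietly summarize `var' if foreign == 0
    * one row per variable: name, mean, SD
    post `results' ("`var'") (r(mean)) (r(sd))
}

postclose `results'

* The results now live in their own dataset, one observation per variable
use `resfile', clear
list, noobs
```

Written this way, the results do not depend on how many observations the original dataset has, and they are never stored alongside unrelated rows of data.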
The code is not state of the art: using xi with regress has been outdated practice since Stata 11 (2009), when factor-variable notation was introduced, but it still works for many purposes.
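For comparison, here is a sketch of the older xi style next to factor-variable notation. It assumes the auto dataset, and the indicator heavy is made up for the example. In factor-variable notation, ## requests main effects plus interaction, which is what * requests under xi.

```stata
* Sketch: the same model with xi and with factor-variable notation.
* Assumption: auto dataset; heavy is a made-up indicator for illustration.
sysuse auto, clear
generate byte heavy = weight > 3000

* Older style: xi creates _I* dummy variables in the dataset
xi: regress price mpg i.foreign*i.heavy

* Stata 11 and later: no xi prefix and no extra variables created
regress price mpg i.foreign##i.heavy

* By the same logic, the first regression in the thread could be written as
*   regress `var' Treated_All i.sewa_center##i.baseline i.t_month, vce(cluster t_group)
* (left as a comment because it needs the authors' dataset)
```

Both regress commands fit the same model; only the handling of the indicator variables differs.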
(Detail: The code overwrites values in variable X with missing if separately there is a variable miss_X indicating that variable X is missing. I don't think any remote observers can explain that.)