Meaning of e(F)


I am currently trying to translate a Stata script into an R script. I have not been able to find documentation to explain exactly what e(F) is.

The script randomly selects 5,000 customers to act as the control group and 12,000 customers to act as the treatment group. It then performs some kind of statistical test to check whether the random selection produces a statistically sound sample. Googling has only gotten me so far, because I do not really understand Stata syntax.

I would really appreciate any help. Here is the script in question...

import delimited $data\data.csv

gen random=0
gen rank=0
scalar treattail=0
scalar controltail=0
gen control=0
gen treat=0

local treatment treat control

while treattail <0.25 {
    foreach y in `treatment' {
        qui replace `y'=0
    }
    qui replace random=100*runiform()
    sort random
    replace rank=_n
    replace control=1 if rank>=0 & rank<=5000
    replace treat=1 if rank>5000 & rank<=17000

    foreach y in `treatment' {
        reg `y' meanconsumption varconsumption
        scalar `y'tail = Ftail(`e(df_m)',`e(N)'-`e(df_m)',`e(F)') // I don't quite understand this line
    }
    scalar dir
}

Solution

  • local treatment treat control
    ....
    foreach y in `treatment' { 
    

    This runs the code between the opening brace and its matching closing brace twice: the first time with the local macro y set to treat, and the second time with y set to control.

    reg `y' meanconsumption varconsumption
    

    The first time through, this will regress the variable treat on the two variables meanconsumption and varconsumption. The second time through, the variable control will be regressed on the same two variables.

    scalar `y'tail = Ftail(`e(df_m)',`e(N)'-`e(df_m)',`e(F)')
    

    This calculates the upper-tail probability of an F distribution: the probability that an F-distributed variable, with numerator and denominator degrees of freedom given by the first two arguments, exceeds the F statistic given by the third argument. The estimation results e() are those left behind by the regress command just run, and are defined as follows.

      e(N)                number of observations
      e(df_m)             model degrees of freedom
      e(F)                F statistic 
    

    The first time through, the calculated value is stored in the Stata scalar treattail, and the second time through in the Stata scalar controltail. In other words, the loop appears to be calculating and saving the p-values for the F statistics from the two regressions.
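
Since the goal is an R translation, here is a minimal sketch of how the same tail probability might be computed in R for the treat regression. It is an illustration under stated assumptions, not a translation of the full script: the data frame dat and its simulated values are hypothetical stand-ins for the real CSV, and only the column names are taken from the Stata code. One detail worth noting is that the script passes e(N) - e(df_m) as the denominator degrees of freedom, whereas lm (like regress with a constant) uses the residual degrees of freedom, which is one less. The sketch computes both so they can be compared.

```r
# Stand-in data: column names mirror the Stata script, values are simulated.
set.seed(1)
n <- 200
dat <- data.frame(
  treat           = rbinom(n, 1, 0.5),
  meanconsumption = rnorm(n),
  varconsumption  = rnorm(n)
)

fit <- lm(treat ~ meanconsumption + varconsumption, data = dat)

# Named vector with elements "value", "numdf" and "dendf";
# "value" plays the role of e(F) and "numdf" the role of e(df_m).
fs <- summary(fit)$fstatistic

# Literal translation of Ftail(e(df_m), e(N) - e(df_m), e(F)).
treattail <- pf(fs[["value"]], fs[["numdf"]], nobs(fit) - fs[["numdf"]],
                lower.tail = FALSE)

# Same tail probability using lm's own residual degrees of freedom (dendf).
treattail_lm <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]],
                   lower.tail = FALSE)
```

With thousands of observations, as in the original script, the two versions should be practically indistinguishable. The control regression would follow the same pattern with control as the response.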