I wish to run a series of multinomial logits (600ish per covariate of interest) and gather the z-statistics from each of these (I do not care about the order in which these are recorded).
These mlogits are run on a small piece of my data (sharing a group ID). The mlogits have a varying number of outcomes involved (n), and there will be (n - 1) z statistics to gather from each mlogit. Each mlogit takes the form: y = a + _b*x + \epsilon where y can take on between 2 and 9 values (in my data), although the mean is 3.7.
I believe the difficulty comes in pulling these z-stats out of the mlogit, as there is no way I know to directly call a matrix of z-stats. My solution is to construct the z-stats from the e(V) and e(b) matrices. For each iteration of the mlogit, I construct a matrix of z-stats; I then append this to the previous matrix of z-stats (thereby building a matrix of all of them calculated). Unfortunately, my code does not seem to do this properly.
The symptoms are as follows. The matrix mat_covariate includes many missing values (over half of the matrix values have been missing in the troubleshooting I have done). It also includes many zeroes (which are possible, but unlikely - especially at this rate, about 16%). As written, the code does not yet suppress the mlogits I run, and so I can go back and check what makes it into the matrix. At most one value from each mlogit is recorded, but these are often recorded multiple times. 40% of the mlogits had nothing recorded.
The relevant loop is below:
local counter = 1
forvalues i = 1/`times' {
preserve
keep if group_id==`i'
foreach covariate in `covariates' {
if `counter' == 1 {
mlogit class `covariate'
sum outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate' = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'[`j',1] = `z'
}
}
else {
mlogit class `covariate'
sum outcomes_n, meanonly
local max `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate'_temp = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'_temp[`j',1] = `z'
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
}
local counter = `counter'+1
restore
}
Some reasons for why I did some of the things in the loop. I believe these things work, but they are not my first instincts, and I am unclear why my first instinct does not work. If there's a simpler/more elegant way to solve them, that would be a nice bonus:
I created some sample data that can be used to replicate this problem. Below is code for a .do file which sets up data, locals for the loop, the loop above, and demonstrates the symptoms by displaying the matrix in question:
clear all
version 14
//================== sample data: ==================
set obs 500
set seed 12345
gen id = _n
gen group_id = .
replace group_id = 1 if id <= 50
replace group_id = 2 if id <= 100 & missing(group_id)
replace group_id = 3 if id <= 150 & missing(group_id)
replace group_id = 4 if id <= 200 & missing(group_id)
replace group_id = 5 if id <= 250 & missing(group_id)
replace group_id = 6 if id <= 325 & missing(group_id)
replace group_id = 7 if id <= 400 & missing(group_id)
replace group_id = 8 if id <= 500 & missing(group_id)
gen temp_subgroup_id = .
replace temp_subgroup_id = floor((3)*runiform() + 2) if group_id < 6
replace temp_subgroup_id = floor((4)*runiform() + 2) if group_id < 8 & missing(temp_subgroup_id)
replace temp_subgroup_id = floor((5)*runiform() + 2) if missing(temp_subgroup_id)
egen subgroup_id = group(group_id temp_subgroup_id)
bysort subgroup_id : gen subgroup_size = _N
bysort group_id subgroup_id : gen tag = (_n == 1)
bysort group_id : egen outcomes_n = total(tag)
gen binary_x = floor(2*runiform())
//================== locals: ==================
local covariates binary_x
local times = 8
// times is equal to the number of group_ids
//================== loop in question: ==================
local counter = 1
forvalues i = 1/`times' {
preserve
keep if group_id==`i'
foreach covariate in `covariates' {
if `counter' == 1 {
mlogit subgroup_id `covariate'
sum outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate' = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'[`j',1] = `z'
}
}
else {
mlogit subgroup_id `covariate'
sum outcomes_n, meanonly
local max `r(max)'
local max_minus = `max' - 1
matrix mat_`covariate'_temp = J(`max_minus',1,0)
forvalues j = 1/`max_minus' {
mat V = e(V)
mat b = e(b)
local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
matrix mat_`covariate'_temp[`j',1] = `z'
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
}
}
local counter = `counter' + 1
restore
}
//================== symptoms: ==================
matrix list mat_binary_x
I'm trying to figure out what is wrong in my code, but have been unable to find the issue (although I've found some other smaller errors, but none that have had an impact on the main problem - I would be unsurprised if there are multiple bugs).
Consider the simplest case when i == 1
and max_minus == 2
:
preserve
keep if group_id == 1
summarize outcomes_n, meanonly
local max = `r(max)'
local max_minus = `max' - 1
mlogit subgroup_id binary_x
matrix V = e(V)
matrix b = e(b)
This produces the following:
. matrix list V
symmetric V[6,6]
1: 1: 2: 2: 3: 3:
o. o.
binary_x _cons binary_x _cons binary_x _cons
1:binary_x .46111111
1:_cons -.225 .225
2:o.binary_x 0 0 0
2:o._cons 0 0 0 0
3:binary_x .2111111 -.09999999 0 0 .47896825
3:_cons -.09999999 .09999999 0 0 -.24285714 .24285714
. matrix list b
b[1,6]
1: 1: 2: 2: 3: 3:
o. o.
binary_x _cons binary_x _cons binary_x _cons
y1 .10536052 -.22314364 0 0 .23889194 -.35667502
. local j = `max_minus'
. display "z = `= b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)'"
z = .
The value of z
is missing because you are dividing the value of a row in the
matrix e(b)
that does not exist. In other words, your loops are
not set up correctly and substitute incorrect values.