I am trying to convert a string variable (type str2, format %9s) into an indicator variable in Stata.
However, I keep receiving the following error:
type mismatch r(109)
I am using the 2016 ANES data set, and I am essentially trying to group states into open primary and closed primary/caucus states.
I have attempted the following code:
gen oprim= (state=="AL" & "AK" & "CO" & "GA" &...)
gen oprim=1 if state=="AL" & "AK" & "CO" & "GA" &...
I have had trouble converting this variable before. For example, I tried generating the new indicator variable without putting quotation marks around the state codes.
I have also tried to destring the variable, but I am receiving the following output:
destring state, generate(statenum) float
state: contains nonnumeric characters; no generate
Any help anyone could offer would be much appreciated.
Let's spell out why the code in the question is wrong. The OP doesn't give example data, but the errors are all identifiable without such data, assuming naturally that state is a string variable in the dataset.
First, we can leave out the ... (which no one presumes is legal) and the parentheses (which make no difference).
gen oprim = state=="AL" & "AK" & "CO" & "GA"
gen oprim=1 if state=="AL" & "AK" & "CO" & "GA"
Either of these will fail because Stata parses the if condition as four separate elements:
state == "AL"
& "AK"
& "CO"
& "GA"
state == "AL" is a true-or-false condition evaluated as 0 or 1, but none of "AK", "CO", "GA" is a true-or-false condition. They are all string values, and so the commands fail, because Stata needs to see something numeric as each of the elements in an if condition. Although clearly silly,
gen oprim = state == "AL" & 42
would be legal, as 42 is numeric (and in true-or-false evaluations counts as true). Stata won't fill in state == for you, which is what you hope to see implied.
If you rewrite
gen oprim = state == "AL" & state == "AK" & state == "CO" & state == "GA"
then you have a legal command. It's just not at all what you evidently want. It's impossible for state to be equal to different string values in the same observation, which is what this command is testing for. You're confusing & (and) with | (or).
gen oprim = state == "AL" | state == "AK" | state == "CO" | state == "GA"
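To see the difference between the & and | versions on a few observations, here is a minimal sketch. The five observations are made up for illustration and are not the ANES data; the variable names all_and and any_or and the byte storage type are my own choices.

```stata
* Hypothetical toy data standing in for the ANES state variable
clear
input str2 state
"AL"
"AK"
"NY"
"CO"
"TX"
end

* & version: legal, but no observation can equal four different codes at once
gen byte all_and = state == "AL" & state == "AK" & state == "CO" & state == "GA"

* | version: 1 if state matches any of the listed codes, 0 otherwise
gen byte any_or = state == "AL" | state == "AK" | state == "CO" | state == "GA"

list state all_and any_or
```

In this toy data all_and is 0 in every observation, while any_or is 1 for AL, AK and CO and 0 for NY and TX.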
Such statements get long and are tedious and error-prone to write out, but Stata has an alternative syntax:
gen oprim = inlist(state, "AL", "AK", "CO", "GA")
There are limits to that -- and yet other strategies too -- but I will leave this answer there without addressing further issues.
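As a quick check that the inlist() form matches the longer | chain, here is a sketch on made-up data. The observations, the variable name oprim_or and the byte storage type are assumptions for illustration only; the four state codes are just those quoted in the question, not a complete list of open primary states.

```stata
* Hypothetical toy data, not the ANES file
clear
input str2 state
"AL"
"AK"
"NY"
"CO"
"TX"
"GA"
end

* Long form: one comparison per state code, joined with | (or)
gen byte oprim_or = state == "AL" | state == "AK" | state == "CO" | state == "GA"

* Short form: inlist() is true if state equals any of the listed values
gen byte oprim = inlist(state, "AL", "AK", "CO", "GA")

* Stops with an error if the two indicators ever disagree
assert oprim == oprim_or

tabulate oprim
```

Either way the result is a 0/1 indicator with no missing values, which is usually what you want from an indicator variable.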