I am trying to get the number of levels of a categorical variable as a single number. For example, if I have a variable X with 4 levels, I need to somehow get that number. If I type levelsof X, I get the following: 1 2 3 4, but I can't get just the number 4 from there. Is there a way to do it using levelsof or another command?
Various commands will give you the number of distinct values, for any kind of variable. ("Categorical variable" is a statistical concept, rather than a Stata concept.) Perhaps the simplest way to do it for one-off purposes is to ask for a one-way tabulation using tabulate. The number of distinct values is then the number of rows in that table, returned as r(r). Note that (1) you can suppress the table itself with quietly (which is useful in a program or do-file) and (2) missing values are excluded by default, unless you specify the missing option:
    . sysuse auto, clear
    (1978 Automobile Data)

    . qui tab foreign
    . ret li

    scalars:
                      r(N) =  74
                      r(r) =  2

    . qui tab rep78
    . ret li

    scalars:
                      r(N) =  69
                      r(r) =  5

    . qui tab rep78, missing
    . ret li

    scalars:
                      r(N) =  74
                      r(r) =  6
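To get that count as a single number you can reuse, copy r(r) into a local macro straight after the tabulation. The sketch below also shows one way to stay with levelsof, by counting the words in the list it returns. The macro names nlevels, levels and nlevels2 are just illustrative choices, and the word-count route assumes a variable like rep78 whose levels are plain numbers.

```stata
sysuse auto, clear

* Route 1: tabulate quietly, then keep the number of rows, r(r)
quietly tabulate rep78
local nlevels = r(r)
display "rep78 has `nlevels' distinct nonmissing values"

* Route 2: let levelsof put the levels in a local, then count the words
quietly levelsof rep78, local(levels)
local nlevels2 : word count `levels'
display "levelsof found `nlevels2' levels"
```

Both routes ignore missing values by default, so for rep78 they should agree with the r(r) = 5 shown above. I believe recent versions of levelsof also return r(r) directly, but check help levelsof in your Stata before relying on that.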
An extended review of this problem, pitched more generally, is available here. That paper introduces a distinct command, whose uses include direct support for looking at the number of distinct values systematically. Type search distinct in Stata to find a download source for the most recent version.
    . distinct

                  |        Observations
                  |      total   distinct
    --------------+----------------------
             make |         74         74
            price |         74         74
              mpg |         74         21
            rep78 |         69          5
         headroom |         74          8
            trunk |         74         18
           weight |         74         64
           length |         74         47
             turn |         74         18
     displacement |         74         31
       gear_ratio |         74         36
          foreign |         74          2
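If you would rather not install anything, a built-in alternative is to tag one observation per distinct value with egen's tag() function and count the tagged observations. The sketch below shows that, followed by how the count might be picked up after distinct itself. The variable name tag_rep78 is an arbitrary choice, and the saved result name r(ndistinct) is my recollection of what distinct leaves behind, so confirm it with return list after running the command.

```stata
sysuse auto, clear

* Built-in route: tag() flags exactly one observation per distinct
* nonmissing value, so counting the flags counts the distinct values
egen tag_rep78 = tag(rep78)
quietly count if tag_rep78
display "distinct nonmissing values of rep78: " r(N)

* Community-contributed route: requires distinct to be installed first
* (see search distinct); r(ndistinct) is assumed, not checked here
quietly distinct rep78
display "distinct reports: " r(ndistinct)
```

The tag() approach generalizes to combinations of variables, since tag() accepts a varlist, at the cost of adding a variable to the dataset.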