I have a simple question about the distinct command in Stata.
When used with a by prefix, can it return a one-dimensional matrix of r(N)?
For example:
sysuse auto, clear
bysort foreign: distinct rep78
Can I store a [2,1] matrix, with each row holding the number of distinct values of rep78 for one level of foreign?
The manual seems to suggest that it only stores the number of distinct values for the last by-group.
You can easily create your own wrapper for that:
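sysuse auto, clear
sort foreign
* collect the observed levels of foreign (0 and 1 in auto.dta)
levelsof foreign, local(foreign_levels)
local number_of_foreign_levels : word count `foreign_levels'
* one row per level, filled in by the loop below
matrix distinct_mat = J(`number_of_foreign_levels', 1, 0)
forvalues i = 1/`number_of_foreign_levels' {
    * look up the i-th level rather than assuming the levels are 0, 1, ...
    local level : word `i' of `foreign_levels'
    quietly distinct rep78 if foreign == `level'
    matrix distinct_mat[`i', 1] = r(ndistinct)
}
matrix list distinct_mat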
distinct_mat[2,1]
    c1
r1   5
r2   3
Note that the number of distinct values is stored in r(ndistinct), not r(N); r(N) holds the number of observations.
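If you would rather not depend on distinct at all, the same per-group counts can probably be built with official commands only. The following is a minimal sketch, not a tested replacement: it assumes that missing values of rep78 should not be counted (which is how egen's tag() behaves without its missing option), and the names tag and ndistinct are just illustrative choices.

```stata
sysuse auto, clear
preserve
* flag exactly one observation per nonmissing (foreign, rep78) combination
egen byte tag = tag(foreign rep78)
* summing the flags within each level of foreign gives the distinct count
collapse (sum) ndistinct = tag, by(foreign)
* copy the collapsed column into a matrix with one row per level of foreign
mkmat ndistinct, matrix(distinct_mat)
restore
matrix list distinct_mat
```

The preserve and restore pair is there because collapse replaces the data in memory; the matrix survives the restore.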