
Return a matrix from distinct command


I have a simple question about the distinct command in Stata.

When it is used with a by prefix, can it return a one-dimensional matrix of r(N)?

For example:

sysuse auto,clear
bysort foreign: distinct rep78

Can I store a [2,1] matrix in which each row holds the number of distinct values of rep78 for one level of foreign?

The help file seems to suggest that it only stores the number of distinct values for the last by-group.


Solution

  • You can write a small wrapper that loops over the levels of foreign and stores each count in a matrix:

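    sysuse auto, clear
    
    * collect the levels of the grouping variable
    levelsof foreign, local(foreign_levels)
    local number_of_foreign_levels : word count `foreign_levels'
    
    * one row per level of foreign, initialised to zero
    matrix distinct_mat = J(`number_of_foreign_levels', 1, 0)
    
    forvalues i = 1 / `number_of_foreign_levels' {
         * look up the i-th level instead of assuming the levels are 0, 1, ...
         local level : word `i' of `foreign_levels'
         quietly distinct rep78 if foreign == `level'
         matrix distinct_mat[`i', 1] = r(ndistinct)
    }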
    
    matrix list distinct_mat
    
    distinct_mat[2,1]
        c1
    r1   5
    r2   3
    

    Note that the number of distinct values is stored in r(ndistinct), not r(N); r(N) holds the number of observations.
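
    If you would rather not loop, the same counts can probably be obtained with built-in commands only. The following is a minimal sketch, not part of the original answer: it assumes that egen's tag() function marks exactly one observation per distinct (foreign, rep78) pair and skips observations with missing values, which should match the default behaviour of distinct. The names tag and ndistinct are made up for illustration.

    ```stata
    sysuse auto, clear

    * collapse replaces the data in memory, so keep a copy to restore afterwards
    preserve

    * flag one observation per distinct (foreign, rep78) pair;
    * observations with a missing value are flagged 0 by default
    egen byte tag = tag(foreign rep78)

    * summing the flags within each level of foreign gives the distinct counts
    collapse (sum) ndistinct = tag, by(foreign)

    * copy the column of counts into a matrix, one row per level of foreign
    mkmat ndistinct, matrix(distinct_mat)

    restore

    matrix list distinct_mat
    ```

    The matrix survives the restore because matrices are not part of the dataset. With the auto data this should give the same two counts as the loop above.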