
Count unique values in Stata


codebook is a great command in Stata. It describes the contents of the data and also reports the number of unique values of each variable:

sysuse auto, clear
codebook mpg, compact

The number of unique values of mpg is 21. Judging from the command's help file, it does not seem possible to store this value. Am I wrong?

I am aware of other ways to compute the number of unique values in Stata, but it would be so convenient to add this feature to the codebook command.
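For reference, two of those other ways can be sketched with built-in commands only. This is a minimal illustration, not a replacement for codebook; the variable name tag_mpg is made up for the example, and Option A is subject to tabulate's limit on the number of rows.

```stata
sysuse auto, clear

* Option A: one-way tabulate leaves the number of rows
* (distinct nonmissing values) in r(r)
quietly tabulate mpg
display r(r)

* Option B: tag one observation per distinct value, then count the tags
* (tag_mpg is a hypothetical helper variable)
egen byte tag_mpg = tag(mpg)
quietly count if tag_mpg
display r(N)
drop tag_mpg
```

Both counts should agree with the 21 that codebook reports for mpg.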


Solution

  • You can easily write a wrapper for codebook that uses Nick Cox's distinct command from SSC to return the information you want as scalars in r().

    In my experience, this wrapper approach has proven to be much more effective than asking the nice folks at StataCorp to change their command on an internet forum that they do not participate in.

    Here's an example:

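    * (1) Save this as mycodebook.ado in your PERSONAL ado directory
    * (type adopath to see its exact location)
    capture program drop mycodebook
    program mycodebook, rclass
        syntax [varlist] [if] [in] [, *]
        * distinct is user-written: install it from SSC only if it is missing
        capture which distinct
        if _rc ssc install distinct
        * run codebook as usual, passing any options straight through
        codebook `varlist' `if' `in', `options'
        * return the number of distinct values of each variable as r(nv_varname)
        foreach var of varlist `varlist' {
            quietly distinct `var' `if' `in'
            return scalar nv_`var' = r(ndistinct)
        }
    end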
    
    * (2) example with mycodebook
    sysuse auto, clear
    mycodebook price mpg rep78 if foreign==0, compact
    return list
    

    This last part will give you:

    . mycodebook price mpg rep78 if foreign==0, compact
    
    Variable   Obs Unique      Mean   Min    Max  Label
    ----------------------------------------------------------------------------------
    price       52     52  6072.423  3291  15906  Price
    mpg         52     17  19.82692    12     34  Mileage (mpg)
    rep78       48      5  3.020833     1      5  Repair Record 1978
    ----------------------------------------------------------------------------------
    
    . return list
    
    scalars:
               r(nv_rep78) =  5
                 r(nv_mpg) =  17
               r(nv_price) =  52
    

    You can then use these returned values however you like, for example:

    gen x=r(nv_rep78)
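
    One caveat worth sketching: results in r() are replaced by the next r-class command, so it is probably safest to copy them into a local macro or a scalar straight away. The names n_rep78 and n_mpg below are made up for the illustration, and the example assumes mycodebook has been defined as above.

    ```stata
    sysuse auto, clear
    mycodebook price mpg rep78 if foreign==0, compact

    * copy the results out of r() before anything else runs
    * (n_rep78 and n_mpg are hypothetical names)
    local n_rep78 = r(nv_rep78)
    scalar n_mpg = r(nv_mpg)

    * summarize is r-class, so it replaces the contents of r()
    quietly summarize price

    * the saved copies are still available
    display `n_rep78'
    display n_mpg
    ```

    With the output shown above, the two display commands should print 5 and 17.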