statistics sequence analytics stata

Find missing instances of a sequence


How can I find the missing instances of a sequence in Stata?

input seq
1
2
4
5
6
7
9
10
end

For example, 3 and 8 are missing from the sequence 1 to 10. How can they be found?

My attempt

list seq if !inrange(seq, 1,10)

However, this does not work.


Solution

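  • In Stata, missing means a value that is present in the data but holds a missing value code.

    Here the problem is to identify values that might have been (should have been?) in the dataset but are, to use a different word, absent. That is why your inrange() condition lists nothing: it can only test observations that exist, and every observed value lies between 1 and 10.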

    Here are two approaches to your problem:

    clear 
    input seq
    1
    2
    4
    5
    6
    7
    9
    10
    end
    
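    * Approach 1: build the expected values, then subtract the observed ones.
    numlist "1/10"
    local expected `r(numlist)'
    levelsof seq, local(observed)
    * The macro list operator "-" keeps the expected values that were not observed.
    local absent : list expected - observed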
    
    di "`absent'"
    
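    * Approach 2: loop over the expected values and count matching observations.
    * Clear ABSENT first so that rerunning in the same session does not append to it.
    local ABSENT
    forval j = 1/10 {
        quietly count if seq == `j'
        if r(N) == 0 local ABSENT `ABSENT' `j'
    }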
    
    di "`ABSENT'"
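
Both approaches above hard-code the range 1 to 10. If the bounds are not known in advance, one possible variation is to take them from the data. This is only a sketch under two assumptions that are not stated in the question: that seq holds integers and that the intended sequence has step 1. It also cannot detect values absent below the observed minimum or above the observed maximum.

```stata
clear
input seq
1
2
4
5
6
7
9
10
end

* Take the bounds from the data instead of hard-coding 1 and 10.
* Assumption: seq holds integers and the sequence should have step 1.
summarize seq, meanonly
numlist "`r(min)'/`r(max)'"
local expected `r(numlist)'

* levelsof puts the distinct observed values into the local macro observed.
quietly levelsof seq, local(observed)

* Macro list subtraction keeps the expected values that were never observed.
local absent : list expected - observed
display "`absent'"
```

With the example data this should display 3 8, the same result as the two approaches above.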