
Two variables are not recognized as identical by assert


I would like to know more about the behavior of the following code:

clear
set obs 10000
set seed 98034

* Generate two identical variables
generate double u1 = runiform()
generate double u2=u1

*check
assert u2==u1
***

generate double var1=runiform()

* Generate a group id: 100 groups of 100 consecutive observations
generate byte id_=0

forvalues i=1(1)`=10000/100'{
    replace id_=`i' if _n>`=(`i'-1)*`=10000/100''
}

* Total u1 and u2 within each id_
bysort id_: egen double u11= total(u1)
bysort id_: egen double u21= total(u2)

*check
assert u11==u21
***

* Keep only the first observation of each id_ group
bysort id_: drop if _n>1

* Generate a new variable that should equal var1 (the same quantity is
* subtracted and then added back, since u11 == u21)

generate double var2= var1 - u11 + u21

*(1) 
assert var2==var1

In particular, I cannot understand why assert (1) fails. Every variable I summed was generated in the same way, so var1 and var2 should be identical.

Interestingly, if I change the order of the terms, the assert passes:

drop var2
generate double var2= - u11 + u21 + var1 

*(2)
assert var2==var1

Solution

  • The two variables are not identical. To see this, display them with more decimal places:

    format var1 %20.15f
    format var2 %20.15f
    
    list var1 var2 in 1/10
    
         +---------------------------------------+
         |              var1                var2 |
         |---------------------------------------|
      1. | 0.498376312204773   0.498376312204776 |
      2. | 0.394671386281136   0.394671386281132 |
      3. | 0.515152901323075   0.515152901323077 |
      4. | 0.789668809822002   0.789668809822004 |
      5. | 0.931897887273974   0.931897887273976 |
         |---------------------------------------|
      6. | 0.947614996238336   0.947614996238336 |
      7. | 0.207296218919878   0.207296218919879 |
      8. | 0.368812285027951   0.368812285027950 |
      9. | 0.565084085641873   0.565084085641871 |
     10. | 0.331114583239097   0.331114583239099 |
         +---------------------------------------+
    

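    Floating-point addition is not associative, so the order of operations matters. Stata evaluates + and - from left to right, so var1 - u11 + u21 is computed as (var1 - u11) + u21, whereas - u11 + u21 + var1 is computed as (-u11 + u21) + var1. Because u11 and u21 are identical, -u11 + u21 is exactly 0, and the second ordering returns var1 unchanged. Computing the intermediate results step by step reproduces each ordering: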

    generate double var2= var1 - u11 + u21
    format var2 %20.15f
    
    generate double v2a = - u11 + u21
    generate double v2b = var1 + v2a
    format v2b %20.15f
    
    generate double v2c = var1 - u11
    generate double v2d = v2c + u21
    format v2d %20.15f
    
    list var2 v2b v2d in 1/10
    
         +-----------------------------------------------------------+
         |              var2                 v2b                 v2d |
         |-----------------------------------------------------------|
      1. | 0.498376312204776   0.498376312204773   0.498376312204776 |
      2. | 0.394671386281132   0.394671386281136   0.394671386281132 |
      3. | 0.515152901323077   0.515152901323075   0.515152901323077 |
      4. | 0.789668809822004   0.789668809822002   0.789668809822004 |
      5. | 0.931897887273976   0.931897887273974   0.931897887273976 |
         |-----------------------------------------------------------|
      6. | 0.947614996238336   0.947614996238336   0.947614996238336 |
      7. | 0.207296218919879   0.207296218919878   0.207296218919879 |
      8. | 0.368812285027950   0.368812285027951   0.368812285027950 |
      9. | 0.565084085641871   0.565084085641873   0.565084085641871 |
     10. | 0.331114583239099   0.331114583239097   0.331114583239099 |
         +-----------------------------------------------------------+
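
    A minimal, self-contained sketch of the same effect follows. The values a = 0.1 and b = 50.3 are illustrative choices, not taken from the question: a plays the role of var1, and b plays the role of a group total (u11 == u21).

```stata
* Illustrative values: a stands in for var1, b for u11 == u21
clear
set obs 1
generate double a = 0.1
generate double b = 50.3

* Evaluated as (a - b) + b: a - b is rounded to the spacing of doubles
* near 50, so low-order bits of a are lost and adding b cannot restore them
generate double left = a - b + b

* Evaluated as (-b + b) + a: -b + b is exactly 0, so a is returned unchanged
generate double right = -b + b + a

format a left right %20.17f
list a left right

assert right == a
assert left != a
```

    Both expressions are algebraically equal to a; only the order in which Stata rounds the intermediate results differs.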
    

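    The size of the differences (a few times 1e-15) is consistent with ordinary rounding at this magnitude. Each group total u11 is the sum of 100 uniform draws, so it is close to 50, and doubles between 32 and 64 are spaced 2^-47 (about 7.1e-15) apart. Computing var1 - u11 therefore rounds away the low-order bits of var1, and adding u21 back cannot recover them. For further details, type help precision at Stata's prompt.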
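
    A sketch that puts numbers on this, under stated assumptions: it relies on Stata's epsdouble() (the machine precision of a double, 2^-52) and reldif() (the relative difference |x - y|/(|y| + 1)) functions, and the values 0.1 and 50.3 and the 1e-13 tolerance are arbitrary illustrative choices. It prints the spacing of doubles near 50 and then compares the two results with a tolerance instead of exact equality.

```stata
* Gap between adjacent doubles in [1, 2) is epsdouble() = 2^-52;
* in [32, 64), around the size of the group totals, it is 32 times larger
display epsdouble()
display 32*epsdouble()

* Illustrative stand-ins for var1 (a) and u11 == u21 (b)
clear
set obs 1
generate double a = 0.1
generate double b = 50.3
generate double left = a - b + b

* Exact equality fails, but the relative difference is far below 1e-13
assert left != a
assert reldif(left, a) < 1e-13
```

    When results come from floating-point arithmetic, a tolerance-based check like reldif() is usually more robust than ==, which demands bit-for-bit equality.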