I would like to know more about the behavior of the following code:
clear
set obs 10000
set seed 98034
* I generate three variables
generate double u1 = runiform()
generate double u2=u1
*check
assert u2==u1
***
generate double var1=runiform()
* I generate some ids
generate byte id_=0
forvalues i=1(1)`=10000/100'{
replace id_=`i' if _n>`=(`i'-1)*`=10000/100''
}
*I sum by id_ u1 and u2
bysort id_: egen double u11= total(u1)
bysort id_: egen double u21= total(u2)
*check
assert u11==u21
***
*I drop duplicates
bysort id_: drop if _n>1
*I generate a new variable which should be equal to var1 (I am adding and
*subtracting the same quantities)
generate double var2= var1 - u11 + u21
*(1)
assert var2==var1
In particular, I cannot understand why assert (1) fails. Every variable in the sum was generated in the same way, so var1 and var2 should be identical.
Interestingly, if I reorder the terms of the sum, the assert passes:
drop var2
generate double var2= - u11 + u21 + var1
*(2)
assert var2==var1
The two variables are not identical. To see this, change their display format:
format var1 %20.15f
format var2 %20.15f
list var1 var2 in 1/10
     +---------------------------------------+
     |              var1                var2 |
     |---------------------------------------|
  1. | 0.498376312204773   0.498376312204776 |
  2. | 0.394671386281136   0.394671386281132 |
  3. | 0.515152901323075   0.515152901323077 |
  4. | 0.789668809822002   0.789668809822004 |
  5. | 0.931897887273974   0.931897887273976 |
     |---------------------------------------|
  6. | 0.947614996238336   0.947614996238336 |
  7. | 0.207296218919878   0.207296218919879 |
  8. | 0.368812285027951   0.368812285027950 |
  9. | 0.565084085641873   0.565084085641871 |
 10. | 0.331114583239097   0.331114583239099 |
     +---------------------------------------+
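The order of operations matters because floating-point addition is not associative. Stata evaluates the expression from left to right, so var1 - u11 + u21 is computed as (var1 - u11) + u21: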
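* left-to-right evaluation: (var1 - u11) + u21
capture drop var2
generate double var2 = var1 - u11 + u21
format var2 %20.15f
* group the two sums first: var1 + (-u11 + u21)
generate double v2a = - u11 + u21
generate double v2b = var1 + v2a
format v2b %20.15f
* spell out the left-to-right steps explicitly
generate double v2c = var1 - u11
generate double v2d = v2c + u21
format v2d %20.15f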
list var2 v2b v2d in 1/10
     +-----------------------------------------------------------+
     |              var2                 v2b                 v2d |
     |-----------------------------------------------------------|
  1. | 0.498376312204776   0.498376312204773   0.498376312204776 |
  2. | 0.394671386281132   0.394671386281136   0.394671386281132 |
  3. | 0.515152901323077   0.515152901323075   0.515152901323077 |
  4. | 0.789668809822004   0.789668809822002   0.789668809822004 |
  5. | 0.931897887273976   0.931897887273974   0.931897887273976 |
     |-----------------------------------------------------------|
  6. | 0.947614996238336   0.947614996238336   0.947614996238336 |
  7. | 0.207296218919879   0.207296218919878   0.207296218919879 |
  8. | 0.368812285027950   0.368812285027951   0.368812285027950 |
  9. | 0.565084085641871   0.565084085641873   0.565084085641871 |
 10. | 0.331114583239099   0.331114583239097   0.331114583239099 |
     +-----------------------------------------------------------+
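The same effect can be reproduced without any data. The sketch below uses the illustrative constants 0.1 and 50, which are my own choice and not values from the question: 0.1 stands in for var1, and 50 for the much larger group totals.

```stata
* Same three numbers, two groupings (constants are illustrative only)
display %20.15f (0.1 - 50) + 50
display %20.15f 0.1 + (-50 + 50)
* Compare each result with 0.1 (1 = equal, 0 = not equal)
display (0.1 - 50) + 50 == 0.1
display 0.1 + (-50 + 50) == 0.1
```

In double precision, the first grouping should come back slightly different from 0.1, while the second returns 0.1 unchanged because -50 + 50 is exactly zero.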
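The size of the differences (a few units in the 15th decimal place) points to a precision issue. Each of u11 and u21 is a sum of 100 uniform draws, so it is much larger than var1, and the intermediate result var1 - u11 cannot retain all the low-order digits of var1 in a double. Because u11 and u21 are identical, - u11 + u21 is exactly zero, which is why the second ordering returns var1 unchanged. For further details, type help precision at Stata's prompt.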
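If the goal is only to confirm that two computed variables agree up to rounding, one common approach is to compare them with a tolerance instead of testing exact equality. The following is a minimal sketch, not part of the original code: the variable big is a hypothetical stand-in for u11 and u21, and the tolerance 1e-12 is an assumption that should be adapted to the scale of the data. reldif(x,y) is Stata's built-in relative difference, |x-y|/(|y|+1).

```stata
clear
set obs 5
set seed 98034
generate double var1 = runiform()
* big is a hypothetical stand-in for the group totals u11 and u21
generate double big = 50 + runiform()
generate double var2 = var1 - big + big
* exact equality may fail for some observations
count if var2 != var1
* comparison with an assumed tolerance of 1e-12
assert reldif(var2, var1) < 1e-12
```

An absolute check such as assert abs(var2 - var1) < 1e-12 follows the same idea when the variables are known to be of order one.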