MATLAB strange behaviour with mod() when using vpa()

If we have the following code:

base = vpa(1); 
height = vpa(2.2);
mod(2*base + height + height, 2 * (base + height))

This produces output of 6.4. I would expect the result to be 0 and the numeric solution does give 0. But I need to use symbolic values with vpa().

I did some experimentation to find out why and found:

simplify(2*base + height + height < 6.4)
simplify(2 * (base + height) == 6.4)

both give TRUE. So the same (numeric) expression is smaller and equal to 6.4.

What should I do to fix this and get the answer of 0? What is causing this problem?

Solution

The problem is that vpa offers arbitrarily large precision, but is not exact. First off, note that vpa with a second input uses digits precision digits, which is 32 by default. When you do

height = vpa(2.2)

this is the same as

height = vpa(2.2, 32)

assuming that digits is 32. So height will have 32 digits of precision, but will not be exact. To see this, evaluate the defined height with more precision:

>> vpa(height, 32)
ans =
2.2
>> vpa(height, 40)
ans =
2.2
>> vpa(height, 50)
ans =
2.2000000000000000000000000000000000000001469367939

This inaccuracy introduces a numerical difference between 2*base + height + height and 2 * (base + height), neither of which actually equals 6.4:

>> base = vpa(1); 
>> height = vpa(2.2);
>> vpa(2*base + height + height, 50)
ans =
6.3999999999999999999999999999999999999988245056492
>> vpa(2 * (base + height), 50)
ans =
6.4000000000000000000000000000000000000002938735877

As a result, even if mod(2*base + height + height, 2 * (base + height)) seems to be 0

>> mod(2*base + height + height, 2 * (base + height))
ans =
6.4

but it's _not_:

>> vpa(mod(2*base + height + height, 2 * (base + height)), 50)
ans =
6.3999999999999999999999999999999999999988245056492

Note that the deviation from 6.4 in this latter result doesn't equal the sum of the two small deviations above; rather, it equals the first. Numerical inaccuracy is not guaranteed to be additive.

In short, vpa diminishes, but doesn't completely avoid, numerical precision errors.

What if we increase the number of digits used for vpa? Will that give more precision and maybe solve the issue?

>> base = vpa(1, 1000); 
>> height = vpa(2.2, 1000);

Strangely, inspite of having used more digits, we get the same inaccuracy as before:

>> vpa(2*base + height + height, 50)
ans =
6.3999999999999999999999999999999999999988245056492
>> vpa(2 * (base + height), 50)
ans =
6.4000000000000000000000000000000000000002938735877
>> vpa(mod(2*base + height + height, 2 * (base + height)), 50)
ans =
6.3999999999999999999999999999999999999988245056492

So, the only way to avoid precision errors is to replace vpa by symbolic variables, which are exact:

>> base = sym(1)
base =
1
>> height = sym(2.2)
height =
11/5

A digression on how to define symbolic variables is in order. Note that sym(2.2) has the potential problem that it first defines 2.2 as a double floating-point number, with its inherent inaccuracy, and then that is converted to sym. In this case that's not a problem, because the floating-point numerical representation of 2.2 happens to be exact. Indeed, we can check that Matlab displays height = sym(2.2), which is exact. Furthermore, even if the double representation is not exact, Matlab tries to guess your intent, and often suceeds:

>> sym(3.141592653589793)
ans =
pi

(Matlab assumes we meant the number pi)... but not always:

>> sym(sqrt(777))
ans =
777^(1/2)
>> sym(sqrt(777777))
ans =
7757421003204227/8796093022208

(sqrt(777) gives the double result 27.874719729532707, which is recognized by sym as an approximation of the square root of 777. On the other hand, sqrt(777777) gives 8.819166627295348e+02, which is not recognized by sym as an approximation of the square root of 777777).

>> sym(7777)
ans =
7777
>> sym(7777777777777777777)
ans =
7777777777777777664

(7777 is represented exactly as a double, but 7777777777777777777 is not, because it exceeds 2^53.)

So, to be sure, the safe way to define symbolic variables is to only use small integers, other symbolic variables, or exact string representations:

>> height = sym('22/10')
height =
11/5

Now, with base and height correctly defined as symbolic variables, we get the expected result:

>> mod(2*base + height + height, 2 * (base + height))
ans =
0
>> vpa(mod(2*base + height + height, 2 * (base + height)), 1000)
ans =
0.0