Python Riemann Sum does not yield zero for equal positive and negative areas

I wrote a program to approximate an integral using a Riemann sum and graph it using matplotlib in Python. For functions with equal areas above and below the x-axis, the resulting area should be zero, but my program outputs a very small number instead.

The following code graphs the odd function f(x) = x^3 from -1 to 1, so the area should be zero. My code instead approximates it to be 1.68065561477562 e^-15.

What is causing this? Is it a rounding error in delta_x, x, or y? I know I could just round the value to zero, but I'm wondering if there is another problem or way to solve this.

I have tried using the Decimal.decimal class for delta_x, but I just got an even smaller number.

The Python code:

import matplotlib.pyplot as plt
import numpy as np

# Approximates and graphs integral using Riemann Sum


# example function: f(x) = x^3
def f_x(x):
    return x**3

# integration range from a to b with n rectangles
a, b, n = -1, 1, 1000

# calculate delta x, list of x-values, list of y-values, and approximate area under curve
delta_x = (b - a) / n

x = np.arange(a, b+delta_x, delta_x)

y = [f_x(i) for i in x]

area = sum(y) * delta_x

# graph using matplotlib
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.bar(x, y, delta_x, alpha=.5)
plt.title('a={}, b={}, n={}'.format(a, b, n))
plt.xlabel('A = {}'.format(area))
plt.show()

Solution

You need to be aware that what you are computing is not a Riemann integral in the original sense. You are dividing the interval into n bins, but then sum over n+1 bins (here n = 1000 but len(x) == 1001). So the result may be close to what you expect, but it is certainly not a good way to get there.

Using the Riemann sum you would divide your interval into n bins, and then sum over the values of those n bins. You have the choice whether to compute the left Riemann sum, the right Riemann sum, or possibly taking the midpoints.

import numpy as np

def f_x(x):
    return x**3

# integration range from a to b with n rectangles
a, b, n = -1, 1, 1000

delta_x = (b - a) / float(n)

x_left = np.arange(a, b, delta_x)
x_right = np.arange(a+delta_x, b+delta_x, delta_x)
x_mid = np.arange(a+delta_x/2., b+delta_x/2., delta_x)

print len(x_left),  len(x_right), len(x_mid)          ### 1000 1000 1000


area_left = f_x(x_left).sum() * delta_x
area_right = f_x(x_right).sum() * delta_x
area_mid = f_x(x_mid).sum() * delta_x

print area_left     # -0.002
print area_right    #  0.002
print area_mid      # 1.81898940355e-15

While the midpoint sum already gives a good result, for symmetric functions it would be a good idea to choose n even, and take the average of the left and right sum,

print 0.5*(area_right+area_left)   # 1.76204537072e-15

This is equally close to 0.

Now it is worthwhile noting that numpy.arange produces some errors by itself. A better choice would be using numpy.linspace

x_left = np.linspace(a, b-delta_x, n)
x_right = np.linspace(a+delta_x, b, n)
x_mid = np.linspace(a+delta_x/2., b-delta_x/2., n)

yielding

print area_left     # -0.002
print area_right    #  0.002
print area_mid      # 8.52651282912e-17
print 0.5*(area_right+area_left)   # 5.68121938382e-17

5.68121938382e-17 is pretty close to 0. The reason why it is not entirely 0 is indeed floating point inaccuracies.

The famous example of that would be 0.1 + 0.2 - 0.3 which results in 5.5e-17 instead of 0. This is to show that this simply operation introduces the same error of the order of 1e-17 as the Riemann integration.