When I use NumPy's trapz function and define a new (different) y value at an x value that already appears, the results are not what I initially expected.
>>> import numpy as np
>>> np.trapz([1,1,1],[0,1,2]) #normal area defined by 3 y values, y=1 at x=0,x=1 and x=2
2.0
>>> np.trapz([1,1,1,1,1],[0,1,2,1,2]) #redefine point x=1 and x=2 with the same y value that was already used
2.0
>>> np.trapz([1,1,1,2,2],[0,1,2,1,2]) #redefine point x=1 and x=2 but with y=2
2.5
I would've expected the point to be overwritten completely (the old value is discarded) or that the highest points would have been taken (discarding all the overlapping area). As can be seen, this is not the case.
Is there a mathematical reason behind this behaviour? Or is it just a result of the way the function is programmed (and should it simply not be used like this)?
My mathematical or programming knowledge is unfortunately not at the level that I can answer this question myself based on the source code.
Some more examples:
>>> np.trapz([1,1,1,2,2],[0,1,2,2,1]) #redefine point x=2 first and THEN x=1 with y=2
0.0
>>> np.trapz([1,2,2,1,1],[0,1,2,1,2]) #start with y=2 and then add y=1 on x=1 and x=2
3.0
>>> np.trapz([1,1,1],[0,1,2]) #normal area again
2.0
>>> np.trapz([1,1,1],[2,1,0]) #but now defined in reverse order
-2.0
(The context: I have data from a system in which the time is not always set correctly; sometimes the timestamps start counting from zero again. I wanted to know what would happen if I don't fix this data and just pass it to the trapz function.)
There is no "redefinition" happening when you repeat an x-value in trapz. The trapz method does the following:
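First, it takes the differences between consecutive x-values:
x=[0,1,2,1,2]
yields [1, 1, -1, 1]
Second, it takes the averages of consecutive y-values:
y=[1,2,2,1,1]
yields [1.5, 2, 1.5, 1]
Finally, it multiplies the two lists element by element and adds up the products:
1*1.5 + 1*2 + (-1)*1.5 + 1*1 = 3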
The algorithm pays no attention to whether some x-values appear twice; none of the y-values replaces another.
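The same computation can be sketched in plain Python. This is only an illustration of the trapezoid rule as described here, not NumPy's actual source; the helper name trapezoid_steps is made up for this example.

```python
def trapezoid_steps(y, x):
    """Return (dx, y_avg, total) for the trapezoid rule, one step at a time.

    Hypothetical helper for illustration; it mirrors what trapz computes
    for 1-D input but is not NumPy code.
    """
    # Differences between consecutive x-values (negative when x goes backwards)
    dx = [x1 - x0 for x0, x1 in zip(x, x[1:])]
    # Averages of consecutive y-values
    y_avg = [(y0 + y1) / 2 for y0, y1 in zip(y, y[1:])]
    # Sum of width * average height over all segments
    total = sum(w * h for w, h in zip(dx, y_avg))
    return dx, y_avg, total


dx, y_avg, total = trapezoid_steps([1, 2, 2, 1, 1], [0, 1, 2, 1, 2])
print(dx)     # [1, 1, -1, 1]
print(y_avg)  # [1.5, 2.0, 1.5, 1.0]
print(total)  # 3.0
```

Feeding it the other inputs from the question reproduces the other results as well, for example 2.5 for y=[1,1,1,2,2], x=[0,1,2,1,2].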
It is sometimes useful to enter the same x-value twice in a row with different y-values: namely, when your function has a discontinuity.
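As a small illustration of the discontinuity case, here is a step function that jumps from 0 to 1 at x=1. The helper trapezoid is a plain-Python stand-in for trapz written for this example, and the data values are made up.

```python
def trapezoid(y, x):
    """Plain-Python trapezoid rule (illustrative stand-in for trapz)."""
    total = 0.0
    for i in range(len(x) - 1):
        total += (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
    return total


# x=1 is entered twice: once with the value just before the jump (0)
# and once with the value just after it (1).
step = trapezoid([0, 0, 1, 1], [0, 1, 1, 2])
print(step)  # 1.0

# Without the repeated x-value, the jump is smeared into a ramp
# between x=1 and x=2, and the result changes.
ramp = trapezoid([0, 0, 1], [0, 1, 2])
print(ramp)  # 0.5
```

The repeated x-value produces a segment of width zero, which adds nothing to the sum but lets the segments on either side use the correct y-values.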
It may also be useful to enter x-values out of order. Each time you backtrack, the corresponding y-values contribute negatively to the integral. For example, if (x,y) pairs are the vertices of the polygon drawn below, given in clockwise order, the output of trapz is the shaded area. The segments traversed from right to left cut away the area under them, instead of adding.
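A quick check of the polygon case, using a made-up triangle rather than the polygon from the figure. The sketch assumes the first vertex is repeated at the end so that the outline is closed; trapezoid is again a plain-Python stand-in for trapz.

```python
def trapezoid(y, x):
    """Plain-Python trapezoid rule (illustrative stand-in for trapz)."""
    total = 0.0
    for i in range(len(x) - 1):
        total += (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
    return total


# Triangle (0,0) -> (1,2) -> (2,0) -> back to (0,0), listed clockwise.
# Its area is 0.5 * base * height = 0.5 * 2 * 2 = 2.
x_cw = [0, 1, 2, 0]
y_cw = [0, 2, 0, 0]
print(trapezoid(y_cw, x_cw))  # 2.0

# The same vertices in counter-clockwise order give the negated area.
x_ccw = [0, 2, 1, 0]
y_ccw = [0, 0, 2, 0]
print(trapezoid(y_ccw, x_ccw))  # -2.0
```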
For another example, if x is position and y is the force exerted by an object when in that position, the integral is the total work done; in this context it makes sense to have x out of order if the object moves left and right.
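A toy version of the work example, with invented numbers: a constant force of 3 while the object moves from position 0 to 2 and then back to 1. The helper trapezoid is a plain-Python stand-in for trapz.

```python
def trapezoid(y, x):
    """Plain-Python trapezoid rule (illustrative stand-in for trapz)."""
    total = 0.0
    for i in range(len(x) - 1):
        total += (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
    return total


position = [0, 1, 2, 1]  # moves right twice, then one step back to the left
force = [3, 3, 3, 3]     # hypothetical constant force at every sample

# The leftward step contributes -3, so the total is force * net displacement.
print(trapezoid(force, position))  # 3.0
```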
But if your data points just happened to be out of order for some random reason that has nothing to do with the meaning of your data, then the output of trapz is meaningless.
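For the timestamp situation in the question, one possible approach is to split the data wherever the x-values go backwards and integrate each run separately. This is only a sketch under a strong assumption: every reset starts a new, independent recording, and nothing is known about the gap between recordings, so the gap contributes nothing. The function names are made up for this example.

```python
def trapezoid(y, x):
    """Plain-Python trapezoid rule (illustrative stand-in for trapz)."""
    total = 0.0
    for i in range(len(x) - 1):
        total += (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
    return total


def integrate_with_resets(y, t):
    """Integrate each non-decreasing run of t on its own and add the results.

    Assumption: a drop in t marks the start of a new recording.
    """
    total = 0.0
    start = 0
    for i in range(1, len(t) + 1):
        # Close the current run at the end of the data or when t drops.
        if i == len(t) or t[i] < t[i - 1]:
            total += trapezoid(y[start:i], t[start:i])
            start = i
    return total


t = [0, 1, 2, 1, 2]  # timestamps jump back from 2 to 1
y = [1, 1, 1, 2, 2]

print(trapezoid(y, t))              # 2.5, the backward step subtracts area
print(integrate_with_resets(y, t))  # 4.0, i.e. 2.0 for the first run + 2.0 for the second
```

Whether summing the runs is the right thing to do depends on what the resets mean in your system; if the true timestamps can be recovered, correcting them before integrating is the safer option.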