I have data in a CSV file, extracted from a PDF graph, where x represents incubation times and y is the density. I would like to calculate percentiles, such as the 95th. I'm a bit confused: should I calculate the percentile using the x values only, i.e., using np.percentile(x, 0.95)?
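Here is a minimal sketch of why `np.percentile(x, ...)` on its own does not answer this. It uses a made-up Gaussian-shaped density on an evenly spaced grid, which stands in for the extracted data. `np.percentile` treats every x value as one equally weighted sample and expects q in percent (0 to 100), so on an evenly spaced grid it ignores y entirely. A density-weighted percentile instead comes from the cumulative area under y.

```python
import numpy as np

# Hypothetical evenly spaced grid with a density concentrated near x = 3
x = np.linspace(0.0, 10.0, 1001)
y = np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)

# np.percentile expects q in [0, 100]; passing 0.95 would mean the 0.95th percentile
naive = np.percentile(x, 95)

# Density-weighted: cumulative trapezoid-rule area, normalized to end at 1
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))
cdf /= cdf[-1]
# Invert the CDF: find x where the cumulative area reaches 0.95
weighted = np.interp(0.95, cdf, x)

print(naive)     # 95% of the way along the grid, independent of y
print(weighted)  # near 3 + 1.645 * 0.5 for this Gaussian
```

With x values only, you get a point 95% of the way along the grid. The answer you want depends on where the area under y accumulates.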
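Here is some code which uses np.trapz (as proposed by @pjs). We take the x and y arrays and assume they describe a PDF. First we normalize y so that the area under the curve is 1. Then we search backward, dropping the right-hand integration limit by one point at a time, until the integral falls to 0.95. That right-hand limit is the 95th percentile. I've made up a multi-peak function for the demo: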
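```python
import numpy as np
import matplotlib.pyplot as plt

N = 1000
x = np.linspace(0.0, 6.0*np.pi, N)
with np.errstate(divide='ignore', invalid='ignore'):
    y = np.sin(x/2.0)/x  # construct some multi-peak function
y[0] = y[1]  # x[0] == 0 gives 0/0 = nan, so patch the first point
y = np.abs(y)

plt.plot(x, y, 'r.')
plt.show()

# normalization: scale y so that the area under the curve is 1
norm = np.trapz(y, x)  # np.trapz is named np.trapezoid in NumPy >= 2.0
print(norm)
y = y/norm
print(np.trapz(y, x))  # after normalization, should be ~1.0

# now compute the integral, cutting the right limit down by one point
# with each iteration; stop as soon as we hit 0.95
for k in range(0, N):
    xx = x[:N-k]  # k == 0 keeps the full array
    yy = y[:N-k]
    v = np.trapz(yy, xx)
    print(f"Integral {k} from {xx[0]} to {xx[-1]} is equal to {v}")
    if v <= 0.95:
        break

print(f"95th percentile is approximately x = {xx[-1]}")
```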
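The loop above re-integrates the whole array on every step, so it is O(N²) and only resolves the percentile to a grid point. Below is a sketch of an alternative. It assumes x is sorted ascending and y ≥ 0. It builds the cumulative integral once with the trapezoid rule, written out by hand here so it does not depend on the np.trapz / np.trapezoid naming. It then inverts that integral with `np.interp`. The helper name `density_percentile` is hypothetical, not a NumPy function.

```python
import numpy as np


def density_percentile(x, y, q):
    """Return the x below which a fraction q (0..1) of the area under y lies.

    Assumes x is sorted ascending and y >= 0 is an (unnormalized) density.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # area of each trapezoid between neighbouring points
    segments = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    # cumulative integral from x[0] to each x[i], starting at 0
    cdf = np.concatenate(([0.0], np.cumsum(segments)))
    cdf /= cdf[-1]  # normalize so the total area is 1
    # invert the CDF by linear interpolation (cdf is non-decreasing for y >= 0)
    return np.interp(q, cdf, x)


# same made-up multi-peak density as in the code above
N = 1000
x = np.linspace(0.0, 6.0 * np.pi, N)
with np.errstate(divide="ignore", invalid="ignore"):
    y = np.abs(np.sin(x / 2.0) / x)
y[0] = y[1]  # patch the 0/0 at x = 0

for q in (0.05, 0.5, 0.95):
    print(q, density_percentile(x, y, q))
```

For the CSV data, something like `x, y = np.loadtxt("data.csv", delimiter=",", unpack=True)` should load two columns. The file name is hypothetical, and you would add `skiprows=1` if the file has a header row. Sorting by x first (`order = np.argsort(x)`, then `x[order]`, `y[order]`) is worthwhile, because points digitized from a plot are not always in order.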