python pandas scipy beta-distribution

Python - Apply SciPy Beta Distribution to all rows of Pandas DataFrame


In SciPy, one can evaluate the CDF of a beta distribution supported on the interval [A, B] as follows:

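import scipy.stats

x = 640495496
alpha = 1.5017096      # first shape parameter
beta = 628.110247      # second shape parameter
A = 0                  # lower bound of the support
B = 148000000000       # upper bound of the support
p = scipy.stats.beta.cdf(x, alpha, beta, loc=A, scale=B - A)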

Now suppose I have a pandas DataFrame with the columns x, alpha, beta, A, B. How do I evaluate this beta CDF for each row and append the result as a new column?


Solution

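  • I suspect that pandas apply (with axis=1) simply loops over the rows, and each call to a scipy.stats distribution carries quite a bit of overhead. So rather than calling the CDF once per row, I would pass whole columns and let SciPy do the computation in a single vectorized call: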

    >>> from scipy import stats
    >>> df['p'] = stats.beta.cdf(df['x'], df['alpha'], df['beta'], loc=df['A'], scale=df['B']-df['A'])
    >>> df
       A             B     alpha        beta          x         p
    0  0  148000000000  1.501710  628.110247  640495496  0.858060
    1  0  148000000000  1.501704  620.110000  640495440  0.853758
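
To check that the vectorized call gives the same numbers as a row-by-row apply, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the values displayed above (so the inputs are only as precise as that printout), and the helper name row_cdf is made up for this illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample rows copied from the printed output above (rounded as displayed).
df = pd.DataFrame({
    "x": [640495496, 640495440],
    "alpha": [1.5017096, 1.501704],
    "beta": [628.110247, 620.11],
    "A": [0, 0],
    "B": [148000000000, 148000000000],
})

def row_cdf(row):
    """Hypothetical helper: evaluate the beta CDF for a single row."""
    return stats.beta.cdf(row["x"], row["alpha"], row["beta"],
                          loc=row["A"], scale=row["B"] - row["A"])

# Row-by-row version: one SciPy call per row (slow on large frames).
p_rowwise = df.apply(row_cdf, axis=1)

# Vectorized version: one SciPy call for the whole frame.
p_vectorized = stats.beta.cdf(df["x"], df["alpha"], df["beta"],
                              loc=df["A"], scale=df["B"] - df["A"])

df["p"] = p_vectorized
print(np.allclose(p_rowwise, p_vectorized))
```

Both versions should agree to floating-point precision. The difference is only in cost: the row-wise form pays SciPy's argument-checking overhead once per row, while the vectorized form pays it once in total.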