I'm trying to QAQC the results of calculations that are done in a PostgreSQL database, using a python script to read in the inputs to the calculation and echo the calculation steps and compare the final results of the python script against the results from the PostgreSQL calculation.
The calculations in the PostgreSQL database use the percent_rank function, returning the percentile rank (from 0 to 1) of a single value in a list of values. In the python script I am using the Scipy percentileofscore function.
So, here's the question: I can't get the results to match, and I am wondering if anyone knows what settings I should use in the Scipy percentileofscore function to match the PostgreSQL percent_rank function.
You can use scipy.stats.rankdata
. The following example reproduces the result shown at http://docs.aws.amazon.com/redshift/latest/dg/r_WF_PERCENT_RANK.html:
In [12]: import numpy as np
In [13]: from scipy.stats import rankdata
In [14]: values = np.array([15, 20, 20, 20, 30, 30, 40])
rankdata(values, method='min')
gives the desired rank:
In [15]: rank = rankdata(values, method='min')
In [16]: rank
Out[16]: array([1, 2, 2, 2, 5, 5, 7])
Then a basic calculation gives the equivalent of percent_rank
:
In [17]: (rank - 1) / (len(values) - 1)
Out[17]:
array([ 0. , 0.16666667, 0.16666667, 0.16666667, 0.66666667,
0.66666667, 1. ])
(I'm using Python 3.5. In Python 2, use something like (rank - 1) / float(len(values) - 1)
.)
You can use percentileofscore
, but:
kind='strict'
.n/(n-1)
, where n
is the number of values.percentileofscore
expects its second argument to be a scalar, so you have to use a loop to compute the result separately for each value.Here's an example using the same values as above:
In [87]: import numpy as np
In [88]: from scipy.stats import percentileofscore
In [89]: values = np.array([15, 20, 20, 20, 30, 30, 40])
In [90]: n = len(values)
Here I use a list comprehension to generate the result:
In [91]: [n*percentileofscore(values, val, kind='strict')/100/(n-1) for val in values]
Out[91]:
[0.0,
0.16666666666666666,
0.16666666666666666,
0.16666666666666666,
0.66666666666666663,
0.66666666666666663,
1.0]