python frequency percentage

Calculate the frequency of each value in a list in descending order, and the percentage of values that are higher


I am trying to write Python code that, for a given list of values (y) sorted in descending order, calculates the frequency of each y value and the percentage of samples (yi) with a larger y value, taking the frequencies into account.

This is the Python code I've written using NumPy, but I get errors when calculating the percentages. I would also like the frequencies to be listed in the same order as the new array of y values without repetitions (arr). Thanks very much!

# Permeability values (mD)
y = [27.10, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 12.65, 12.65, 12.65, 12.65,  14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 27.10, 27.10, 27.10, 11.12, 11.10, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 9.98, 9.98, 11.97, 11.97, 11.97, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.00, 13.27, 13.27, 9.98, 13.27, 9.98, 13.27, 9.98, 13.27]

# Permeability values in descending order (y, mD)
y_sorted = sorted(y, reverse=True)

# Calculate frequency for the permeability values in descending order
y_new_sorted = np.array(y_sorted)
arr,count = np.unique(y_new_sorted,return_counts=True)
arr_sorted = sorted(arr, reverse=True)
print('Frequency= ', count)
print('Permeability values in descending order without repititions= ', arr_sorted)

# Percentage of samples with larger permeability (x, %)
vec_percent = np.vectorize(percent)
np.unique(vec_percent(y_new_sorted))
print('Percentage of samples with larger permeability= ', vec_percent)
     
**OUTPUTS**

Frequency=  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 1 1 1 1 1 4 1 5 6 1 1 1 1
 1 1 1 1 1 1 4]

Permeability values in descending order without repititions=  [27.1, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 11.12, 11.1, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.0]

Traceback (most recent call last):
  File line 22, in <module>
    vec_percent = np.vectorize(percent)
NameError: name 'percent' is not defined

Process finished with exit code 1

Solution

  • There are two ways to do this: with plain Python lists, or with NumPy, which is more efficient:

    Using Lists

    >>> y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
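    # define lambda functions for the frequency and for the percentage (as a fraction of 1);
    # percent assumes y is sorted in descending order, so that y.index(z) is the number of larger samples
    >>> freq = lambda x: y.count(x)
    >>> percent = lambda z: y.index(z)/len(y)
    # map them over the unique values only, rather than over every element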
    >>> print(list(map(freq,set(y))))
    [1, 5, 6, 2, 4, 1]
    >>> print(list(map(percent,set(y))))
    [0.9473684210526315, 0.0, 0.42105263157894735, 0.2631578947368421, 0.7368421052631579, 0.3684210526315789]
    >>> set(y)
    {290, 390, 330, 370, 310, 350}
    # the frequencies and percentages above are listed in the same order as the values in set(y)
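
Because set(y) has no guaranteed order, and y.index() only counts the larger samples when y is already sorted in descending order, the list approach can also be sketched with collections.Counter and a running total. This is a minimal sketch, not part of the original answer; the function name frequency_table is hypothetical, and it returns fractions of 1 like percent above (multiply by 100 for a percentage).

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, fraction_larger) rows, largest value first.

    frequency_table is a hypothetical helper name, not from the original answer.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    larger = 0  # number of samples strictly larger than the current value
    for value in sorted(counts, reverse=True):
        rows.append((value, counts[value], larger / total))
        larger += counts[value]  # these samples are larger than every later value
    return rows

y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330,
     310, 310, 310, 310, 290]

for value, freq, fraction in frequency_table(y):
    print(value, freq, round(fraction, 4))
```

The input does not need to be sorted beforehand, since the distinct values are sorted inside the function.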
    

    Using NumPy

    I would recommend this approach because it is fast and efficient, although the benefit only becomes noticeable on relatively large datasets.

    >>> import numpy as np
    >>> y_new = np.array(y)
    >>> arr,count = np.unique(y_new,return_counts=True) # unique values (in ascending order) and their counts
    >>> count
    array([1, 4, 6, 1, 2, 5])
    >>> arr
    array([290, 310, 330, 350, 370, 390])
    # vectorize the percent function defined earlier
    >>> vec_percent = np.vectorize(percent)
    >>> np.unique(vec_percent(y_new))
    array([0.        , 0.26315789, 0.36842105, 0.42105263, 0.73684211,
           0.94736842])
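    # these are the percentages (as fractions of 1), in ascending order;
    # they correspond to the values of arr in descending order (390 first, 290 last)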
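
Since np.vectorize(percent) still calls the list-based y.index() for every element, a fully array-based variant may be worth considering. The following is a minimal sketch under that assumption, not part of the original answer: it reverses the output of np.unique to get descending order and uses a cumulative sum of the counts. The name frequency_table_np is hypothetical.

```python
import numpy as np

def frequency_table_np(values):
    """Return unique values (descending), their counts, and the fraction of
    samples strictly larger than each value.

    frequency_table_np is a hypothetical helper name, not from the original answer.
    """
    values = np.asarray(values)
    uniq, counts = np.unique(values, return_counts=True)  # ascending order
    uniq, counts = uniq[::-1], counts[::-1]               # reverse to descending
    larger = np.cumsum(counts) - counts  # samples before each value in descending order
    return uniq, counts, larger / values.size

y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330,
     310, 310, 310, 310, 290]

uniq, counts, fraction_larger = frequency_table_np(y)
for value, freq, fraction in zip(uniq, counts, fraction_larger):
    print(value, freq, round(float(fraction), 4))
```

Here the counts stay aligned with the descending unique values, which is the ordering asked for in the question, and the input list does not have to be sorted first.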
    

    Which one to use is up to you.