I am trying to write Python code that, for a given list of values (y) sorted in descending order, calculates the frequency of each y value and the associated percentage of samples (yi) with a larger y value, taking the frequency into account.
This is the Python code I've written using NumPy, but I get an error when calculating the percentages. I also want the frequencies to be in the same order as the new array of y values without repetitions (arr). Thanks very much!
# Permeability values (mD)
y = [27.10, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 12.65, 12.65, 12.65, 12.65, 14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 27.10, 27.10, 27.10, 11.12, 11.10, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 9.98, 9.98, 11.97, 11.97, 11.97, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.00, 13.27, 13.27, 9.98, 13.27, 9.98, 13.27, 9.98, 13.27]
# Permeability values in descending order (y, mD)
y_sorted = sorted(y, reverse=True)
# Calculate frequency for the permeability values in descending order
y_new_sorted = np.array(y_sorted)
arr,count = np.unique(y_new_sorted,return_counts=True)
arr_sorted = sorted(arr, reverse=True)
print('Frequency= ', count)
print('Permeability values in descending order without repititions= ', arr_sorted)
# Percentage of samples with larger permeability (x, %)
vec_percent = np.vectorize(percent)
np.unique(vec_percent(y_new_sorted))
print('Percentage of samples with larger permeability= ', vec_percent)
**OUTPUTS**
Frequency= [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 1 1 1 1 1 4 1 5 6 1 1 1 1
1 1 1 1 1 1 4]
Permeability values in descending order without repititions= [27.1, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 11.12, 11.1, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.0]
Traceback (most recent call last):
File line 22, in <module>
vec_percent = np.vectorize(percent)
NameError: name 'percent' is not defined
Process finished with exit code 1
There are two ways: using a traditional list or using the more efficient NumPy.
Using Lists
>>> y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
# declare lambda functions to calculate the frequency and the percentage (percent relies on y being sorted in descending order)
>>> freq = lambda x: y.count(x)
>>> percent = lambda z: y.index(z)/len(y)
# then use map() over only the unique values rather than over all of them
>>> print(list(map(freq,set(y))))
[1, 5, 6, 2, 4, 1]
>>> print(list(map(percent,set(y))))
[0.9473684210526315, 0.0, 0.42105263157894735, 0.2631578947368421, 0.7368421052631579, 0.3684210526315789]
>>> set(y)
{290, 390, 330, 370, 310, 350}
# the frequencies and percentages above correspond to the values in set(y), in the same order
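Since a set has no guaranteed order, it can be easier to read the result as a table sorted by value. The following is only a sketch of one way to do that with `collections.Counter`; the function name `frequency_table` is made up for illustration, and it assumes "percentage of samples with larger y" means the share of samples strictly greater than each value.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, fraction_larger) rows, largest value first.

    frequency_table is a hypothetical helper name, not part of any library.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    seen = 0  # number of samples strictly larger than the current value
    for value in sorted(counts, reverse=True):
        rows.append((value, counts[value], seen / total))
        seen += counts[value]
    return rows


y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330,
     310, 310, 310, 310, 290]

for value, freq, fraction in frequency_table(y):
    print(f"{value}: frequency={freq}, larger={fraction:.2%}")
```

Unlike `y.index(z)`, this does not need the input list to be sorted beforehand, because the unique values are sorted inside the function.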
Using NumPy
I would recommend this approach because it is fast and efficient, but you will only see the benefit on a relatively large dataset.
>>> import numpy as np
>>> y_new = np.array(y)
>>> arr,count = np.unique(y_new,return_counts=True) #very simple approach to get output
>>> count
array([1, 4, 6, 1, 2, 5])
>>> arr
array([290, 310, 330, 350, 370, 390])
# define a vectorized version of the percent lambda from the list example (it must already be defined, otherwise you get a NameError)
>>> vec_percent = np.vectorize(percent)
>>> np.unique(vec_percent(y_new))
array([0. , 0.26315789, 0.36842105, 0.42105263, 0.73684211,
0.94736842])
# these are your percentages, as fractions in ascending order (multiply by 100 for percent)
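Note that `np.unique` returns the unique values in ascending order, with the counts in that same order. That is why the frequencies in the question do not line up with the descending `arr_sorted`. A possible sketch that keeps everything aligned in descending order is shown below. The name `frequency_table_np` is hypothetical, and the cumulative-sum step replaces the vectorized lambda, again assuming "larger" means strictly greater.

```python
import numpy as np


def frequency_table_np(values):
    """Return unique values (descending), their counts, and the percentage
    of samples strictly larger than each value.

    frequency_table_np is a hypothetical helper name for illustration.
    """
    arr, count = np.unique(np.asarray(values), return_counts=True)
    # np.unique sorts ascending; reverse both arrays together so they stay aligned
    arr_desc = arr[::-1]
    count_desc = count[::-1]
    # samples larger than a value = cumulative count of all values before it
    larger = np.cumsum(count_desc) - count_desc
    percent_larger = 100.0 * larger / count_desc.sum()
    return arr_desc, count_desc, percent_larger


y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330,
     310, 310, 310, 310, 290]

arr_desc, count_desc, percent_larger = frequency_table_np(y)
for value, freq, pct in zip(arr_desc, count_desc, percent_larger):
    print(f"{value}: frequency={freq}, larger={pct:.2f}%")
```

Reversing `arr` and `count` with the same slice, instead of calling `sorted` on `arr` alone, is what keeps each frequency next to its own value.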
Now it's up to you which one to use.