Search code examples
pythonnumpyquartile

quartile python collections Counter


I have the following list that records the count frequency of random objects:

counter_obj= [('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8) ... ('blueberry',1),('pineapple',1)]

I'm trying to select eight elements by randomly choosing two objects from each rank quartile.

I tried the following for the first (25%) quartile :

from collections import Counter
dct = {('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8) ... ('blueberry',1),('pineapple',1)}
[tup for tup in Counter(dct).most_common(len(dct)//4)] # 25th percentile by frequency count

How can I do for the rest 2 quartiles 50% and 75% knowing that I have many values at 1 ( they appear only once )
My original data bar plot chart : Bar plot from my original data


Solution

  • I would use pandas for this problem:

    import pandas as pd
    
    dct = {('oranges', 66), ('apple', 13), ('banana', 13), ('pear', 12), ('strawberry', 10), ('watermelon', 10), ('avocado', 8) , ('blueberry',1),('pineapple',1)}
    
    df = pd.DataFrame(dct, columns = ['Fruit','Count'])  # convert to DataFrame
    
    
    select = []
    
    for quant in [.25,.5,.75,1]:
      curr_q = df['Count'].quantile(quant)  # this calculates the quantile value
      curr_choice = df[df['Count']<=curr_q].sample(2)  # this selects all rows of your dataframe within current quantile, then samples two of these rows
      select.append(curr_choice)
    
    
    select = pd.concat(select).reset_index(drop=True)  # concatenates the selected rows to get a nice dataframe, resets the indices.