python pandas unique-values

Unique values and the percentage of rows in which each appears


I want to get the unique values in a column together with the percentage of rows in which each value appears. For example, given a DataFrame df with a column prices, I can get the list of unique values with something like:

print(df.prices.unique().tolist())

which could result in

1030, 1075, 2010, 3000, 3050, 4050, 4550

However, I want a percentage mapped to each of the values above. I think value_counts would work, but I don't know how.


Solution

  • Depending on what you expect, you can do as mentioned in the comments. To get the percentage of occurrences of each price:

    import pandas as pd

    data = {'prices': [1030, 1075, 2010, 3000, 3050, 4050, 4550, 1030, 1030, 1030, 1030, 3050, 3050, 3050, 3050]}
    df = pd.DataFrame(data)

    # share of rows for each unique price, as a fraction between 0 and 1
    print(df['prices'].value_counts(normalize=True))
    

    Result:

    1030    0.333333
    3050    0.333333
    1075    0.066667
    2010    0.066667
    3000    0.066667
    4050    0.066667
    4550    0.066667
    Name: prices, dtype: float64
    

    Or, if you want the sum of all rows with a given price divided by the total sum of prices:

    import pandas as pd

    data = {'prices': [1030, 1075, 2010, 3000, 3050, 4050, 4550, 1030, 1030, 1030, 1030, 3050, 3050, 3050, 3050]}
    df = pd.DataFrame(data)

    # share of the total sum of prices contributed by each unique price
    df['forCumSum'] = df['prices']  # helper copy, so the grouping key can also be summed
    dfCumSum = df.groupby('prices')['forCumSum'].sum().reset_index()
    dfCumSum["%of totalPrices"] = dfCumSum['forCumSum'] / dfCumSum['forCumSum'].sum()
    print(dfCumSum.sort_values("%of totalPrices", ascending=False))
    

    Result:

       prices  forCumSum  %of totalPrices
    4    3050      15250         0.434659
    0    1030       5150         0.146786
    6    4550       4550         0.129685
    5    4050       4050         0.115434
    3    3000       3000         0.085507
    2    2010       2010         0.057289
    1    1075       1075         0.030640
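
Both results above are fractions between 0 and 1. If you would rather have actual percentages mapped to each unique value, a minimal sketch could look like the following. It reuses the same sample data; the names pct_of_rows, pct_of_sum and pct_map are made up for illustration, and rounding to two decimals is an assumption about the output you want.

```python
import pandas as pd

df = pd.DataFrame({"prices": [1030, 1075, 2010, 3000, 3050, 4050, 4550,
                              1030, 1030, 1030, 1030, 3050, 3050, 3050, 3050]})

# Percentage of rows holding each unique price (0-100, rounded to 2 decimals).
pct_of_rows = df["prices"].value_counts(normalize=True).mul(100).round(2)

# Percentage of the total price sum contributed by each unique price,
# computed without adding a helper column to df.
sum_per_price = df.groupby("prices")["prices"].sum()
pct_of_sum = sum_per_price.div(sum_per_price.sum()).mul(100).round(2)

# Plain dict mapping each unique price to its percentage of rows.
pct_map = pct_of_rows.to_dict()

print(pct_of_rows)
print(pct_of_sum.sort_values(ascending=False))
```

Because value_counts returns a Series indexed by the unique values, pct_map[1030] looks up the row percentage for a single price directly.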