
Compute statistical values out of a precounted list in Python


I have a DataFrame of pre-counted data (shown below). Let's assume it comes from a "Do you like?" scale, where 4 people answered 1 (Don't like at all), 10 people answered 2 (Don't like), and so on.

How can I compute descriptive statistics from these counts? In particular, I want the mean, which in this case can be computed by hand as (4*1+10*2+125*3+85*4+25*5)/(4+10+125+85+25) = 864/249 ≈ 3.47, and the standard deviation.

import pandas as pd
df = pd.DataFrame([{1: 4, 2: 10, 3: 125, 4: 85, 5: 25}])  # one row: rating -> count
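As a minimal sketch (assuming NumPy is available; the variable names are illustrative), the by-hand arithmetic above is a weighted mean with the counts as weights. The standard deviation follows the same pattern, so the counts never need to be expanded into one row per respondent:

```python
import numpy as np

# Rating value -> number of respondents who chose it (the counts from the question)
counts = {1: 4, 2: 10, 3: 125, 4: 85, 5: 25}

values = np.array(list(counts.keys()), dtype=float)
weights = np.array(list(counts.values()), dtype=float)

n = weights.sum()                           # total respondents (249 here)
mean = np.average(values, weights=weights)  # sum(w * x) / sum(w)

# Population variance: weighted average of squared deviations from the mean
pop_var = np.average((values - mean) ** 2, weights=weights)
# Sample variance (ddof=1), treating each respondent as one observation
sample_var = pop_var * n / (n - 1)

print(f"mean = {mean:.2f}")
print(f"population std = {np.sqrt(pop_var):.4f}")
print(f"sample std = {np.sqrt(sample_var):.4f}")
```

The sample version rescales by n/(n-1); that is the standard deviation pandas reports by default (ddof=1) when the same data is expanded into individual rows.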

Solution

  • You can expand the counts into a DataFrame with one row per respondent, repeating each rating as many times as it was chosen. Then call pandas.DataFrame.describe(), which reports the count, mean, standard deviation, minimum, quartiles, and maximum in one go.

    import pandas as pd
    
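    # Rating -> number of respondents who chose it
    data = {1: 4, 2: 10, 3: 125, 4: 85, 5: 25}
    
    # Expand the counts: repeat each rating once per respondent (249 values in total)
    ratings = []
    for rating, count in data.items():
        ratings.extend([rating] * count)
    
    # One column, one row per respondent
    df = pd.DataFrame(ratings, columns=['Rating'])
    
    # count, mean, std (sample, ddof=1), min, quartiles, max
    print(df.describe())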
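A shorter variant of the same expansion, sketched here under the assumption that the counts sit in the question's one-row layout (column label = rating, value = count), uses pandas' Index.repeat instead of the explicit loop. It also shows how to get the population standard deviation, since describe() reports the sample one (ddof=1):

```python
import pandas as pd

# The question's counts as a single row: column label = rating, value = count
df = pd.DataFrame([{1: 4, 2: 10, 3: 125, 4: 85, 5: 25}])

# Turn that row into a Series indexed by rating
counts = df.iloc[0]

# Repeat each rating label by its count -> one entry per respondent
ratings = pd.Series(counts.index.repeat(counts.to_numpy()), name="Rating")

summary = ratings.describe()   # std in this summary is the sample std (ddof=1)
pop_std = ratings.std(ddof=0)  # population std, if that is what you need

print(summary)
print("population std:", pop_std)
```

Expanding is simple and gives quartiles for free, but it allocates one element per respondent; for very large counts, the weighted formulas avoid that cost.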