Search code examples
pythonlistmaxmin

How to get statistics from an array Collection ? (min max, and average)


I have a text file that has a long 2D array as follows:

[[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585] ..... [6, 47]]

The first element of each has numbers between 1 to 7. I want to read information of second element for all and find the maximum and minimum amount for each group between 1 to 7 separately. For example output like this:

Mix for first element with 1: 500  
Max for first element with 1: 2
average: 251

Min for row with 2:  0
Max for row with 2:  5000
average: 2500

and so on 

What is the most efficient way of getting min, max, and average values by grouping based on the first element of the array?

file = open("myfile.txt", "r")
list_of_lists = file.read()

unique_values = set([list[1] for list in list_of_lists])
group_list = [[list[0] for list in list_of_lists if list[1] == value] for value in unique_values]

print(group_list) 

Solution

  • We can use pandas for this:

    import numpy as np
    import pandas as pd
    
    file_data = [[1, 2], [5,585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4,163], [3,85], [5,906], [2,5000], [6,358], [4,69], [3,89], [7, 258],[5, 632], [7, 585], [6, 47]]
    
    file_data = np.array(file_data)
    
    df = pd.DataFrame(data = {'num': file_data[:, 0], 'data': file_data[:, 1]})
    
    for i in np.sort(df['num'].unique()):
        print('Min for', i, ':', df.loc[df['num'] == i, 'data'].min())
        print('Max for', i, ':', df.loc[df['num'] == i, 'data'].max())
        temp_df = df.loc[df['num'] == i, 'data']
        print("Average for", i, ":", temp_df.sum()/len(temp_df.index))
    

    This gives us:

    Min for 1 : 2
    Max for 1 : 500
    Average for 1 : 251.0
    Min for 2 : 0
    Max for 2 : 5000
    Average for 2 : 1889.3333333333333
    Min for 3 : 28
    Max for 3 : 89
    Average for 3 : 64.0
    Min for 4 : 28
    Max for 4 : 163
    Average for 4 : 86.66666666666667
    Min for 5 : 585
    Max for 5 : 906
    Average for 5 : 707.6666666666666
    Min for 6 : 47
    Max for 6 : 358
    Average for 6 : 202.5
    Min for 7 : 258
    Max for 7 : 585
    Average for 7 : 421.5