
Analysing and calculating new values in list of lists in Python3


Well, I'll try to explain the problem clearly:

I am working in Python 3 with a list of lists that has this structure:

[[position, color, counts],...]

The rows are ordered first by color and then by position.
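Any approach that compares neighbouring positions depends on this ordering. If the input might not already be ordered, here is a minimal sketch, assuming rows are [position, color, counts] lists (the helper name order_rows is hypothetical). It keeps colors in the order they first appear and sorts by position within each color:

```python
def order_rows(rows):
    """Hypothetical helper: keep colors in order of first appearance,
    and sort rows by position within each color."""
    first_seen = {}
    for position, color, count in rows:
        # rank each color by when it first appears
        first_seen.setdefault(color, len(first_seen))
    # sort by (color rank, position)
    return sorted(rows, key=lambda row: (first_seen[row[1]], row[0]))


rows = [[3, "red", 3], [10, "green", 4], [1, "red", 3], [3, "green", 9]]
print(order_rows(rows))
# [[1, 'red', 3], [3, 'red', 3], [3, 'green', 9], [10, 'green', 4]]
```

Sorting the whole list by (color, position) would also work, but it would put "green" before "red" alphabetically.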

I need to merge rows that have the same color and whose positions are at most 2 apart (±2): the merged row should hold the mean of the positions and the sum of the counts.

A short test input would be:

[ [1, "red", 3],  [2, "red", 2],  [3, "red", 3], [5, "red", 1], [3, "green", 9],  [10, "green", 4] ]

And the expected output:

[ [2.75, "red", 9], [3, "green", 9], [10, "green", 4] ]
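The red entry in the expected output is simple arithmetic over the four red rows, since each one is within 2 of the previous one. The two green positions, 3 and 10, are 7 apart, so they stay separate. A quick check of the red numbers:

```python
red = [[1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1]]

positions = [row[0] for row in red]   # [1, 2, 3, 5]
counts = [row[2] for row in red]      # [3, 2, 3, 1]

mean_position = sum(positions) / len(positions)  # 11 / 4 = 2.75
total_count = sum(counts)                        # 3 + 2 + 3 + 1 = 9

print([mean_position, "red", total_count])  # [2.75, 'red', 9]
```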

I especially have trouble with cases like [5, "red", 1]. If I compare each row against the running mean, the distance can grow beyond 2 and the row falls out of the group. I want it included, though, because it is only 2 positions away from the previous row (position 3).

Any ideas on how to solve this?

Thanks in advance!
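The difficulty with [5, "red", 1] can be reproduced with a small sketch. It compares two ways of deciding whether the next position joins the current group: against the group's running mean, or against the previous position. The function names are hypothetical, and both assume a non-empty list of positions that is already sorted. With positions 1, 2, 3, 5, the mean of the first three is 2.0, so 5 is 3.0 away and gets split off. The previous position is 3, so 5 is exactly 2 away and stays in the group:

```python
def split_by_mean(positions, threshold=2):
    """Start a new group when a position is too far from the current group's mean.
    Assumes a non-empty, sorted list."""
    groups = [[positions[0]]]
    for p in positions[1:]:
        current = groups[-1]
        if abs(p - sum(current) / len(current)) <= threshold:
            current.append(p)
        else:
            groups.append([p])
    return groups


def split_by_previous(positions, threshold=2):
    """Start a new group when a position is too far from the previous position.
    Assumes a non-empty, sorted list."""
    groups = [[positions[0]]]
    for p in positions[1:]:
        if abs(p - groups[-1][-1]) <= threshold:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


red_positions = [1, 2, 3, 5]
print(split_by_mean(red_positions))      # [[1, 2, 3], [5]]
print(split_by_previous(red_positions))  # [[1, 2, 3, 5]]
```

Comparing with the previous position chains entries together. As a result, a single group can span more than 2 positions overall (here 1 to 5), which is the behaviour the question asks for.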


Solution

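  • If I understood your problem correctly, this snippet should work, although it could probably be optimized. The key point is to compare each position with the previous position of the same color, not with the running mean, so a row like [5, "red", 1] still joins the group: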

    colors = [ [1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1], [3, "green", 9], [10, "green", 4] ]
    
    def avg(values): 
        return sum(values) / len(values)  # "/" is true division in Python 3
    
    def process(colors, threshold=2): 
        colors_combined = {}
        colors_processed = []
    
        # group [position, count] pairs by color name
        for position, color_name, count in colors: 
            if color_name not in colors_combined: 
                colors_combined[color_name] = []
    
            colors_combined[color_name].append([position, count])
    
        # process each color; its positions are assumed to be sorted
        for color, data in colors_combined.items(): 
            if len(data) == 1: # there can't be a case where len(data) == 0
                colors_processed.append([data[0][0], color, data[0][1]])
            else: # more than 1 position to check
                last_position = data[0][0]
                positions = [last_position]
                count_combined = data[0][1]
    
                for element in data[1:]: 
                    # compare with the previous position, not with the running mean
                    if abs(last_position - element[0]) <= threshold: # element is inside the distance
                        positions.append(element[0])
                        count_combined += element[1]
                    else: # gap too large: store the finished group and start a new one
                        colors_processed.append([avg(positions), color, count_combined])
                        positions = [element[0]]
                        count_combined = element[1]
    
                    last_position = element[0]
    
                # the last group is still open when the loop ends, so store it here
                colors_processed.append([avg(positions), color, count_combined])
    
        return colors_processed
    
    print(process(colors))
    

    With Python 3.7+ (where dicts keep insertion order), the output looks like this:

    [[2.75, 'red', 9], [3.0, 'green', 9], [10.0, 'green', 4]]
    

    If you need the results in a specific color order, loop over an explicit list of color names and look up colors_combined[color] instead of iterating over colors_combined.items().
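For comparison, here is a more compact sketch using itertools.groupby. This is a swapped-in approach, not the answer's code, and the function name merge_close is hypothetical. It assumes, as the question states, that the rows for each color are contiguous and sorted by position, because groupby only merges adjacent rows. Like the answer, it compares each position with the previous one, so a chain such as 1, 2, 3, 5 stays together:

```python
from itertools import groupby
from operator import itemgetter


def merge_close(rows, threshold=2):
    """Merge rows of the same color whose consecutive positions differ by <= threshold.

    Assumes rows are [position, color, count], contiguous per color and
    sorted by position within each color.
    """
    merged = []
    for color, group in groupby(rows, key=itemgetter(1)):
        positions, total, last = [], 0, None
        for position, _, count in group:
            # positions are sorted, so position - last is never negative
            if last is not None and position - last > threshold:
                # gap too large: emit the finished cluster and start a new one
                merged.append([sum(positions) / len(positions), color, total])
                positions, total = [], 0
            positions.append(position)
            total += count
            last = position
        # emit the final cluster for this color
        merged.append([sum(positions) / len(positions), color, total])
    return merged


colors = [[1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1],
          [3, "green", 9], [10, "green", 4]]
print(merge_close(colors))
# [[2.75, 'red', 9], [3.0, 'green', 9], [10.0, 'green', 4]]
```

Because groupby follows the input order, the colors come out in the same order as in the input, with no extra ordering step.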