I'll try to explain my problem clearly.
I am working in Python 3 with a list of lists structured like this:
[[position, color, counts],...]
The entries are ordered first by color and then by position.
I need to merge entries that have the same color and whose positions are at most 2 apart, summing their counts and taking the mean of their positions.
A short example input would be:
[ [1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1], [3, "green", 9], [10, "green", 4] ]
And the expected output:
[ [2.75, "red", 9], [3, "green", 9], [10, "green", 4] ]
I especially have trouble with cases like [5, "red", 1]: if I compare each new position against the running mean, the distance can grow past the limit and the entry falls out of the group, but I want it included because it is only 2 positions away from the previous entry.
Any ideas on how to solve this?
Thanks in advance!
I think I understood your problem correctly. This snippet should work, though it could probably be optimized:
colors = [ [1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1], [3, "green", 9], [10, "green", 4] ]

def avg(values):
    return sum(values) / float(len(values))

def process(colors, threshold=2):
    colors_combined = {}
    colors_processed = []
    # group the entries by color name
    for color in colors:
        position, color_name, count = color
        if color_name not in colors_combined:
            colors_combined[color_name] = []
        colors_combined[color_name].append([position, count])
    # print(colors_combined)
    # process the data
    for color in colors_combined.keys():
        data = colors_combined[color]
        if len(data) == 1:  # len(data) can never be 0 here
            colors_processed.append([data[0][0], color, data[0][1]])
        else:  # more than one position to check
            last_position = data[0][0]
            positions = [last_position]
            count_combined = data[0][1]
            for element in data[1:]:
                # compare with the previous position, not with the running mean
                if abs(last_position - element[0]) <= threshold:  # element is within the distance
                    positions.append(element[0])
                    count_combined += element[1]
                else:
                    colors_processed.append([avg(positions), color, count_combined])
                    positions = [element[0]]
                    count_combined = element[1]
                last_position = element[0]
            if len(positions) > 0:  # the last run was within the distance but not yet added
                colors_processed.append([avg(positions), color, count_combined])
    return colors_processed

print(process(colors))
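The output looks like this (on Python 3.7+, where dicts keep insertion order; older versions may list the color groups in a different order):
[[2.75, 'red', 9], [3.0, 'green', 9], [10.0, 'green', 4]]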
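The [5, "red", 1] entry ends up in the first group because each position is compared with the previous raw position, not with the running mean. Since the question says the input is already ordered by color and then by position, the same idea can be sketched more compactly with itertools.groupby. This is only a sketch under that ordering assumption, and merge_runs is a name I made up for illustration:

```python
from itertools import groupby

def merge_runs(rows, threshold=2):
    """Merge adjacent same-color rows whose positions are within
    `threshold` of the previous row's position (not of the running mean)."""
    merged = []
    # groupby only groups adjacent rows, so this relies on the input ordering.
    for color, group in groupby(rows, key=lambda row: row[1]):
        positions, total, previous = [], 0, None
        for position, _, count in group:
            if previous is not None and abs(position - previous) > threshold:
                # Gap too large: close the current run and start a new one.
                merged.append([sum(positions) / len(positions), color, total])
                positions, total = [], 0
            positions.append(position)
            total += count
            previous = position
        # Every group has at least one row, so there is always a run to close.
        merged.append([sum(positions) / len(positions), color, total])
    return merged

rows = [[1, "red", 3], [2, "red", 2], [3, "red", 3], [5, "red", 1],
        [3, "green", 9], [10, "green", 4]]
print(merge_runs(rows))
# [[2.75, 'red', 9], [3.0, 'green', 9], [10.0, 'green', 4]]
```

Note that a long chain such as positions 1, 3, 5 merges into a single group even though 1 and 5 are 4 apart, which is what the question asks for.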
If you need sorted results, you can iterate over the colors in a fixed order in the second loop, for example sorted(colors_combined), instead of over the dict's keys directly.
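Alternatively, you could leave the grouping code alone and sort the finished list afterwards. A minimal sketch, assuming you want alphabetical color order and ascending position within each color (sort_results is a hypothetical helper name, not part of the snippet above):

```python
def sort_results(results):
    # Order by color name first, then by (mean) position within each color.
    return sorted(results, key=lambda row: (row[1], row[0]))

results = [[2.75, "red", 9], [10.0, "green", 4], [3.0, "green", 9]]
print(sort_results(results))
# [[3.0, 'green', 9], [10.0, 'green', 4], [2.75, 'red', 9]]
```

If you need a custom color order rather than alphabetical, the key function could look the color up in your own ordering instead of using the name itself.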