Given a list, e.g. [1,1,2,2,3,5]
I'd like to reorder it as: [1,2,3,5,1,2]
EDIT after the comment: the final goal is to maximize the smallest distance between any two occurrences of the same number; in other words, to minimize the closeness of repeating elements and spread them out as much as possible in the list.
Duplicate questions: Algorithm to separate items of the same type
Maximize the minimum distance between the same elements
I came up with this solution; it's the one I'll use if nothing better is proposed: https://stackoverflow.com/a/76792326/5053475
What algorithm can I use?
Proceeding by logic: I was thinking of counting the occurrences. For example, [1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 44, 4, 5, 6, 1, 4, 5, 3, 6, 7, 8, 9, 11, 123, 22, 44, 1, 5, 5]
would give this occurrence count as a list of (value, count) pairs: [(3, 10), (1, 4), (5, 4), (44, 2), (4, 2), (6, 2), (7, 1), (8, 1), (9, 1), (11, 1), (123, 1), (22, 1)]
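That occurrence count can be produced directly with collections.Counter; a minimal sketch on the sample list above:

```python
from collections import Counter

nums = [1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 44, 4, 5, 6, 1,
        4, 5, 3, 6, 7, 8, 9, 11, 123, 22, 44, 1, 5, 5]

# most_common() returns (value, count) pairs, most frequent first;
# ties keep first-insertion order in modern Python
print(Counter(nums).most_common())
# [(3, 10), (1, 4), (5, 4), (44, 2), (4, 2), (6, 2), (7, 1), (8, 1),
#  (9, 1), (11, 1), (123, 1), (22, 1)]
```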
so I would place each repeated number at evenly spaced positions (spacing = list length // count), and so on. Then I'd fill the empty positions with the non-repeating numbers. On a collision (position already occupied), probe +1, -1, +2, -2 and so on until a free position is found. I came up with this code, which closely follows my algorithm:
from collections import defaultdict

def find_nearest_none(arr, position):
    if arr[position] is None:
        return position
    step = 1
    while True:
        left_pos = position - step
        right_pos = position + step
        if left_pos >= 0 and arr[left_pos] is None:
            return left_pos
        elif right_pos < len(arr) and arr[right_pos] is None:
            return right_pos
        elif left_pos < 0 and right_pos >= len(arr):
            # Both left and right positions are out of bounds
            return False
        step += 1

def max_distance_list(nums):
    num_occurrences = {}
    t = len(nums)
    out = [None] * t
    for num in nums:
        num_occurrences[num] = num_occurrences.get(num, 0) + 1
    num_occurrences = sorted(num_occurrences.items(), key=lambda item: item[1], reverse=True)
    grouped_data = defaultdict(list)
    for key, value in num_occurrences:
        grouped_data[value].append(key)
    print(grouped_data)
    start_pos = 0
    for x, y in dict(grouped_data).items():
        print("Start pos:", start_pos)
        for z in y:
            sep = t // x
            pos = start_pos
            for i in range(x):
                free_pos = find_nearest_none(out, pos)
                out[free_pos] = z
                pos += sep
            start_pos += 1
    return out
Output: [3, 1, 5, 3, 44, 4, 3, 6, 1, 3, 5, 7, 3, 8, 1, 3, 5, 44, 3, 4, 6, 3, 1, 5, 3, 9, 11, 3, 123, 22]
but that's not an optimal result, just an acceptable one, and it's far from fast. Is there an existing, "already done" function that does this better and faster?
If not, I'll process the data in chunks and use this 'acceptable' result, since it's for building a crawler queue, to space out calls to the same IP and/or domain.
I'd appreciate any help.
You can use the Counter class to determine how many of each number you have, then progressively build the list in ascending order of counts. The unique numbers go first. Then, for numbers that appear twice, the farthest spread is one at the beginning and one at the end. Beyond two repetitions, you still want one at the beginning and one at the end, but the remaining ones should be spread evenly by splitting the list into count-1 chunks.
For example, Y repeated 4 times should place one at the start, one at the end, and the other two splitting the items into 3 chunks.
...... --> Y..Y..Y..Y
Here's what the function could look like:
from collections import Counter

def spread(A):
    result = []
    for value, count in reversed(Counter(A).most_common()):
        result.append(value)
        if count == 1: continue
        result.insert(0, value)
        if count == 2: continue
        chunk, extra = divmod(len(result) - 2, count - 1)
        i = 0
        for _ in range(count - 2):
            i += chunk + 1 + (extra > 0)
            extra -= 1
            result.insert(i, value)
    return result
output:
A = [1,1,3,3,3,3,3,3,3,3,3,44,4,5,6,1,4,5,3,6,7,8,9,11,123,22,44,1,5,5]
spread(A)
[3,1,5,44,3,4,6,22,3,5,1,3,123,11,3,9,8,3,5,1,3,7,6,3,4,44,3,5,1,3]
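Running this first version on the question's own small example shows the bias the [EDIT] below addresses: the 1s, processed last, get the outer positions and the larger spread (the function is repeated here so the snippet runs on its own):

```python
from collections import Counter

def spread(A):  # repeated from above so this snippet is self-contained
    result = []
    for value, count in reversed(Counter(A).most_common()):
        result.append(value)
        if count == 1: continue
        result.insert(0, value)
        if count == 2: continue
        chunk, extra = divmod(len(result) - 2, count - 1)
        i = 0
        for _ in range(count - 2):
            i += chunk + 1 + (extra > 0)
            extra -= 1
            result.insert(i, value)
    return result

print(spread([1, 1, 2, 2, 3, 5]))  # [1, 2, 5, 3, 2, 1]
```

The 1s end up 5 apart but the 2s only 3 apart, whereas the question's target [1,2,3,5,1,2] keeps both pairs 4 apart.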
For comparison purposes, here is a function that computes the distances between repeated items, returning how many instances occur at each distance.
def repDist(A):
    c = Counter(A[i:].index(n) + 1 for i, n in enumerate(A, 1) if n in A[i:])
    return dict(sorted(c.items()))
OP = [3,1,5,3,44,4,3,6,1,3,5,7,3,8,1,3,5,44,3,4,6,3,1,5,3,9,11,3,123,22]
# {distance:count}
repDist(OP) # {3: 9, 6: 2, 7: 2, 8: 2, 13: 2, 14: 1}
repDist(spread(A)) # {3: 7, 4: 2, 7: 1, 9: 5, 16: 1, 19: 1, 22: 1}
Comparison matching instance counts:
|----9---| |--------6--------| |------------3--------------|
OP: {3:9, 6:2, 7:2, 8:2, 13:2, 14:1}
spread: {3:7, 4:2, 7:1, 9:5, 16:1, 19:1, 22:1}
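A quick sanity check on optimality, assuming distance is the index difference that repDist uses: if the most frequent value occurs k times in n slots, its k-1 gaps sum to at most n-1, so the minimum distance can never exceed (n-1)//(k-1). For this list (n=30, k=10) that bound is 3, which both orderings above already reach:

```python
from collections import Counter

def min_gap_bound(A):
    # Upper bound on the achievable minimum distance between
    # consecutive occurrences of the most frequent value.
    k = Counter(A).most_common(1)[0][1]
    return (len(A) - 1) // (k - 1) if k > 1 else len(A)

A = [1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 44, 4, 5, 6, 1,
     4, 5, 3, 6, 7, 8, 9, 11, 123, 22, 44, 1, 5, 5]
print(min_gap_bound(A))  # 3
```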
[EDIT] Improved version:
Inspired by Swifty's comments, I made an improved version that processes values with the same number of repetitions as a group. This covers more edge cases and gives a more regular distribution of these same-repetition groups. It gives them equal spread distances instead of favouring the later values over earlier values with the same repetition count:
def spread2(A):
    countGroups = dict()
    for value, count in Counter(A).items():
        countGroups.setdefault(count, []).append(value)
    result = []
    for count, values in sorted(countGroups.items()):
        result.extend(values)
        if count == 1: continue
        result[:0] = values
        if count == 2: continue
        chunk, extra = divmod(len(result) - 2 * len(values), count - 1)
        i = 0
        for _ in range(count - 2):
            i += chunk + len(values) + (extra > 0)
            extra -= 1
            result[i:i] = values
    return result
output:
spread2([1,1,2,2,3,5]) # [1, 2, 3, 5, 1, 2]
spread2([1,1,2,2,3,3]) # [1, 2, 3, 1, 2, 3]
spread2(A)
[3,1,5,44,3,4,6,7,3,1,5,3,8,9,3,11,123,3,1,5,3,22,44,3,4,6,3,1,5,3]
repDist(spread2(A)) # {3: 7, 4: 2, 8: 2, 9: 4, 19: 3}
Comparison:
|----9---| |--------6--------| |------------3--------------|
OP: {3:9, 6:2, 7:2, 8:2, 13:2, 14:1}
spread: {3:7, 4:2, 7:1, 9:5, 16:1, 19:1, 22:1}
spread2: {3:7, 4:2, 8:2, 9:4, 19:3}
BTW, if speed is a concern: spread2 processes a 2000-item list in 0.00025 to 0.00050 sec, depending on repetition patterns.
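Those figures are machine-dependent; a rough way to reproduce them yourself (spread2 is repeated so the snippet is self-contained, and the 200-value alphabet is an arbitrary choice to force repeats):

```python
import random
import timeit
from collections import Counter

def spread2(A):  # repeated from above so this snippet is self-contained
    countGroups = dict()
    for value, count in Counter(A).items():
        countGroups.setdefault(count, []).append(value)
    result = []
    for count, values in sorted(countGroups.items()):
        result.extend(values)
        if count == 1: continue
        result[:0] = values
        if count == 2: continue
        chunk, extra = divmod(len(result) - 2 * len(values), count - 1)
        i = 0
        for _ in range(count - 2):
            i += chunk + len(values) + (extra > 0)
            extra -= 1
            result[i:i] = values
    return result

# 2000 items drawn from 200 values, so heavy repetition
data = [random.randrange(200) for _ in range(2000)]
t = timeit.timeit(lambda: spread2(data), number=100) / 100
print(f"{t:.5f} sec per call")
```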