Search code examples
pythonlistlogic

Sort lists keeping maximum distance between repeating elements


Given a list, ie: [1,1,2,2,3,5]

I'd like to reorder it: [1,2,3,5,1,2]


EDIT after the comment: Final goal is to maximizes the smallest distance between any two occurrences of the same number, or minimize the closeness of repeating elements and spread them out as much as possible in the list

Duplicate questions: Algorithm to separate items of the same type

Maximize the minimum distance between the same elements

I came up with this solution, that's the onw I'll use, if nothing better will be proposed: https://stackoverflow.com/a/76792326/5053475


What alghoritm can I use?

Proceeding by logic: I was thinking on calculate the occurencies, let's say, ie: [1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 44, 4, 5, 6, 1, 4, 5, 3, 6, 7, 8, 9, 11, 123, 22, 44, 1, 5, 5]

would give this "occurency count" matrix 2x30: [(3, 10), (1, 4), (5, 4), (44, 2), (4, 2), (6, 2), (7, 1), (8, 1), (9, 1), (11, 1), (123, 1), (22, 1)]

so I would:

  • calculate all the numbers that have more than 1 repetition, in this case, 6 numbers
  • There are 3 numbers with 2 occurrences:
    • The 6, it will go in poisition 3 (0+5-2) and 24 (29-5)
    • The 4, will go in positions 4 (0+5-1) and 25 (29-5+1)
    • The 44, will go in positions 5 (0+5) and 26 (29-5+2)
  • There are 2 numbers with 4 occurencies
    • The 5 will go between positions 1 (0+6-3-1) and 27 (29-6+3), every aprox 6 numbers
    • The 1 will go between positions 2 (0+6-3) and 28 (29-6+3+1), every aprox 6 numbers
  • There is 1 number with 10 occurencies
    • The 3 will go between positions 0 and 29, every aprox 3 numbers.

And so on. Then fullfill the empty position with the non repeating numbers. When some collision (position already occupied) alternate +1, -1, +2, -2 and so on till find a free position I came up with this code, that boldly follow my algorithm,

from collections import defaultdict

def find_nearest_none(arr, position):
    if arr[position] is None:
        return position

    step = 1
    while True:
        left_pos = position - step
        right_pos = position + step

        if left_pos >= 0 and arr[left_pos] is None:
            return left_pos
        elif right_pos < len(arr) and arr[right_pos] is None:
            return right_pos
        elif left_pos < 0 and right_pos >= len(arr):
            # Both left and right positions are out of bounds
            return False

        step += 1

def max_distance_list(nums):
    num_occurrences = {}
    t = len(nums)
    out = [None] * t
    
    for num in nums:
        num_occurrences[num] = num_occurrences.get(num, 0) + 1
    num_occurrences = sorted(num_occurrences.items(), key=lambda item: item[1], reverse=True)

    grouped_data = defaultdict(list)
    for key, value in num_occurrences:
        grouped_data[value].append(key)

    print(grouped_data)

    start_pos = 0 
    for x, y in dict(grouped_data).items():
        print("Start pos:", start_pos)
        for z in y:
            sep = t // x
            pos = start_pos;
            for i in range(x):
                free_pos = find_nearest_none(out, pos)
                out[free_pos] = z 
                pos+=sep;
            start_pos+=1;
    return out

Output: [3, 1, 5, 3, 44, 4, 3, 6, 1, 3, 5, 7, 3, 8, 1, 3, 5, 44, 3, 4, 6, 3, 1, 5, 3, 9, 11, 3, 123, 22]

but that's not an optimal result, it's just acceptable, and far away to be fast. Is there some "already done" function to do this, better and faster?

If not, I'll proceed in chunks and will use this 'acceptable' result since it's for creating a queue a crawler, to distantiate calls to the same IP and/or domains

Appreciate any help


Solution

  • You can use the Counter class to determine how many of each number you have. then progressively build the list in ascending order of counts. The unique numbers will go first., Then for numbers that are there twice, the farthest spread would be one at the beginning and one at the end. Beyond two repetitions, you still want one at the beginning and one at the end but the remaining ones should spread evenly by splitting the list in count-1 chunks.

    For example, Y repeated 4 times should place one at the start, one at the end and the other two splitting the items in 3 chunks.

    ...... --> Y..Y..Y..Y
    

    Here's what the function could look like:

    from collections import Counter
    
    def spread(A):
        result   = []
        for value,count in reversed(Counter(A).most_common()):
            result.append(value)
            if count == 1: continue
            result.insert(0,value)
            if count == 2: continue
            chunk,extra = divmod(len(result)-2,count-1)
            i = 0
            for _ in range(count-2):
                i += chunk + 1 + (extra>0)
                extra -= 1
                result.insert(i,value)           
        return result
    

    output:

    A = [1,1,3,3,3,3,3,3,3,3,3,44,4,5,6,1,4,5,3,6,7,8,9,11,123,22,44,1,5,5]
    spread(A)
    [3,1,5,44,3,4,6,22,3,5,1,3,123,11,3,9,8,3,5,1,3,7,6,3,4,44,3,5,1,3]
    

    For comparison purposes, here is a function that computes the distances between repeated items returning how many instances are at each distance.

    def repDist(A):
        c = Counter(A[i:].index(n)+1 for i,n in enumerate(A,1) if n in A[i:])
        return dict(sorted(c.items()))
    
    OP = [3,1,5,3,44,4,3,6,1,3,5,7,3,8,1,3,5,44,3,4,6,3,1,5,3,9,11,3,123,22]
    
                       # {distance:count}
    repDist(OP)        # {3: 9, 6: 2, 7: 2, 8: 2, 13: 2, 14: 1}
    repDist(spread(A)) # {3: 7, 4: 2, 7: 1, 9: 5, 16: 1, 19: 1, 22: 1}
    

    Comparison matching instance counts:

            |----9---| |--------6--------| |------------3--------------|
    OP:     {3:9,      6:2, 7:2, 8:2,      13:2, 14:1}
    spread: {3:7, 4:2,      7:1,      9:5,             16:1, 19:1, 22:1}
    

    [EDIT] Improved version:

    Inspired by Swifty's comments, I made an improved version that processes values with the same number of repetitions as a group. This covers more edge cases and gives a more regular distribution of these same-repetition groups. It gives them equal spread distances instead of favouring the later values over earlier values with the same repetition count:

    def spread2(A):
        countGroups = dict()
        for value,count in Counter(A).items():
            countGroups.setdefault(count,[]).append(value)
        result   = []
        for count,values in sorted(countGroups.items()):
            result.extend(values)
            if count == 1: continue
            result[:0] = values
            if count == 2: continue
            chunk,extra = divmod(len(result)-2*len(values),count-1)
            i = 0
            for _ in range(count-2):
                i += chunk + len(values) + (extra>0)
                extra -= 1
                result[i:i] = values
        return result
    

    output:

    spread2([1,1,2,2,3,5])  # [1, 2, 3, 5, 1, 2]
    spread2([1,1,2,2,3,3])  # [1, 2, 3, 1, 2, 3]
    
    spread2(A)
    [3,1,5,44,3,4,6,7,3,1,5,3,8,9,3,11,123,3,1,5,3,22,44,3,4,6,3,1,5,3]
    
    repDist(spread2(A))     #  {3: 7, 4: 2, 8: 2, 9: 4, 19: 3}
    

    Comparison:

             |----9---| |--------6--------| |------------3--------------|
    OP:      {3:9,      6:2, 7:2, 8:2,      13:2, 14:1}
    spread:  {3:7, 4:2,      7:1,      9:5,             16:1, 19:1, 22:1}
    spread2: {3:7, 4:2,           8:2, 9:4,                   19:3}
    

    BTW if speed is a concern, spread2 processes a 2000 item list in 0.00025 to 0.00050 sec. depending on repetition patterns