Search code examples
pythonalgorithmmappingroundingmathematical-optimization

Match list of floats to nearest integers without repeating


I have an algorithm I'm trying to implement and I'm struggling to find a good way of doing it. The goal is to take a list of floats, and make a one-to-one mapping to a list of integers such that no two floats gets mapped to the same integer, and the mapping has the smallest possible error (either in terms of total error or mean squared error).

So, for instance, lets say I have the numbers [2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3]. I'd like it to return something like:

{
    2.1: 1,
    2.3: 2, 
    2.4: 3,
    7: 7,
    7.5: 8,
    8.9: 9,
    9.3: 10
}

Notice that the numbers clustered around 2 have to get spread out to 1, 2, and 3.

It might help to mention that my motivation here is a practical musical one: to map a list of microtonal pitches (notes "in between the cracks") to the keys of the piano. So a "good enough" solution to the problem is really all I need, though a truly optimal solution would be more exciting!

Also, I'm working in Python, but of course the real question here is not language-specific.


Solution

  • Maybe not the most elegant and the most efficient code, but:

    • it's of polynomial complexity
    • it provides a global optimum

    Basic idea:

    • calculate some worst-case range of candidates (we must not forget any potentially improving candidate)
      • i did not put much though into it -> a "hack"
    • calculate the distance-matrix
    • solve the rectangular linear-assignment problem

    Code:

    import math
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist
    
    SQUARED_PENALTY = True
    
    data = np.array([2.1, 2.3, 2.4, 7, 7.5, 8.9, 9.3])
    
    # hacky safety-net -> which candidates to look at
    min_ = math.floor(data.min())
    max_ = math.ceil(data.max())
    gap = max_ - min_
    
    cands = np.arange(min_ - gap, max_ + gap)
    
    cost_matrix = cdist(data[:, np.newaxis], cands[:, np.newaxis])
    
    if SQUARED_PENALTY:
      cost_matrix = np.square(cost_matrix)
    
    row_ind, col_ind = linear_sum_assignment(cost_matrix)
    
    solution = cands[col_ind]
    
    print(solution)
    print('cost: ', np.round(cost_matrix[row_ind, col_ind].sum(), 3))
    

    Output: l1-costs

    [ 2  1  3  7  8  9 10]
    cost:  3.3
    

    Output: squared-costs

    [ 1  2  3  7  8  9 10]
    cost:  2.41