Search code examples
pythonnumpysortedcontainers

Populating a SortedLIst from an n×2 numpy array


I have a numpy array with the shape n×2, a bunch of tuples of length 2, that I would like to transfer to SortedList. So the goal is to create a SortedList with integer tuples of length 2.

The problem is that the constructor of SortedList checks the truth value of each entry. This works fine for one-dimensional arrays:

In [1]: import numpy as np
In [2]: from sortedcontainers import SortedList
In [3]: a = np.array([1,2,3,4])
In [4]: SortedList(a)
Out[4]: SortedList([1, 2, 3, 4], load=1000)

But for two dimensions, when each entry is an array, there is no clear truth value and SortedList is uncooperative:

In [5]: a.resize(2,2)
In [6]: a
Out[6]: 
array([[1, 2],
       [3, 4]])

In [7]: SortedList(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-7a4b2693bb52> in <module>()
----> 1 SortedList(a)

/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in __init__(self, iterable, load)
     81 
     82         if iterable is not None:
---> 83             self._update(iterable)
     84 
     85     def __new__(cls, iterable=None, key=None, load=1000):

/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in update(self, iterable)
    176         _lists = self._lists
    177         _maxes = self._maxes
--> 178         values = sorted(iterable)
    179 
    180         if _maxes:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My current workaround is to convert each line to a tuple manually:

sl = SortedList()
for t in np_array:
    x, y = t
    sl.add((x,y))

However, this solution offers some room for improvement. Does anyone have an idea how to solve this problem without explicitly unpacking all the arrays into tuples?


Solution

  • The problem isn't that the truth value of the arrays is being checked, it's that they're being compared so that they can be sorted. If you use comparison operators on arrays, you get arrays back:

    >>> import numpy as np
    >>> np.array([1, 4]) < np.array([2, 3])
    array([ True, False], dtype=bool)
    

    This resulting boolean array is actually the array whose truth value is being checked by sorted.

    On the other hand, the same operation with tuples (or lists) will do an element by element comparison and return a single boolean value:

    >>> (1, 4) < (2, 3)
    True
    >>> (1, 4) < (1, 3)
    False
    

    So when SortedList tries to use sorted on a sequence of numpy arrays, it can't do a comparison, because it needs single boolean values to be returned by comparison operators.

    One way to abstract this would be to create a new array class that implements comparison operators like __eq__, __lt__, __gt__, etc. to reproduce the sorting behavior of tuples. Ironically, the easiest way to do this would be to cast the underlying arrays to tuples, like:

    class SortableArray(object):
    
        def __init__(self, seq):
            self._values = np.array(seq)
    
        def __eq__(self, other):
            return tuple(self._values) == tuple(other._values)
            # or:
            # return np.all(self._values == other._values)
    
        def __lt__(self, other):
            return tuple(self._values) < tuple(other._values)
    
        def __gt__(self, other):
            return tuple(self._values) > tuple(other._values)
    
        def __le__(self, other):
            return tuple(self._values) <= tuple(other._values)
    
        def __ge__(self, other):
            return tuple(self._values) >= tuple(other._values)
    
        def __str__(self):
            return str(self._values)
    
        def __repr__(self):
            return repr(self._values)
    

    With this implementation, you can now sort a list of SortableArray objects:

    In [4]: ar1 = SortableArray([1, 3])
    
    In [5]: ar2 = SortableArray([1, 4])
    
    In [6]: ar3 = SortableArray([1, 3])
    
    In [7]: ar4 = SortableArray([4, 5])
    
    In [8]: ar5 = SortableArray([0, 3])
    
    In [9]: lst1 = [ar1, ar2, ar3, ar4, ar5]
    
    In [10]: lst1
    Out[10]: [array([1, 3]), array([1, 4]), array([1, 3]), array([4, 5]), array([0, 3])]
    
    In [11]: sorted(lst1)
    Out[11]: [array([0, 3]), array([1, 3]), array([1, 3]), array([1, 4]), array([4, 5])]
    

    This might be overkill for what you need, but it's one way to do it. In any case, you won't get away with using sorted on a sequence of objects that don't return a single boolean value on comparison.

    If all you're after is avoiding the for loop, you could just replace it with a list comprehension (i.e. SortedList([tuple(row) for row in np_array])).