I have a numpy array with the shape n×2, a bunch of tuples of length 2, that I would like to transfer to SortedList. So the goal is to create a SortedList with integer tuples of length 2.
The problem is that the constructor of SortedList checks the truth value of each entry. This works fine for one-dimensional arrays:
In [1]: import numpy as np
In [2]: from sortedcontainers import SortedList
In [3]: a = np.array([1,2,3,4])
In [4]: SortedList(a)
Out[4]: SortedList([1, 2, 3, 4], load=1000)
But for two dimensions, when each entry is an array, there is no clear truth value and SortedList is uncooperative:
In [5]: a.resize(2,2)
In [6]: a
Out[6]:
array([[1, 2],
[3, 4]])
In [7]: SortedList(a)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-7a4b2693bb52> in <module>()
----> 1 SortedList(a)
/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in __init__(self, iterable, load)
81
82 if iterable is not None:
---> 83 self._update(iterable)
84
85 def __new__(cls, iterable=None, key=None, load=1000):
/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in update(self, iterable)
176 _lists = self._lists
177 _maxes = self._maxes
--> 178 values = sorted(iterable)
179
180 if _maxes:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My current workaround is to convert each line to a tuple manually:
sl = SortedList()
for t in np_array:
x, y = t
sl.add((x,y))
However, this solution offers some room for improvement. Does anyone have an idea how to solve this problem without explicitly unpacking all the arrays into tuples?
The problem isn't that the truth value of the arrays is being checked, it's that they're being compared so that they can be sorted. If you use comparison operators on arrays, you get arrays back:
>>> import numpy as np
>>> np.array([1, 4]) < np.array([2, 3])
array([ True, False], dtype=bool)
This resulting boolean array is actually the array whose truth value is being checked by sorted
.
On the other hand, the same operation with tuples (or lists) will do an element by element comparison and return a single boolean value:
>>> (1, 4) < (2, 3)
True
>>> (1, 4) < (1, 3)
False
So when SortedList
tries to use sorted
on a sequence of numpy
arrays, it can't do a comparison, because it needs single boolean values to be returned by comparison operators.
One way to abstract this would be to create a new array class that implements comparison operators like __eq__
, __lt__
, __gt__
, etc. to reproduce the sorting behavior of tuples. Ironically, the easiest way to do this would be to cast the underlying arrays to tuples, like:
class SortableArray(object):
def __init__(self, seq):
self._values = np.array(seq)
def __eq__(self, other):
return tuple(self._values) == tuple(other._values)
# or:
# return np.all(self._values == other._values)
def __lt__(self, other):
return tuple(self._values) < tuple(other._values)
def __gt__(self, other):
return tuple(self._values) > tuple(other._values)
def __le__(self, other):
return tuple(self._values) <= tuple(other._values)
def __ge__(self, other):
return tuple(self._values) >= tuple(other._values)
def __str__(self):
return str(self._values)
def __repr__(self):
return repr(self._values)
With this implementation, you can now sort a list of SortableArray
objects:
In [4]: ar1 = SortableArray([1, 3])
In [5]: ar2 = SortableArray([1, 4])
In [6]: ar3 = SortableArray([1, 3])
In [7]: ar4 = SortableArray([4, 5])
In [8]: ar5 = SortableArray([0, 3])
In [9]: lst1 = [ar1, ar2, ar3, ar4, ar5]
In [10]: lst1
Out[10]: [array([1, 3]), array([1, 4]), array([1, 3]), array([4, 5]), array([0, 3])]
In [11]: sorted(lst1)
Out[11]: [array([0, 3]), array([1, 3]), array([1, 3]), array([1, 4]), array([4, 5])]
This might be overkill for what you need, but it's one way to do it. In any case, you won't get away with using sorted
on a sequence of objects that don't return a single boolean value on comparison.
If all you're after is avoiding the for loop, you could just replace it with a list comprehension (i.e. SortedList([tuple(row) for row in np_array])
).