Search code examples
pythonarraysnumpyheap

Adding numpy array to a heap queue


Can someone please explain why the following code results in a ValueError?

import heapq
import numpy as np

a = np.ones((2, 2), dtype=int)

states = []
heapq.heappush(states, (0, a))
heapq.heappush(states, (0, a.copy()))

The error message is:

Traceback (most recent call last):
  File "x.py", line 8, in <module>
    heapq.heappush(states, (0, a.copy()))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Running it without adding the a.copy() to the heap works fine, the second/subsequent one is for some reason a problem. I do understand that there is a unknown truth value aspect with an array of [True, False, True] and that it's not possible to determine a single True or False from that, but why does heapq need to do that? Especially only in the second case?


Solution

  • TL;DR: Because numpy arrays can't be cast to a boolean if they contain more than one element.


    Some informations about heaps:

    Heaps "order" their contents (so the items must implement < but that's an implementation detail).

    You however insert items into the heap by creating tuples for the items, where the first elements is some value and the second one is the array.

    Comparing tuples first checks if the first items are equal and if they are it checks if the second items are equal and so on until they are not equal, then it will check if it's smaller (when the operation was <) or greater (for >). However tuples are implemented in C and the == check there is a bit different than the one in Python. It uses PyObject_RichCompareBool. Especially the "note" is important here

    If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQand 0 for Py_NE.

    Now let's go to numpy arrays:

    You can't convert a numpy.array to a bool if it contains more than one item:

    >>> arr = np.array([1,2,3])
    >>> bool(arr)
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    

    An if check implicitly converts the condition to a boolean:

    >>> if arr: pass
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    

    Even after comparing numpy-arrays they are still numpy arrays:

    >>> arr > arr
    array([False, False], dtype=bool)
    >>> arr == arr
    array([ True,  True], dtype=bool)
    

    So these can't be evaluated with ==:

    >>> if arr == arr: pass
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    

    So you can't convert numpy-arrays with more than one element to a boolean! However now comes the interesting part: The heapq-module uses PyObject_RichCompareBool() so it can check if two arrays are equal but if and only if these are identical!

    That's why it works with the same array passed in several times but it fails when you copy it:

    >>> arr is arr
    True
    >>> arr is arr.copy()
    False