Disclaimer: This is probably a difficult question to answer, but I would greatly appreciate any insights, thoughts, or suggestions. If this has already been answered elsewhere and I simply haven't managed to find it, please let me know. Also, I'm somewhat new to algorithm engineering in general, and to using Python/NumPy for implementing and evaluating non-trivial algorithm prototypes in particular; I'm picking it all up as I go, so I may be missing something fundamental to scientific computing with NumPy.
To moderator: Feel free to move this thread if there is a better forum for it. I'm not sure this strictly qualifies for the Computational Science forum as it potentially boils down to an implementation/coding issue.
Note: If you want to understand the problem and context, read everything.
I'm using NumPy's std() function to calculate standard deviations in the implementation of an algorithm that finds optimal combinations of nodes in a graph, loosely based on minimum spanning trees. A selection function contains the call to std(). This is an early single-threaded prototype; the algorithm was originally designed for multi-threading, which will likely be implemented in C or C++ and so isn't relevant here.
Edge weights depend on properties of previously selected nodes, and so are calculated when the algorithm examines an available node. The graph may contain several thousand nodes. At this stage searches are exhaustive, potentially requiring hundreds of thousands of calculations.
Additionally, evaluations of the algorithm are automated and may run hundreds of consecutive trials, depending on user input. I haven't seen the warnings (below) crop up during searches of smaller graphs (e.g., 100 nodes); however, scaling the graph to around 1000 nodes guarantees that they make an appearance.
The relevant code can be reduced to roughly the following:
# graph.available = [{'id': (int), 'distr': {'dim1': (int), ...}}, ...]
# accumulatedDistr = {'dim1': (int), ...}
# Note: dicts as nodes, etc. used here for readability
edges = []
for node in graph.available:
    intersection = my.intersect(node['distr'], accumulatedDistr)  # Returns list
    edgeW = numpy.std(intersection)
    edges.append((node['id'], edgeW))
# Perform comparisons, select and combine into accumulatedDistr
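For illustration, here is a self-contained, defensive sketch of the loop above. The data, the intersect() helper, and all names are stand-ins I made up (graph.available and my.intersect() aren't reproducible here); the point of interest is the guard before the std() call:

```python
import numpy as np

# Stand-in data mimicking the structures described in the comments above
# (illustrative only; not the real graph or intersection logic).
available = [
    {'id': 1, 'distr': {'dim1': 3, 'dim2': 5}},
    {'id': 2, 'distr': {'dim1': 4}},
]
accumulated = {'dim1': 2, 'dim2': 7}

def intersect(distr_a, distr_b):
    # Hypothetical intersection: values of keys present in both distributions.
    return [distr_a[k] for k in distr_a if k in distr_b]

edges = []
for node in available:
    values = intersect(node['distr'], accumulated)
    if not values:  # guard: std() of an empty list warns and returns nan
        raise ValueError(f"empty intersection for node {node['id']}")
    edges.append((node['id'], np.std(values)))
```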
Distributions are guaranteed to contain only positive (non-zero) integer values, and the lists returned from my.intersect() are likewise guaranteed to never be empty.
When I run a series of trials I'll occasionally see the following messages in the terminal:
C:\Program Files\Python36\lib\site-packages\numpy\core\_methods.py:135: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
C:\Program Files\Python36\lib\site-packages\numpy\core\_methods.py:105: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Program Files\Python36\lib\site-packages\numpy\core\_methods.py:127: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
They typically appear only a few times during a set of trials, so perhaps once every few million calculations. However, even one bad calculation can (at the least!) subtly alter the results of an entire trial, so I'd still like to prevent this if at all possible. I assume there must be some internal state accumulating errors, but I've found precious little to suggest how I might address the problem. This concerns me greatly, since accumulation of errors would cast doubt on all subsequent calculations.
I suppose I may have to look for a proprietary library (e.g., Python wrappers for Intel's kernel math libraries) to guarantee the kind of extreme-volume (pardon the abuse of terminology) computational stability I want. But first: is there a way to prevent these warnings in NumPy? And, just in case: is the problem as serious as I'm afraid it could be?
Update: I've confirmed that this is indeed a bug in my own code, despite never catching it in my tests. Granted, on the evidence, I would have had to run a few million consecutive randomized-input tests to hit it. On reflection, that might not be a bad general practice for critical sections of code, despite the amount of time it takes. Better to catch a bug early than after you've built an entire library around the affected code. Lesson learned.
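A sketch of what that practice could look like (the intersection logic and all names are illustrative stand-ins, not my actual my.intersect()): generate random inputs that satisfy the documented precondition, then assert the invariant on every trial so a violation fails fast with a reproducible counterexample:

```python
import random

def intersect(distr_a, distr_b):
    # Hypothetical stand-in: values of keys present in both distributions.
    return [distr_a[k] for k in distr_a if k in distr_b]

# Randomized-input check of the invariant the algorithm relies on:
# if two distributions share at least one key, the result is non-empty.
for trial in range(10_000):
    keys = [f"dim{i}" for i in range(random.randint(1, 6))]
    a = {k: random.randint(1, 100) for k in keys}
    # Force overlap so the precondition holds by construction.
    b_keys = random.sample(keys, random.randint(1, len(keys)))
    b = {k: random.randint(1, 100) for k in b_keys}
    assert intersect(a, b), f"trial {trial}: empty result for {a} vs {b}"
```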
Thanks to BrenBarn for putting me on the right track! I've run across open-source libraries in the past that did have rare, hard-to-hit bugs. I'm relieved NumPy isn't one of them, at least not this time. Still, I think there's room to complain about vague, unhelpful warning messages. Welcome to NumPy, I suppose.
So, to answer my own question: NumPy wasn't the problem; np.std() is very stable. Hopefully my experience here will help others rule out what isn't causing their code to collapse.
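For anyone who lands here with the same messages: an empty input to np.std() reproduces them exactly, which is a quick way to confirm this failure mode. A minimal demonstration:

```python
import warnings
import numpy as np

# std() of an empty array emits the "Degrees of freedom <= 0" and
# "invalid value encountered" RuntimeWarnings seen above, and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.std([])

print(result)  # nan
print([str(w.message) for w in caught])
```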