Search code examples
pythonmatrixnumpyflags

Flags in Python


I'm working with a large matrix (250x250x30 = 1,875,000 cells), and I'd like a way to set an arbitrary number of flags for each cell in this matrix, in some manner that's easy to use and reasonably space efficient.

My original plan was a 250x250x30 list array, where each element was something like: ["FLAG1","FLAG8","FLAG12"]. I then changed it to storing just integers instead: [1,8,12]. These integers are mapped internally by getter/setter functions to the original flag strings. This only uses 250mb with 8 flags per point, which is fine in terms of memory.

My question is: am I missing another obvious way to structure this sort of data?

Thanks all for your suggestions. I ended up rolling a few suggestions into one, sadly I can only pick one answer and have to live with upvoting the others:

EDIT: erm the initial code I had here (using sets as the base element of a 3d numpy array) used A LOT of memory. This new version uses around 500mb when filled with randint(0,2**1000).

import numpy

FLAG1=2**0
FLAG2=2**1
FLAG3=2**2
FLAG4=2**3

(x,y,z) = (250,250,30)

array = numpy.zeros((x,y,z), dtype=object)


def setFlag(location,flag):
    array[location] |= flag
def unsetFlag(location,flag):
    array[location] &= ~flag

Solution

  • I would generally use a numpy array (presumably of short ints, 2 bytes each, since you may need more than 256 distinct values) -- that would take less than 4MB for the <2 million cells.

    If for some reason I couldn't afford the numpy dependency (e.g on App Engine, which doesn't support numpy), I'd use the standard library array module - it only supports 1-dimensional arrays, but it's just as space-efficient as numpy for large homogeneous arrays, and the getter/setter routines you mention can perfectly well "linearize" a 3-items tuple that's your natural index into the single integer index into the 1-D array.

    In general, consider numpy (or array) any time you have large homogeneous, dense vectors or matrices of numbers -- Python built-in lists are highly wasteful of space in this use case (due to their generality which you're not using and don't need here!-), and saving memory indirectly translates to saving time too (better caching, fewer levels of indirection, etc, etc).