Search code examples
pythonnumpybooleanmmapnumpy-memmap

Lazy version of numpy.unpackbits


I use numpy.memmap to load only the parts of arrays into memory that I need, instead of loading an entire huge array. I would like to do the same with bool arrays.

Unfortunately, bool memmap arrays aren't stored economically: according to ls, a bool memmap file requires as much space as a uint8 memmap file of the same array shape.

So I use numpy.unpackbits to save space. Unfortunately, it seems not lazy: It's slow and can cause a MemoryError, so apparently it loads the array from disk into memory instead of providing a "bool view" on the uint8 array.

So if I want to load only certain entries of the bool array from file, I first have to compute which uint8 entries they are part of, then apply numpy.unpackbits to that, and then again index into that.

Isn't there a lazy way to get a "bool view" on the bit-packed memmap file?


Solution

  • Not possible. The memory layout of a bit-packed array is incompatible with what you're looking for. The NumPy shape-and-strides model of array layout does not have sub-byte resolution. Even if you were to create a class that emulated the view you want, trying to use it with normal NumPy operations would require materializing a representation NumPy can work with, at which point you'd have to spend the memory you don't want to spend.