I use numpy.memmap to load only the parts of an array that I need into memory, instead of loading an entire huge array. I would like to do the same with bool arrays.
Unfortunately, bool memmap arrays aren't stored economically: according to ls, a bool memmap file requires as much space as a uint8 memmap file of the same array shape.
So I store the array bit-packed with numpy.packbits to save space and use numpy.unpackbits to read it back. Unfortunately, numpy.unpackbits does not seem to be lazy: it is slow and can raise a MemoryError, so apparently it loads the whole array from disk into memory instead of providing a "bool view" on the uint8 array.
So if I want to load only certain entries of the bool array from file, I first have to compute which uint8 entries they are part of, then apply numpy.unpackbits to those entries, and then index into the result again.
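For reference, the workaround described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: read_bits is a hypothetical helper name, the requested indices are flat bit positions, and the file was written with numpy.packbits using its default bit order (most significant bit first).

```python
import os
import tempfile

import numpy as np


def read_bits(packed, indices):
    """Return the bool values at flat bit positions `indices`.

    `packed` is a 1-D uint8 array (for example a numpy.memmap) assumed to
    have been produced by numpy.packbits with the default bit order.
    """
    indices = np.asarray(indices, dtype=np.int64)
    byte_idx = indices // 8    # which uint8 entry holds each requested bit
    bit_in_byte = indices % 8  # position of the bit inside that entry
    # Fancy indexing copies only the requested bytes into memory.
    needed = np.asarray(packed[byte_idx])
    # Unpack each needed byte into its own row of 8 bits.
    bits = np.unpackbits(needed[:, None], axis=1)
    # Pick the requested bit out of each row.
    return bits[np.arange(len(indices)), bit_in_byte].astype(bool)


# Small demonstration with a temporary bit-packed file.
rng = np.random.default_rng(0)
original = rng.random(1000) < 0.5
wanted = [0, 7, 8, 999]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bits.dat")
    np.packbits(original).tofile(path)
    packed = np.memmap(path, dtype=np.uint8, mode="r")
    result = read_bits(packed, wanted)
    del packed  # release the memory map before the directory is removed
```

Only the bytes that contain the requested bits are read and unpacked, so memory use scales with the number of requested entries rather than with the size of the file.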
Isn't there a lazy way to get a "bool view" on the bit-packed memmap file?
Not possible. The memory layout of a bit-packed array is incompatible with what you're looking for. The NumPy shape-and-strides model of array layout does not have sub-byte resolution. Even if you were to create a class that emulated the view you want, trying to use it with normal NumPy operations would require materializing a representation NumPy can work with, at which point you'd have to spend the memory you don't want to spend.
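To illustrate the point about sub-byte resolution, here is a small sketch (the variable names are arbitrary) showing that a NumPy bool element occupies a whole byte and that strides are counted in whole bytes, so there is no way to describe "every bit of this byte" as an array layout.

```python
import numpy as np

flags = np.zeros(16, dtype=bool)

# Each bool element takes one full byte, not one bit.
itemsize = flags.itemsize         # 1 (byte)

# Strides are integer byte counts; the smallest possible step is one byte.
strides = flags.strides           # (1,)
strided = flags[::2].strides      # (2,)

# The bit-packed form stores the same 16 values in 2 bytes, but its
# elements are uint8 values, not individual bits.
packed = np.packbits(flags)
packed_nbytes = packed.nbytes     # 2
```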