Dask shows a slightly smaller size than the actual size of a NumPy array.
Here is an example of a NumPy array that is exactly 32 MB:
import dask as da
import dask.array
import numpy as np
shape = (1000,4000)
ones_np = np.ones(shape)
print(f"Size: {ones_np.nbytes / 1e6} MB")
>> Size: 32.0 MB
However, with Dask the displayed size is 30.52:
ones_da = da.array.ones(shape)
ones_da
Though if I do ones_da.nbytes / 1e6, it returns the correct size (32 MB).
I thought the Dask array size should show the actual size?
The function responsible is in dask/utils (permalink), and it only supports powers of 2, not powers of 10. This is in contrast to the time units immediately below it. You could ask for this to be made configurable, but someone would have to put in a little work.
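To make the difference concrete, here is a minimal sketch of the arithmetic behind the two numbers. The helper names `to_megabytes` and `to_mebibytes` are made up for illustration and are not Dask's own code; the sketch only assumes that the displayed size divides by powers of 2, as described above, and that the array holds 8-byte floats (the default for `np.ones`).

```python
# Illustrative only: hypothetical helpers, not Dask's implementation.

# 1000 x 4000 elements, 8 bytes each (float64, the np.ones default)
nbytes = 1000 * 4000 * 8


def to_megabytes(n):
    """Convert bytes to decimal megabytes (MB), i.e. divide by 10**6."""
    return n / 10**6


def to_mebibytes(n):
    """Convert bytes to binary mebibytes (MiB), i.e. divide by 2**20."""
    return n / 2**20


print(f"{to_megabytes(nbytes):.2f} MB")   # 32.00 MB
print(f"{to_mebibytes(nbytes):.2f} MiB")  # 30.52 MiB
```

So both numbers describe the same 32,000,000 bytes; the only difference is whether the divisor is 10**6 or 2**20.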