I initialize two operands and one result:
import numpy as np

# 2 * 1024**3 int64 elements -> 16 GiB per array, each backed by a file on disk
a = np.memmap('a.mem', mode='w+', dtype=np.int64, shape=(2*1024*1024*1024))
b = np.memmap('b.mem', mode='w+', dtype=np.int64, shape=(2*1024*1024*1024))
result = np.memmap('result.mem', mode='w+', dtype=np.int64, shape=(2*1024*1024*1024))
With just that, at idle, the System RAM reported by Google Colab is still 1.0/12.7 GB, which is good: there is no real RAM activity yet. But when I run a vector operation such as vector subtraction, the reported system RAM climbs to almost the maximum, 11.2/12.7 GB, and eventually the runtime kernel crashes:
result[:] = a[:] - b[:]   # this still consumes memory
result.flush()
I have read the np.memmap docs many times, and they say the whole point of memmap is to reduce memory consumption, so why do I still get an Out Of Memory error?
I suspect the vector subtraction has to be buffered in small chunks, say 512 MB of buffer memory at a time, but I have no idea what the syntax is. Perhaps something like this:
BUFF_SIZE = 512 * 1024 * 1024 // 8   # int64 elements per 512 MB buffer
for i in range(0, result.size, BUFF_SIZE):
    result[i:i+BUFF_SIZE] = a[i:i+BUFF_SIZE] - b[i:i+BUFF_SIZE]
result.flush()
result[:] = a[:] - b[:] doesn't mean "write the subtraction results directly into result". It means "write the subtraction results into a new array, then copy the contents of that array into result". You're attempting to create a 16 GiB temporary array in the middle.
To write the output directly into result, you can use the out parameter of the numpy.subtract ufunc:

np.subtract(a, b, out=result)
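For the arrays in the question, a minimal sketch combining out= with the 512 MB chunking you proposed could look like this (CHUNK is just an illustrative name; a, b, and result are the memmaps created earlier; the whole-array call np.subtract(a, b, out=result) works as well, chunking merely keeps each step's working set small):

CHUNK = 512 * 1024 * 1024 // result.itemsize   # int64 elements per 512 MB slice

for i in range(0, result.size, CHUNK):
    # out= makes the ufunc write each chunk of the difference straight into
    # the memory-mapped output, so no full-size temporary is ever allocated
    np.subtract(a[i:i+CHUNK], b[i:i+CHUNK], out=result[i:i+CHUNK])

result.flush()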