
numpy.memmap max array size on x32 machine?


I'm using 32-bit Python on 32-bit Windows XP.

Sometimes the program fails on this line:

fp = np.memmap('C:/memmap_test', dtype='float32', mode='w+', shape=(rows,cols))

The error is raised in memmap.py:

Traceback (most recent call last):
    fp = np.memmap('C:/memmap_test', dtype='float32', mode='w+', shape=(rows,cols))
  File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 253, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OverflowError: cannot fit 'long' into an index-sized integer

So I assume there is a limit on the size of the array. What is the maximum array size, maxN = rows*cols?

I have the same question for (1) 32-bit Python on 64-bit Windows and (2) 64-bit Python on 64-bit Windows.

UPDATE:

import numpy as np

# create a 250000 x 1000 float32 array backed by a file
rows = 250000
cols = 1000
fA = np.memmap('A.npy', dtype='float32', mode='w+', shape=(rows,cols))
# fA1 = np.memmap('A1.npy', dtype='float32', mode='w+', shape=(rows,cols)) # can't create a second big memmap
print(fA.nbytes // 1024 // 1024) # 953 MB

So it seems there is another limitation, not only the <2 GB limit for a single memory-mapped array.

Also, here is the output of the test provided by @Paul:

working with 30000000 elements
number bytes required 0.240000 GB
works
working with 300000000 elements
number bytes required 2.400000 GB
OverflowError("cannot fit 'long' into an index-sized integer",)
working with 3000000000 elements
number bytes required 24.000000 GB
IOError(28, 'No space left on device')
working with 30000000000 elements
number bytes required 240.000000 GB
IOError(28, 'No space left on device')
working with 300000000000 elements
number bytes required 2400.000000 GB
IOError(28, 'No space left on device')
working with 3000000000000 elements
number bytes required 24000.000000 GB
IOError(22, 'Invalid argument')

Solution

  • Here is some discussion on this topic: "How big can a memory-mapped file be?" and "Why doesn't Python's mmap work with large files?"

    For the below tests I am using the following code:

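    import numpy

    baseNumber = 3000000

    # try array sizes from 3e7 up to 3e12 elements
    for powers in range(1, 7):
      l1 = baseNumber*10**powers
      print('working with %d elements'%(l1))
      print('number bytes required %f GB'%(l1*8/1e9))
      try:
        # mode='w+' creates (or overwrites) test.map at the full size
        fp = numpy.memmap('test.map',dtype='float64', mode='w+',shape=(1,l1))
        print('works')
        del fp  # release the mapping before the next attempt
      except Exception as e:
        print(repr(e))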
    

    python x32 on windows x32

    On 32-bit Windows the limit is about 2-3 GB. The whole mapping has to fit into the address space of a 32-bit process (2 GB by default, up to 3 GB with the /3GB boot option), so anything larger than that cannot be mapped. I didn't have access to a 32-bit machine, but the commands will fail once that limit is hit.

    python x32 on windows x64

    In this case, because Python is 32-bit, we cannot reach the file sizes that 64-bit Windows allows.

    %run -i scratch.py
    
    python x32 win x64
    working with 30000000 elements
    number bytes required 0.240000 GB
    works
    working with 300000000 elements
    number bytes required 2.400000 GB
    OverflowError("cannot fit 'long' into an index-sized integer",)
    working with 3000000000 elements
    number bytes required 24.000000 GB
    OverflowError("cannot fit 'long' into an index-sized integer",)
    working with 30000000000 elements
    number bytes required 240.000000 GB
    IOError(28, 'No space left on device')
    working with 300000000000 elements
    number bytes required 2400.000000 GB
    IOError(28, 'No space left on device')
    working with 3000000000000 elements
    number bytes required 24000.000000 GB
    IOError(22, 'Invalid argument')
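
The OverflowError in this output is raised when the byte count does not fit into a signed index-sized integer, whose maximum Python exposes as `sys.maxsize` (2**31 - 1 on a 32-bit build). As a rough sketch of the upper bound this puts on the element count, written in Python 3 syntax; the helper names `max_elements` and `fits_in_index` are made up for illustration and are not part of NumPy:

```python
import sys

def max_elements(itemsize, index_bits=None):
    """Largest element count whose size in bytes fits in a signed index.

    index_bits=None uses the running interpreter (sys.maxsize);
    pass 32 or 64 to ask about a different Python build.
    """
    if index_bits is None:
        max_bytes = sys.maxsize
    else:
        max_bytes = 2 ** (index_bits - 1) - 1
    return max_bytes // itemsize

def fits_in_index(shape, itemsize, index_bits=None):
    """True if an array of this shape stays below the index-size limit."""
    n = 1
    for dim in shape:
        n *= dim
    return n <= max_elements(itemsize, index_bits)

print(max_elements(4, index_bits=32))  # float32: 536870911
print(max_elements(8, index_bits=32))  # float64: 268435455
print(fits_in_index((250000, 1000), 4, index_bits=32))  # True
print(fits_in_index((1, 300000000), 8, index_bits=32))  # False
```

This is only an upper bound. The mapping also needs one contiguous block of free address space, so the practical limit on a 32-bit process can be lower, which would be consistent with the second 953 MB memmap in the question failing.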
    

    python x64 on windows x64

    In this case we are limited first by disk size, and then, once the array size in bytes is large enough, by what looks like some kind of overflow.

    %run -i scratch.py
    working with 30000000 elements
    number bytes required 0.240000 GB
    works
    working with 300000000 elements
    number bytes required 2.400000 GB
    works
    working with 3000000000 elements
    number bytes required 24.000000 GB
    works
    working with 30000000000 elements
    number bytes required 240.000000 GB
    IOError(28, 'No space left on device')
    working with 300000000000 elements
    number bytes required 2400.000000 GB
    IOError(28, 'No space left on device')
    working with 3000000000000 elements
    number bytes required 24000.000000 GB
    IOError(22, 'Invalid argument')
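
Since `mode='w+'` creates the backing file at its full size, the "No space left on device" errors can be anticipated by comparing the required bytes with the free space on the target drive. A minimal sketch, assuming Python 3.3+ for `shutil.disk_usage`; the function name `fits_on_disk` is hypothetical:

```python
import shutil

def fits_on_disk(n_elements, itemsize, directory="."):
    """True if n_elements * itemsize bytes fit in the free space of directory."""
    needed = n_elements * itemsize
    free = shutil.disk_usage(directory).free  # free bytes on that drive
    return needed <= free

# 3e10 float64 elements need 240 GB, as in the test output
print(fits_on_disk(30000000000, 8))
```

The check is advisory only: free space can change between the check and the actual file creation, so the `try`/`except` around `numpy.memmap` is still needed.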
    

    In summary, on windows x64 the precise point at which your arrays fail depends on the Python build and on disk size:

    python x32 on windows x64: first we get the OverflowError you were seeing, then disk-size limitations, and at some point "Invalid argument" errors are raised.

    python x64 on windows x64: first we hit disk-size limitations, and at some point other errors are raised.
    Interestingly, unlike the errors seen with 32-bit Python, these do not appear to be related to a 2^64 size limit, since 3000000000000*8 < 2^64.

    If the disk were big enough, we might not see the "Invalid argument" errors and could reach the 2^64 limit, though I did not have a disk big enough to test this :)
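
The size comparison in the summary can be checked directly. This is just the arithmetic, in Python 3 syntax, and it says nothing about what actually causes the "Invalid argument" error:

```python
n_elements = 3000000000000      # largest case in the test
n_bytes = n_elements * 8        # float64 takes 8 bytes per element

print(n_bytes)                  # 24000000000000
print(n_bytes < 2**63)          # True: fits in a signed 64-bit index
print(n_bytes < 2**64)          # True
```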