Tags: python, numpy, numpy-ndarray, numpy-memmap

Is there a way to load a numpy unicode array into a memmap?


I am trying to create an array of dtype='U' and save it using numpy.save(); however, when loading the saved file into a numpy.memmap I get an error saying the size of the data is not a multiple of 'U3'.

I am working with Python 3.5.2. I tried the following code, which creates an empty array and a second array with three entries of 3 characters each, concatenates them, and saves the result to file1.npy:

import numpy as np
arr = np.empty((1, 0), dtype='U')
arr2 = np.array(['111', '222', '333'], dtype='U')
arr = np.concatenate((arr, arr2), axis=None)
print(arr)
np.save('file1', arr)

rArr = np.memmap('file1.npy', dtype='U3', mode='r')

However, when I try to load the file into a numpy.memmap I get the following error: ValueError: Size of available data is not a multiple of the data-type size.

Is there a way to load the data into a numpy.memmap using strings? I feel I am missing something simple.


Solution

  • The files used by numpy.memmap are raw binary files, not NPY-format files. If you want to read a memory-mapped NPY file, use numpy.load with the argument mmap_mode='r' (or whatever other value is appropriate).

    After creating 'file1.npy' like you did, here's how it can be memory-mapped with numpy.load:

    In [16]: a = np.load('file1.npy', mmap_mode='r')                                                                       
    
    In [17]: a                                                                                                             
    Out[17]: memmap(['111', '222', '333'], dtype='<U3')
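    Alternatively, if you want to use numpy.memmap directly, write the array as a raw binary file (e.g. with ndarray.tofile) instead of an NPY file, so there is no header and the file size is an exact multiple of the 'U3' itemsize. A minimal sketch, assuming 'file1.dat' as a scratch filename:

    ```python
    import numpy as np

    # Write the raw array bytes only -- no NPY header.
    arr = np.array(['111', '222', '333'], dtype='U3')
    arr.tofile('file1.dat')

    # The file holds exactly 3 items of 12 bytes each (3 chars x 4 bytes
    # per UCS-4 code point), so np.memmap accepts the 'U3' dtype.
    mapped = np.memmap('file1.dat', dtype='U3', mode='r')
    print(mapped)  # memmap(['111', '222', '333'], dtype='<U3')
    ```

    The trade-off is that a raw file records neither dtype nor shape, so you must supply them yourself when mapping; np.load with mmap_mode='r' reads that metadata from the NPY header for you.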