python, numpy, numpy-memmap

Numpy memmap first rows random


I'm testing np.memmap because I need to use it for a large data file. I'm running Python 3.7 on a Windows machine. My test example is very simple:

import numpy as np
arr = np.ones((10**4, 10), dtype=np.float32)
np.save("./arr_test.npy", arr)
data = np.memmap("./arr_test.npy", dtype=np.float32, shape=arr.shape) 
print((data!=1).sum(), data[:30])

The output shows that the first 32 values are not equal to one.

(32, memmap([[2.2366853e+08, 1.2387478e-40, 3.4833497e-15, 4.4898648e+21,
          1.5767864e-19, 2.1442303e-07, 2.2228396e-15, 7.6830766e+31,
          1.7177136e+19, 6.7425655e+22],
         [1.5767864e-19, 1.8727951e+31, 2.2228527e-15, 2.7904159e+29,
          1.5767847e-19, 6.4098282e-10, 1.4584911e-19, 2.4043096e-12,
          1.3593928e-19, 1.3563156e-19],
         [1.3563156e-19, 1.3563156e-19, 1.3563156e-19, 1.3563156e-19,
          1.3563156e-19, 1.3563156e-19, 1.3563156e-19, 1.3563156e-19,
          1.3563156e-19, 1.3563156e-19],
         [1.3563156e-19, 7.7097618e-33, 1.0000000e+00, 1.0000000e+00,
          1.0000000e+00, 1.0000000e+00, 1.0000000e+00, 1.0000000e+00,
          1.0000000e+00, 1.0000000e+00],
         [1.0000000e+00, 1.0000000e+00, 1.0000000e+00, 1.0000000e+00,
          1.0000000e+00, 1.0000000e+00, 1.0000000e+00, 1.0000000e+00,
          1.0000000e+00, 1.0000000e+00],

What did I miss?


Solution

  • np.memmap works on raw data and expects no extra information in the file, but files in NPY format start with a header that records the datatype, dimensions and more.

    The values that are not equal to one are the header bytes interpreted as floats; here, 32 float32 values correspond to a 128-byte header.

    The function numpy.lib.format.open_memmap is intended for memory-mapping an existing NPY file or creating a new one; it reads the header and maps only the array data.
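
As a minimal sketch of the fix, the snippet below reopens the same test array in three ways that skip the header. It writes to a temporary directory rather than "./arr_test.npy", and the variable names are illustrative only. The third option assumes the header length can be taken from the `offset` attribute of the memmap that open_memmap returns.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Recreate the test file from the question (temporary path is an assumption).
arr = np.ones((10**4, 10), dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "arr_test.npy")
np.save(path, arr)

# Option 1: np.load with mmap_mode parses the NPY header and returns a memmap
# whose dtype and shape come from the file itself.
data = np.load(path, mmap_mode="r")
mismatches_load = int((data != 1).sum())

# Option 2: open_memmap reads the header the same way.
data2 = open_memmap(path, mode="r")
mismatches_open = int((data2 != 1).sum())

# Option 3: plain np.memmap, skipping the header explicitly via `offset`.
# The header length is read from the memmap above instead of being hard-coded.
offset = data2.offset
raw = np.memmap(path, dtype=np.float32, mode="r", shape=arr.shape, offset=offset)
mismatches_raw = int((raw != 1).sum())

print(mismatches_load, mismatches_open, mismatches_raw, offset)
```

With the first two options there is no need to pass dtype or shape, since both are read from the header. The third is mainly useful to see what went wrong in the original call: without an offset, np.memmap starts reading at byte 0 of the file.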