
Two implementations of NumPy fromfile?


I am trying to update some legacy code that uses np.fromfile in a method. When I search the NumPy source for this function, I only find np.core.records.fromfile, but the docs also list np.fromfile. The two take different keyword arguments, which makes me think they are different functions altogether.

My questions are:

1) Where is the source for np.fromfile located?

2) Why are there two different functions with the same name? This can easily get confusing, because the two behave differently. Specifically, np.core.records.fromfile raises an error if you try to read more bytes than the file contains, while np.fromfile does not. A minimal example is below.

In [1]: import numpy as np

In [2]: my_bytes = b'\x04\x00\x00\x00\xac\x92\x01\x00\xb2\x91\x01'

In [3]: with open('test_file.itf', 'wb') as f:
            f.write(my_bytes)

In [4]: with open('test_file.itf', 'rb') as f:
            result = np.fromfile(f, 'int32', 5)

In [5]: result
Out [5]: 

In [6]: with open('test_file.itf', 'rb') as f:
            result = np.core.records.fromfile(f, 'int32', 5)
ValueError: Not enough bytes left in file for specified shape and type
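The same comparison can be written as a standalone script. This is only a sketch under a few assumptions: it uses an explicit little-endian dtype ('<i4') so the decoded values do not depend on the machine, it writes to a temporary directory, and it calls the function through np.rec.fromfile, which should be the same function as np.core.records.fromfile reached through a different name.

```python
import os
import tempfile

import numpy as np

# 11 bytes: two complete little-endian int32 values plus 3 leftover bytes.
my_bytes = b'\x04\x00\x00\x00\xac\x92\x01\x00\xb2\x91\x01'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_file.itf')
    with open(path, 'wb') as f:
        f.write(my_bytes)

    # np.fromfile is asked for 5 items but quietly returns only the
    # complete items it could read.
    with open(path, 'rb') as f:
        short_read = np.fromfile(f, dtype='<i4', count=5)

    # The records version checks the file size against the requested
    # shape first and raises instead of returning a short array.
    rec_error = None
    with open(path, 'rb') as f:
        try:
            np.rec.fromfile(f, dtype='<i4', shape=5)
        except ValueError as exc:
            rec_error = exc

print(short_read)
print(repr(rec_error))
```

If the silent short read is a concern in the legacy code, comparing the length of the returned array with the requested count is one way to detect it.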

Solution

  • If you use help on np.fromfile you will find something very... helpful:

    Help on built-in function fromfile in module numpy.core.multiarray:
    
    fromfile(...)
        fromfile(file, dtype=float, count=-1, sep='')
    
        Construct an array from data in a text or binary file.
    
        A highly efficient way of reading binary data with a known data-type,
        as well as parsing simply formatted text files.  Data written using the
        `tofile` method can be read using this function.
    

    As far as I can tell, this is implemented in C and can be found here.

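    If you are trying to save and load binary data, you shouldn't rely on np.fromfile anymore. A raw file written with tofile stores no dtype, shape, or byte-order information, so it is not portable between platforms. Use np.save and np.load instead, which use the platform-independent .npy format.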
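A minimal sketch of the np.save / np.load round trip, assuming a throwaway file in a temporary directory (the file name and the sample values are illustrative only):

```python
import os
import tempfile

import numpy as np

data = np.array([4, 103084, 7], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.npy')

    # The .npy header records dtype, shape, and byte order,
    # so none of them has to be passed again when loading.
    np.save(path, data)
    restored = np.load(path)

print(restored, restored.dtype)
```

Unlike np.fromfile, np.load needs no dtype or count argument here, because that information travels with the file.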