python numpy metadata stride

numpy array metadata change


I know that NumPy stores array data in contiguous memory. Is it possible to go from this array:

a = np.array([127,127,127,127,127,127,127,127], dtype=np.uint8)

the binary representation of 'a' is all ones

to this:

b = np.array([72057594037927935], dtype=np.uint64)

and back again from b to a?

The binary representation is all ones in both cases; the eight elements are simply combined into one single 64-bit integer. The underlying data in NumPy should stay the same, and only the metadata should change.

This sounds like a job for stride tricks but my best guess is:

np.lib.stride_tricks.as_strided(a, shape=(1,), strides=(8,8))

and

np.lib.stride_tricks.as_strided(b, shape=(8,), strides=(1,8))

only to get ValueError: mismatch in length of strides and shape

This only needs to be read-only, so I am under no illusion that I need to change the data.


Solution

  • If you want to reinterpret the existing data in an array, you need numpy.ndarray.view. That is the main difference between .astype and .view: the former converts to a new type while preserving the values, whereas the latter keeps the same memory and only changes how it is interpreted:

    import numpy as np 
    
    a = np.array([127,127,127,127,127,127,127,127], dtype=np.uint8)
    b = a.view(np.uint64) 
    print(a) 
    print(b) 
    print(b.view(np.uint8))                                        
    

    This outputs

    [127 127 127 127 127 127 127 127]
    [9187201950435737471]
    [127 127 127 127 127 127 127 127]
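
    To make the .astype versus .view distinction concrete, here is a minimal sketch. The variable names converted and reinterpreted are only illustrative:

    ```python
    import numpy as np

    a = np.array([127] * 8, dtype=np.uint8)

    converted = a.astype(np.uint64)    # new buffer: 8 elements, values preserved
    reinterpreted = a.view(np.uint64)  # same buffer: 1 element, bytes reinterpreted

    print(converted.shape, converted.nbytes)          # (8,) 64
    print(reinterpreted.shape, reinterpreted.nbytes)  # (1,) 8
    print(np.shares_memory(a, converted))             # False
    print(np.shares_memory(a, reinterpreted))         # True
    ```

    Only the view leaves the 8 bytes where they are and changes the metadata (dtype and shape), which is what the question asks for.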
    

    Note that 127 is 0b01111111 as a uint8, i.e. it has a leading zero in its binary pattern, so it is not all ones. That is why the value we get in b is different from what you expect:

    >>> bin(b[0])
    '0b111111101111111011111110111111101111111011111110111111101111111'
    
    >>> bin(72057594037927935)
    '0b11111111111111111111111111111111111111111111111111111111'
    

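    What you seem to be assuming is a set of 7-bit ("uint7") values consisting only of one bits: eight of those would give the 56 one bits of 72057594037927935. NumPy has no such type.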

    Anyway, the best part about .view is that the exact same block of memory will be used unless you explicitly copy:

    >>> b.base is a
    True
    

    The corollary, of course, is that mutating b will affect a (the result below is for a little-endian machine, where the addition lands in the first byte):

    >>> b += 3
    
    >>> a
    array([130, 127, 127, 127, 127, 127, 127, 127], dtype=uint8)
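
    Since you only need read access, one option is to mark the view as read-only so that accidental writes cannot leak into a. This is a sketch using the standard flags.writeable attribute, not something your use case strictly requires:

    ```python
    import numpy as np

    a = np.array([127] * 8, dtype=np.uint8)

    b = a.view(np.uint64)
    b.flags.writeable = False  # locks only this view; `a` itself stays writable

    try:
        b += 3
    except ValueError as exc:
        print("write rejected:", exc)

    print(a)  # still eight 127s
    ```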
    

    To control endianness, use string-valued dtype specifications, i.e. a.view('<u8') (little endian) or a.view('>u8') (big endian). We can use this to reproduce the number from your question:

    >>> a2 = np.array([0] + [255] * 7, dtype=np.uint8)
    >>> a2.view('>u8')
    array([72057594037927935], dtype=uint64)
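
    As a cross-check of the byte-order behaviour, the following sketch compares both views with Python's built-in int.from_bytes, which does the same reinterpretation on the raw bytes. The names raw, little and big are only illustrative:

    ```python
    import sys
    import numpy as np

    a2 = np.array([0] + [255] * 7, dtype=np.uint8)
    raw = a2.tobytes()

    little = int(a2.view('<u8')[0])  # first byte is the least significant
    big = int(a2.view('>u8')[0])     # first byte is the most significant

    # int.from_bytes applies the same byte-order-aware reinterpretation.
    assert little == int.from_bytes(raw, 'little')
    assert big == int.from_bytes(raw, 'big')

    print(sys.byteorder)  # byte order that a plain a2.view(np.uint64) would use
    print(big)            # 72057594037927935 == 2**56 - 1
    print(little)         # 18446744073709551360 == 2**64 - 2**8
    ```

    A plain a2.view(np.uint64) uses the machine's native byte order, so spelling out '<u8' or '>u8' is what makes the result the same on every platform.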