I have an operation to apply in Python to more than 10 million values. My problem is optimising this operation. I have two working methods: vanilla Python and numpy.
Example with one 4-byte value:

Raw bytes:     b'\x9a#\xe6\x00'
               = [154, 35, 230, 0]
               = [0x9A, 0x23, 0xE6, 0x00]
Swapped bytes: b'\x00\x9a#\xe6'
               = [0, 154, 35, 230]
               = [0x00, 0x9A, 0x23, 0xE6]
Result (signed little-endian int32): -433874432
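
For reference, that single example can be checked directly (a quick sanity check using the values above):

raw = b'\x9a#\xe6\x00'
swapped = bytes([raw[-1]]) + raw[:-1]   # b'\x00\x9a#\xe6'
assert int.from_bytes(swapped, 'little', signed=True) == -433874432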
Vanilla Python

File loading:
f = open(path_data, "rb")
while trame := f.read(4):
Data operation:
trame = b'\x9a#\xe6\x00'
trame_list = list(trame) # [154, 35, 230, 0]
trame_list_swap = [trame_list[-1]] + trame_list[:-1]
trame_swap = bytes(trame_list_swap)
result = int.from_bytes(trame_swap, byteorder='little', signed=True)
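
Put together, the vanilla version looks roughly like this (a minimal sketch, assuming path_data points to the binary file):

results = []
with open(path_data, "rb") as f:
    while trame := f.read(4):
        trame_list = list(trame)                              # e.g. [154, 35, 230, 0]
        trame_list_swap = [trame_list[-1]] + trame_list[:-1]  # [0, 154, 35, 230]
        trame_swap = bytes(trame_list_swap)
        results.append(int.from_bytes(trame_swap, byteorder='little', signed=True))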
numpy

File loading:
datas_raw = numpy.fromfile(path_data, dtype="<i4")
# datas_raw = numpy.array([-1708923392, 1639068928, 2024603392, ...]) # len(datas_raw) = 12171264
for i, trame in enumerate(datas_raw):
Data operation:
trame = numpy.int32(15082394)   # one element of datas_raw
tmp = list(trame.tobytes("C"))  # the 4 raw bytes of the int32
tmp.insert(0, tmp.pop())        # move the last byte to the front
result = numpy.ndarray(1, "<i", bytes(tmp))[0]
It does the same processing as the vanilla version but is slower here because numpy.ndarray
is instantiated 10 million times...
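
Put together, the numpy per-element version looks roughly like this (a sketch, with path_data as above):

import numpy

results = []
datas_raw = numpy.fromfile(path_data, dtype="<i4")
for i, trame in enumerate(datas_raw):
    tmp = list(trame.tobytes("C"))    # 4 raw bytes of the int32 element
    tmp.insert(0, tmp.pop())          # move the last byte to the front
    results.append(numpy.ndarray(1, "<i", bytes(tmp))[0])   # one ndarray per value -> slow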
My question is the following:
I would like the numpy version to apply this byte swap to all values at once, without a for loop
(which is very slow in Python)... Any other solution to the issue is welcome (I am not closed to other approaches, so no XY problem...)
Here I use some random data in place of the data read from the file (which you could read with np.fromfile). Ideally, you would read your bytes into a 1-d array with shape (4*n,) and then reshape it to (n, 4).
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(-2**31, 2**31, size=10000, dtype="i4")
data = data.view("u1").reshape((-1, 4))   # one row of 4 raw bytes per value
# Last column first, then the other 3
data = data[:, [3, 0, 1, 2]]
# Depending on platform you might need to specify the byteorder, e.g. "<i4" or ">i4"
ints = np.ascontiguousarray(data).view("i4")
This produces values like

array([[-1031643175],
       [  267112355],
       [ -640212606],
       ...,
This returns an array with shape (n,1) of signed integers.
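
To use the real file instead of random data, the same approach might look like this (a sketch; path_data is assumed to be the file path, and .ravel() flattens the (n, 1) result to (n,)):

import numpy as np

data = np.fromfile(path_data, dtype="u1").reshape((-1, 4))   # raw bytes, 4 per value
data = data[:, [3, 0, 1, 2]]                                  # last byte first
ints = np.ascontiguousarray(data).view("<i4").ravel()         # explicit little-endian, shape (n,)
# ints[0] should match the vanilla int.from_bytes result for the first 4 bytes of the file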