Make a numpy array that is a hash of a string array


I have a numpy array:

A = np.array(['abcd','bcde','cdef'])

I need a hash array of A, computed with the function

B[i] = ord(A[i][1]) * 256 + ord(A[i][2])

B = np.array([ord('b') * 256 + ord('c'), ord('c') * 256 + ord('d'), ord('d') * 256 + ord('e')])

How can I do it?
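For reference, the formula can be applied directly with a plain Python loop. This is a simple, non-vectorized sketch (the name `hash_loop` is only illustrative); it can serve as a baseline for checking a faster version on small inputs.

```python
import numpy as np

def hash_loop(A):
    # Apply B[i] = ord(A[i][1]) * 256 + ord(A[i][2]) one string at a time.
    return np.array([ord(s[1]) * 256 + ord(s[2]) for s in A])

A = np.array(['abcd', 'bcde', 'cdef'])
B = hash_loop(A)
print(B)  # [25187 25444 25701]
```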


Solution

  • Based on the question, I assume the strings are ASCII and every string has at least 3 characters.

    Start by converting the strings to ASCII byte strings. This creates a temporary array, but NumPy unicode strings use 4 bytes per character while byte strings use 1, which is faster and simpler to work with. Because NumPy stores fixed-width strings contiguously in memory, you can then view the whole array as one flat buffer of uint8 values without any copy; the view also turns every character into its integer code. Finally, slicing that buffer with a step equal to the item size (bytes per string) picks the same character position in every string, so all the hashes are computed in a vectorized way:

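    import numpy as np
    ascii = A.astype('S')        # temporary copy, 1 byte per character
    buff = ascii.view(np.uint8)  # flat byte view of that copy, no extra copy
    # widen first: 256 is out of range for uint8 (NumPy 2 raises OverflowError)
    result = buff[1::ascii.itemsize].astype(np.int64) * 256 + buff[2::ascii.itemsize]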
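The same idea can also be written with a 2-D view: reshaping the byte buffer to (number of strings, itemsize) puts one string per row, so columns 1 and 2 hold the two characters used by the hash. This is a sketch assuming a 1-D array of ASCII strings; the function name `hash_pairs` is hypothetical, and the explicit length check is an addition that enforces the "at least 3 characters" assumption instead of silently reading the zero padding bytes of shorter strings.

```python
import numpy as np

def hash_pairs(A):
    """Vectorized B[i] = ord(A[i][1]) * 256 + ord(A[i][2]) for 1-D ASCII input."""
    # Hypothetical helper; astype('S') raises UnicodeEncodeError on non-ASCII text.
    ascii = np.asarray(A).astype('S')
    # Shorter strings are padded with b'\x00', so reject them explicitly.
    if (np.char.str_len(ascii) < 3).any():
        raise ValueError("every string needs at least 3 characters")
    # One row per string, one column per byte (astype returned a contiguous copy).
    rows = ascii.view(np.uint8).reshape(len(ascii), ascii.itemsize)
    # Widen before multiplying so values up to 255 * 256 + 255 are exact.
    return rows[:, 1].astype(np.int64) * 256 + rows[:, 2]

A = np.array(['abcd', 'bcde', 'cdef'])
print(hash_pairs(A))  # [25187 25444 25701]
```

The reshape and the strided slices read exactly the same bytes; the 2-D form just makes the "one row per string" layout explicit.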