
Major Speedup Question for a for loop in Pandas/Numpy on a bitwise_xor accumulate


I am using a for loop, shown below, to convert the first DataFrame into the second one using an XOR accumulate. My table has 830401 rows, and the loop is very slow. Is there any way to speed up this kind of accumulate in pandas, or in NumPy, and then assign the result back to the array itself?



In [122]: acctable[0:20]
Out[122]: 
    what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0      4    2   10    8    0    5    7    1   13   11
1      4    0    0    0    0    0    0    0    0    0
2      6    0    0    0    0    0    0    0    0    0
3     14    0    0    0    0    0    0    0    0    0
4     12    0    0    0    0    0    0    0    8    0
5      4    0    0    0    0    0    0    0    0    0
6      1    0    0    0    0    0    0    0    0    0

...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
830477    15    0    0    0    0    0    0    0    0    0
830478     3    0    0    0    0    0    0    0    0    0
830479    11    0    0    0    0    0    0    0    0    0
830480     9    0    0    0    0    0    0    0    0    0
830481    11    0    0    0    0    0    0    0    0    0

[830482 rows x 10 columns]

Here is what I tried. It can literally take a full minute, and I have larger data sets to work with, so any shortcuts or better methods would be truly helpful:

# Update: Instead of all 800k values of 'what', I put only the first few numbers in rstr so you can see how I'm XOR accumulating. You should be able to copy/paste the first 6 rows of the DataFrame with pd.read_clipboard() and assign the result to acctable.

In [121]: rstr
Out[121]: array([ 4,  4, 12, 14,  6,  4], dtype=int8)
  
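import numpy as np

dt = np.int8
end = -1  # drop the last accumulated value so the result fits one row
rstr = np.array(acctable.loc[:5, 'what'], dtype=dt)
for x in range(4):
    # XOR-accumulate the seed value followed by row x, then store the result as row x+1
    wuttr = np.bitwise_xor.accumulate(np.r_[[rstr[-(x+1)]], acctable.loc[x, 'what':]], dtype=dt)
    acctable.loc[x+1, "what":] = wuttr[:end]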

After:


In [122]: acctable[0:20]
Out[122]: 
    what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0      4    2   10    8    0    5    7    1   13   11
1      4    0    2    8    0    0    5    2    3   14
2      6    2    2    0    8    8    8   13   15   12
3     14    8   10    8    8    0    8    0   13    2
4     12    2   10    0    8    0    0    8    8    5
5      4    8   10    0    0    8    8    8    0    8
6      1    5   13    7    7    7   15    7   15   15
...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
830477    15   15    7    0    0    5    9   14   10    3
830478     3   12    3    4    4    4    1    8    6   12
830479    11    8    4    7    3    7    3    2   10   12
830480     9    2   10   14    9   10   13   14   12    6
830481    11    2    0   10    4   13    7   10    4    8

[830482 rows x 10 columns]

It's a simple accumulate, but each row needs the previous row to continue the accumulation, and the only way I could think of is a for loop. Also, the "rstr" variable is actually the "what" column.
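
The row dependency described above can be written out explicitly: each new row is the running XOR of its own "what" value followed by the previous, already filled row, with the last element dropped. The following is a minimal sketch of that logic, assuming the table is held as a 2-D integer array. The names fill_rows and sample are made up for illustration, and the sample data is the first five rows shown above.

```python
import numpy as np


def fill_rows(vals):
    """Row-by-row reference: row r = XOR-accumulate of [what[r], *previous_row[:-1]]."""
    out = np.array(vals, dtype=np.int8)  # np.array copies, so the input is left untouched
    for r in range(1, out.shape[0]):
        # Seed with this row's 'what' value, then append the previous row minus its last cell.
        seed = np.concatenate(([out[r, 0]], out[r - 1, :-1]))
        out[r] = np.bitwise_xor.accumulate(seed)
    return out


# First five rows of the example table: row 0 is given, later rows only have 'what'.
sample = np.zeros((5, 10), dtype=np.int8)
sample[0] = [4, 2, 10, 8, 0, 5, 7, 1, 13, 11]
sample[:, 0] = [4, 4, 6, 14, 12]

filled = fill_rows(sample)
print(filled)
```

This still loops over rows in Python, so it illustrates the recurrence rather than providing a speedup.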

Thanks!

I received this result from an AI, but it only works on the first rows:

what_arr = acctable['what'].to_numpy().reshape(-1)  # Reshape to ensure 1D array

# Modified XOR accumulation:
all_what_arr = np.concatenate([[what_arr[0]], what_arr[1:]])
cumulative_xor = np.bitwise_xor.accumulate(all_what_arr)
shifted_xor = cumulative_xor[1:].reshape(-1, 1)
acctable.iloc[1:, 1:] = shifted_xor ^ acctable.iloc[1:, 1:]


In [171]: acctable
Out[171]: 
        what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
0          4    2   10    8    0    5    7    1   13   11
1          6    0    2    8    0    0    5    2    3   14
2         14    4    6    6   12   14   12   11   11   10
3         12    2   10    0   10   10    8    8   15    8
4          4   12   14    4    4   14    4   12    4   11

Here are the timeit values. As you can see, Andrej's modification and the use of njit gave a huge speedup!

In [262]:  import timeit
     ...: 
     ...:  setup = """
     ...:  import numpy as np
     ...:  import pandas as pd
     ...:  from numba import njit
     ...:  
     ...:  
     ...:  def do_work_no_njit(df):
     ...:      dt = np.int8
     ...:      end = -1
     ...:      rstr = np.array(df.loc[:, 0], dtype=dt)
     ...:      for x in range(len(df)):
     ...:          wuttr = np.bitwise_xor.accumulate(np.r_[[rstr[-(x+1)]], df.loc[x, 0:]], dtype=dt)
     ...:          df.loc[x+1, 0:] = wuttr[:end]
     ...:  
     ...:  @njit
     ...:  def do_work(vals):
     ...:      for row in range(vals.shape[0] - 1):
     ...:          for i in range(vals.shape[1] - 1):
     ...:              vals[row + 1, i + 1] = vals[row, i] ^ vals[row + 1, i]
     ...:  
     ...:  # Replace with your DataFrame creation code
     ...:  df = pd.DataFrame(np.random.randint(0, 15, size=(1000000, 10)), dtype=np.int8)  # Example DataFrame
     ...:  """
     ...: 
     ...:  stmt = """
     ...:  do_work(df.values)
     ...:  """
     ...: 
     ...:  stmtnonjit = """
     ...:  do_work_no_njit(df.copy())
     ...:  """
     ...: 
     ...:  number = 1  # Adjust the number of repetitions as needed
     ...: 
     ...:  time = timeit.timeit(stmtnonjit, setup, number=number)
     ...:  print(f"Average time per execution no njit: {time / number:.4f} seconds")
     ...: 
     ...:  time = timeit.timeit(stmt, setup, number=number)
     ...:  print(f"Average time per execution with njit and optimized code by Andrej: {time / number:.4f} seconds")
     ...: 
Average time per execution no njit: 73.3801 seconds
Average time per execution with njit and optimized code by Andrej: 0.0442 seconds


Solution

  • You can try to speed up the computation by compiling the loop with numba:

    from numba import njit
    
    
    @njit
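    def do_work(vals):
        # Each cell becomes the XOR of the cell diagonally up-left (previous row)
        # and the cell directly to its left (same row).
        for row in range(vals.shape[0] - 1):
            for i in range(vals.shape[1] - 1):
                vals[row + 1, i + 1] = vals[row, i] ^ vals[row + 1, i]
    
    
    # The array is modified in place, so df.values must be a writable view of df's data.
    do_work(df.values)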
    print(df)
    

    Prints:

       what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
    0     4    2   10    8    0    5    7    1   13   11
    1     4    0    2    8    0    0    5    2    3   14
    2     6    2    2    0    8    8    8   13   15   12
    3    14    8   10    8    8    0    8    0   13    2
    4    12    2   10    0    8    0    0    8    8    5
    

    Initial df:

       what  dx1  dx2  dx3  dx4  dx5  dx6  dx7  dx8  dx9
    0     4    2   10    8    0    5    7    1   13   11
    1     4    0    0    0    0    0    0    0    0    0
    2     6    0    0    0    0    0    0    0    0    0
    3    14    0    0    0    0    0    0    0    0    0
    4    12    0    0    0    0    0    0    0    0    0
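
If numba is not available, the same update can also be expressed column by column in plain NumPy, because column i + 1 of the result depends only on the finished column i. This is an alternative technique, not part of the answer above. The names fill_columns and loop_reference are hypothetical, and the sketch assumes a 2-D integer array with a single dtype.

```python
import numpy as np


def fill_columns(vals):
    """Column-wise version: one vectorized XOR per column instead of one step per cell."""
    out = vals.copy()
    for i in range(out.shape[1] - 1):
        # Rows 1.. of column i+1 = (column i shifted down by one row) XOR (column i).
        out[1:, i + 1] = out[:-1, i] ^ out[1:, i]
    return out


def loop_reference(vals):
    """Same cell-by-cell loop as do_work, without numba, on a copy (for checking only)."""
    out = vals.copy()
    for row in range(out.shape[0] - 1):
        for i in range(out.shape[1] - 1):
            out[row + 1, i + 1] = out[row, i] ^ out[row + 1, i]
    return out


# Check both versions against each other on random data.
rng = np.random.default_rng(0)
data = rng.integers(0, 16, size=(200, 10)).astype(np.int8)
assert np.array_equal(fill_columns(data), loop_reference(data))
```

Because fill_columns returns a new array, the result can be wrapped again with something like pd.DataFrame(result, columns=df.columns, index=df.index). Depending on the pandas version and its Copy-on-Write settings, df.values may not be writable in place, in which case working on an explicit copy like this avoids the problem.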