Tags: python, pandas, numpy, xor

How do I update a pandas/numpy row with a xor of a next row into that same row


The question is whether there is a fast way with pandas or numpy to XOR a row and write the result into the next row.

Basically I have a pandas data frame named 'ss' like so:

    rst  no1  no2  no3  no4  no5  no6  no7
0     1    6    2   15   14    9    5    1
1    11    0    0    0    0    0    0    0
2     9    0    0    0    0    0    0    0
3    11    0    0    0    0    0    0    0
4     3    0    0    0    0    0    0    0
5    15    0    0    0    0    0    0    0
6     0    0    0    0    0    0    0    0


To load the dataframe above into a variable, copy it to the clipboard and run: ss = pd.read_clipboard()

What I want to do is fill each row's 'no' columns with the XOR of that row's 'rst' value and the previous row's 'no' values, i.e. something like ss.loc[1, ['no1', 'no2', ..., 'no7']] = ss.loc[1, 'rst'] ^ ss.loc[0, ['no1', 'no2', ..., 'no7']], so the first step would create a dataframe like this:

    rst  no1  no2  no3  no4  no5  no6  no7
0     1    6    2   15   14    9    5    1
1    11   13    9    4    5    2   14   10
2     9    0    0    0    0    0    0    0
3    11    0    0    0    0    0    0    0
4     3    0    0    0    0    0    0    0
5    15    0    0    0    0    0    0    0
6     0    0    0    0    0    0    0    0

Here ss.loc[1, 'rst'] is 11, so 11 ^ np.array([6, 2, 15, 14, 9, 5, 1]) gives np.array([13, 9, 4, 5, 2, 14, 10]), which I then assign to the 'no' columns of row 1 in order, as you can see above.
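For reference, this single step can be checked directly in NumPy, since a scalar XORed with an array is applied to every element. This is only a sketch of the one step; the names `prev_row` and `step` are made up for illustration.

```python
import numpy as np

# 'no' values of row 0, copied from the table above
prev_row = np.array([6, 2, 15, 14, 9, 5, 1])

# 11 is the 'rst' value of row 1; the scalar broadcasts across the array
step = 11 ^ prev_row
print(step.tolist())  # [13, 9, 4, 5, 2, 14, 10]
```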

The next step takes ss.loc[2, 'rst'], which is 9, and repeats the process:

    rst  no1  no2  no3  no4  no5  no6  no7
0     1    6    2   15   14    9    5    1
1    11   13    9    4    5    2   14   10
2     9    4    0   13   12   11    7    3
3    11    0    0    0    0    0    0    0
4     3    0    0    0    0    0    0    0
5    15    0    0    0    0    0    0    0
6     0    0    0    0    0    0    0    0

So 9 ^ np.array([13, 9, 4, 5, 2, 14, 10]) gives np.array([4, 0, 13, 12, 11, 7, 3]), which I then assign to the 'no' columns of row 2 in order, as you can see above.

My question is how to do this with numpy or pandas in a fast way, and whether it can be done without any loops. I'm working with a data set of one million rows and looping is slow, so I'm hoping there is a shortcut or a better method of setting each 'no*' column to the XOR of the current row's 'rst' value with the corresponding 'no*' value from the previous row.
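To make the target unambiguous, here is the row-by-row rule written as a plain loop. It is a minimal sketch meant as a correctness reference, not a fast solution. It hard-codes the sample values in NumPy arrays rather than reading the dataframe, and the names `rst` and `no` are illustrative.

```python
import numpy as np

# Sample data from the question: the 'rst' column and the first 'no*' row
rst = np.array([1, 11, 9, 11, 3, 15, 0])
no = np.zeros((len(rst), 7), dtype=np.int64)
no[0] = [6, 2, 15, 14, 9, 5, 1]

# Each row is its own 'rst' value XORed with the previous row's 'no*' values
for i in range(1, len(rst)):
    no[i] = rst[i] ^ no[i - 1]

print(no[1].tolist())  # [13, 9, 4, 5, 2, 14, 10]
print(no[2].tolist())  # [4, 0, 13, 12, 11, 7, 3]
```

Every row depends on the one before it, which is why a naive vectorized assignment over all rows at once does not give the same result.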


Solution

  • IIUC, you can use numpy.bitwise_xor twice: once in its accumulate variant on rst, then once more to combine the running result with the first row of the no columns:

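    import numpy as np
    import pandas as pd

    # 'rst' as an (n, 1) column vector; zero the first entry so row 0 is left unchanged
    rst = ss['rst'].to_numpy(copy=True)[:,None]
    rst[0] = 0
    # first row of the 'no*' columns, the starting point for every later row
    no = ss.filter(like='no').iloc[0].to_numpy()
    
    # running XOR of 'rst' down the rows, broadcast against that first row
    x = np.bitwise_xor(np.bitwise_xor.accumulate(rst, axis=0), no)
    
    # reattach the computed 'no*' values to the original 'rst' column
    out = ss[['rst']].join(
           pd.DataFrame(x, index=ss.index, columns=list(ss.filter(like='no')))
          )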
    

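    This works because XOR is commutative and associative, so A^B^C equals (A^C)^B. Row i of the result is the first no row XORed with rst[1] ^ rst[2] ^ ... ^ rst[i], so we first accumulate the XOR over rst and then XOR each running value with the first row. Setting rst[0] to 0 keeps row 0 unchanged, since x ^ 0 == x.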

    Output:

       rst  no1  no2  no3  no4  no5  no6  no7
    0    1    6    2   15   14    9    5    1
    1   11   13    9    4    5    2   14   10
    2    9    4    0   13   12   11    7    3
    3   11   15   11    6    7    0   12    8
    4    3   12    8    5    4    3   15   11
    5   15    3    7   10   11   12    0    4
    6    0    3    7   10   11   12    0    4
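To see why accumulating over rst matches the step-by-step rule, the running XOR can be inspected on its own. The sketch below uses only NumPy and hard-codes the sample values. The names `masked`, `prefix` and `vectorized` are illustrative and not part of the answer's code.

```python
import numpy as np

rst = np.array([1, 11, 9, 11, 3, 15, 0])
no0 = np.array([6, 2, 15, 14, 9, 5, 1])

# Row 0 is left untouched, so its 'rst' value must not enter the running XOR
masked = rst.copy()
masked[0] = 0

# prefix[i] = rst[1] ^ rst[2] ^ ... ^ rst[i]
prefix = np.bitwise_xor.accumulate(masked)
print(prefix.tolist())  # [0, 11, 2, 9, 10, 5, 5]

# Broadcasting an (n, 1) column against a (7,) row gives an (n, 7) result
vectorized = prefix[:, None] ^ no0
print(vectorized[-1].tolist())  # [3, 7, 10, 11, 12, 0, 4]
```

The last printed row agrees with row 6 of the output table, and the final two prefix values are equal because the last rst value is 0.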