Tags: python, performance, numpy, scientific-computing

How to find Bragg reflections fast with numpy?


For X-ray diffraction one needs to find the solutions of the so-called Laue equation

G_hkl - k_in + |k_in| * (sin(theta)*cos(phi), sin(theta)*sin(phi), cos(theta)) = 0

where G_hkl is a given 3-dimensional vector, k_in can be chosen as (0,0,1), and theta and phi are the free parameters that have to satisfy the equation. In a typical experiment G_hkl is then rotated around the x-axis, and for every step of the rotation one needs to find the solution of the equation. The equation cannot have more than one solution at a given rotation.

I have written the following Python script to find these solutions, but for my application it is not fast enough.

import numpy as np
import time
# Just initialization of variables. This is fast enough
res_list = []
N_rot = 100
Ghkl = np.array([0,0.7,0.7])
Ghkl_norm = np.linalg.norm(Ghkl)
kin = np.array([0,0,1])
kin_norm = np.linalg.norm(kin)
alpha_step = 2*np.pi/(N_rot-1)
rot_mat = np.array([[1,0,0],[0,np.cos(alpha_step),-np.sin(alpha_step)],[0,np.sin(alpha_step),np.cos(alpha_step)]])

# You can deduce theta from the norm of the vector equation
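# (Since |k_out| = |k_in|, the equation implies |G_hkl|^2 = |k_in - k_out|^2
#  = 2*kin_norm**2*(1 - cos(theta)), hence cos(theta) = 1 - Ghkl_norm**2/(2*kin_norm**2).)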
theta = np.arccos(1-(Ghkl_norm**2/(2*kin_norm**2))) 
sint = np.sin(theta)
cost = np.cos(theta)

# This leaves only phi as a parameter to find.
# I find phi by introducing a finite set of test values
# and checking whether the norm of the vector equation is close
# to zero for any of those test phis
phi_test = np.linspace(0,2*np.pi,200)

kout = kin_norm * np.array([sint * np.cos(phi_test), sint * np.sin(phi_test), cost + 0*phi_test]).T
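# ("cost + 0*phi_test" above simply broadcasts the scalar cost to the shape of phi_test)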
##############
start_time = time.time()

for j in range(100): # repeated 100 times just to make the runtime measurable
    res_list = []
    for i in range(N_rot): # This loop is too slow
        # Here the norm of the vector equation is calculated for all phi_test
        norm_vec = np.linalg.norm(Ghkl[np.newaxis, :] - kin[np.newaxis, :] + kout, axis=1)
        if (norm_vec < 0.01 * kin_norm).any(): # check whether any fulfills the criterion
            minarg = np.argmin(norm_vec)
            res_list.append([theta, phi_test[minarg]])
        Ghkl = np.dot(rot_mat,Ghkl)

print('Time was {0:1.2f} s'.format( (time.time()-start_time)))
# On my machine it takes 0.3s and res_list should be 
# [[1.0356115365192968, 1.578689775673263]]

Do you know a faster way to calculate this, either conceptually, by solving the equation in a completely different way, or simply by making my existing method faster?


Solution

  • There's a loop-carried dependency: Ghkl is updated at each iteration and re-used in the next, so a closed form for the whole loop might be difficult to derive. I would therefore focus on improving the performance of the rest of the code inside that innermost loop.

    There we are calculating norm_vec, which I think could be sped up with the two methods listed next.

    Approach #1: Using SciPy's cdist -

    from scipy.spatial.distance import cdist
    
    norm_vec = cdist(kout,(kin-Ghkl)[None]) # Rest of code stays the same
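
    This works because ||Ghkl - kin + kout_i|| is exactly the Euclidean distance between the row kout_i and the single point kin - Ghkl; the result just comes back with shape (200, 1) instead of (200,), which .any() and np.argmin handle the same way. A quick sanity check of that equivalence, using made-up test arrays of the question's shapes:

    import numpy as np
    from scipy.spatial.distance import cdist
    kout_test = np.random.rand(200, 3)       # stand-in for the question's kout
    Ghkl_test = np.array([0, 0.7, 0.7])
    kin_test = np.array([0, 0, 1])
    a = np.linalg.norm(Ghkl_test - kin_test + kout_test, axis=1)
    b = cdist(kout_test, (kin_test - Ghkl_test)[None]).ravel()  # (200, 1) -> (200,)
    assert np.allclose(a, b)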
    

    Approach #2: Using np.einsum -

    sums = (Ghkl-kin)+kout
    norm_vec_sq = np.einsum('ij,ij->i',sums,sums)
    if (norm_vec_sq < (0.01 * kin_norm)**2 ).any():
        minarg = np.argmin(norm_vec_sq) # Rest of code stays the same
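
    Note that this version compares squared norms against the squared threshold, so the square root is never taken at all, which saves a bit of work on top of the einsum reduction.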
    

    Runtime test -

    Using the inputs listed in the question, we get these results:

    In [91]: %timeit np.linalg.norm(Ghkl- kin + kout, axis=1)
    10000 loops, best of 3: 31.1 µs per loop
    
    In [92]: sums = (Ghkl-kin)+kout
        ...: norm_vec_sq = np.einsum('ij,ij->i',sums,sums)
        ...: 
    
    In [93]: %timeit (Ghkl-kin)+kout                 # Approach2 - step1
    100000 loops, best of 3: 7.09 µs per loop
    
    In [94]: %timeit np.einsum('ij,ij->i',sums,sums) # Approach2 - step2
    100000 loops, best of 3: 3.82 µs per loop
    
    In [95]: %timeit cdist(kout,(kin-Ghkl)[None])    # Approach1
    10000 loops, best of 3: 44.1 µs per loop
    

    So, approach #1 isn't showing any improvement for these input sizes, but approach #2 gives around a 3x speedup on calculating norm_vec (roughly 11 µs for its two steps combined versus 31 µs for the original np.linalg.norm call).
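
    Putting approach #2 back into the question's loop, here is a minimal sketch of the rewritten inner loop; it reuses kin, kout, kin_norm, theta, phi_test, rot_mat and N_rot exactly as defined in the question's setup code:

    Ghkl = np.array([0, 0.7, 0.7])               # reset, as in the question
    res_list = []
    for i in range(N_rot):
        sums = (Ghkl - kin) + kout               # vector equation for all test phis
        norm_vec_sq = np.einsum('ij,ij->i', sums, sums)
        if (norm_vec_sq < (0.01 * kin_norm)**2).any():
            res_list.append([theta, phi_test[np.argmin(norm_vec_sq)]])
        Ghkl = np.dot(rot_mat, Ghkl)             # rotate G_hkl around the x-axis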

    Short note on why np.einsum is beneficial here: einsum is great at elementwise multiplication followed by sum-reduction, and that is exactly the structure of this problem. np.linalg.norm squares the input and sums along the second axis (then takes a square root). The einsum counterpart is to feed the same array in as both inputs, which takes care of the squaring, and to drop the second axis with the sum-reduction expressed in its string notation. It does both in one go, and that's probably why it's so fast here.
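
    To make that concrete, a small standalone demonstration (with a made-up array x) that 'ij,ij->i' reproduces the squared row norms in a single pass:

    import numpy as np
    x = np.random.rand(5, 3)
    a = np.einsum('ij,ij->i', x, x)    # square elementwise and reduce over axis 1 in one go
    b = (x * x).sum(axis=1)            # same result, via an intermediate squared array
    c = np.linalg.norm(x, axis=1)**2   # same result, with an extra sqrt and re-square
    assert np.allclose(a, b) and np.allclose(a, c)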