For X-ray diffraction one needs to find the solutions of the so-called Laue equation
G_hkl - k_in + |k_in| * (sin(theta)*cos(phi), sin(theta)*sin(phi), cos(theta)) = 0
where G_hkl is a given 3-dimensional vector, k_in can be chosen as (0,0,1), and theta and phi are the free parameters that have to fulfill the equation. In a typical experiment G_hkl is then rotated around the x-axis, and for every step of the rotation one needs to find the solution of the equation. The equation cannot have more than one solution at a given rotation.
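To spell out the deduction used in the script below: rearranging the equation as G_hkl = k_in - k_out and taking the squared norm of both sides gives, with |k_out| = |k_in| and k_in.k_out = |k_in|^2 * cos(theta),
|G_hkl|^2 = |k_in|^2 + |k_out|^2 - 2*k_in.k_out = 2*|k_in|^2 * (1 - cos(theta))
so cos(theta) = 1 - |G_hkl|^2 / (2*|k_in|^2), independent of phi. Since the rotation does not change |G_hkl|, theta only has to be computed once.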
I have written this python script to find these solution but for my application it is not fast enough.
import numpy as np
import time
# Just initialization of variables. This is fast enough
res_list = []
N_rot = 100
Ghkl = np.array([0,0.7,0.7])
Ghkl_norm = np.linalg.norm(Ghkl)
kin = np.array([0,0,1])
kin_norm = np.linalg.norm(kin)
alpha_step = 2*np.pi/(N_rot-1)
rot_mat = np.array([[1,0,0],[0,np.cos(alpha_step),-np.sin(alpha_step)],[0,np.sin(alpha_step),np.cos(alpha_step)]])
# You can deduce theta from the norm of the vector equation
theta = np.arccos(1-(Ghkl_norm**2/(2*kin_norm**2)))
sint = np.sin(theta)
cost = np.cos(theta)
# This leaves only phi as a parameter to find.
# I find phi by introducing a finite set of test values
# and checking whether the norm of the vector equation is close
# to zero for any of those test phis.
phi_test = np.linspace(0,2*np.pi,200)
# 0*phi_test just broadcasts cost to the shape of phi_test
kout = kin_norm * np.array([sint * np.cos(phi_test), sint * np.sin(phi_test), cost + 0*phi_test]).T
##############
start_time = time.time()
for j in range(100): # just to make it longer to measure the time
    res_list = []
    for i in range(N_rot): # This loop is too slow
        # Here the norm of the vector equation is calculated for all phi_test
        norm_vec = np.linalg.norm(Ghkl[np.newaxis, :] - kin[np.newaxis, :] + kout, axis=1)
        if (norm_vec < 0.01 * kin_norm).any(): # check whether any fulfills the criterion
            minarg = np.argmin(norm_vec)
            res_list.append([theta, phi_test[minarg]])
        Ghkl = np.dot(rot_mat, Ghkl)
print('Time was {0:1.2f} s'.format( (time.time()-start_time)))
# On my machine it takes 0.3s and res_list should be
# [[1.0356115365192968, 1.578689775673263]]
Do you know a faster way to calculate this, either conceptually by solving the equation in a completely different way, or by just making my existing method faster?
There's a dependency, as Ghkl is being updated at each iteration and re-used in the next. The corresponding closed form might be difficult to trace out, so I would focus on improving the performance of the rest of the code inside that innermost loop. There, we are calculating norm_vec, which I think could be sped up with the two methods listed next.
Approach #1: Using SciPy's cdist

from scipy.spatial.distance import cdist
norm_vec = cdist(kout, (kin - Ghkl)[None]) # Rest of code stays the same
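Note that cdist requires 2D inputs, which is why (kin - Ghkl) is promoted to shape (1, 3) with [None]. The returned norm_vec then has shape (200, 1) instead of (200,), but both the .any() check and np.argmin behave the same on it, so the rest of the loop really does stay unchanged.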
Approach #2: Using np.einsum

sums = (Ghkl - kin) + kout
norm_vec_sq = np.einsum('ij,ij->i', sums, sums)
if (norm_vec_sq < (0.01 * kin_norm)**2).any():
    minarg = np.argmin(norm_vec_sq) # Rest of code stays the same
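For completeness, here is a minimal sketch of the timed loop from the question with approach #2 dropped in (same variables and setup as in the question; only the marked lines change):

for j in range(100): # just to make the run long enough to measure
    res_list = []
    for i in range(N_rot):
        # changed: squared norm of the vector equation via einsum
        sums = (Ghkl - kin) + kout
        norm_vec_sq = np.einsum('ij,ij->i', sums, sums)
        # changed: compare against the squared threshold, so no sqrt is needed
        if (norm_vec_sq < (0.01 * kin_norm)**2).any():
            minarg = np.argmin(norm_vec_sq)
            res_list.append([theta, phi_test[minarg]])
        Ghkl = np.dot(rot_mat, Ghkl)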
Runtime test

Using the inputs listed in the question, we get these results:
In [91]: %timeit np.linalg.norm(Ghkl- kin + kout, axis=1)
10000 loops, best of 3: 31.1 µs per loop
In [92]: sums = (Ghkl-kin)+kout
...: norm_vec_sq = np.einsum('ij,ij->i',sums,sums)
...:
In [93]: %timeit (Ghkl-kin)+kout # Approach2 - step1
100000 loops, best of 3: 7.09 µs per loop
In [94]: %timeit np.einsum('ij,ij->i',sums,sums) # Approach2 - step2
100000 loops, best of 3: 3.82 µs per loop
In [95]: %timeit cdist(kout,(kin-Ghkl)[None]) # Approach1
10000 loops, best of 3: 44.1 µs per loop
So, approach #1 isn't showing any improvement for these input sizes, but approach #2 is getting around a 3x speedup there on calculating norm_vec.

Short note on why np.einsum is beneficial here: einsum is great for elementwise multiplication followed by sum-reduction, and that is exactly the structure of this problem. np.linalg.norm squares the input, sums along the second axis and finally takes a square root; the einsum counterpart just feeds the same array in twice, which takes care of the squaring, and loses the second axis through the sum-reduction expressed in its string notation. It does this in one go without intermediate arrays, and since we only compare against a threshold, we can stay with the squared norms and skip the square root entirely. That's probably why it's so fast here.
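As a quick standalone sanity check (not part of the original code), the einsum expression does match the squared np.linalg.norm result:

import numpy as np

a = np.random.rand(200, 3)
# row-wise squared norms, once via einsum and once via norm + squaring
print(np.allclose(np.einsum('ij,ij->i', a, a),
                  np.linalg.norm(a, axis=1)**2)) # -> True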