I have lists
A = [(i,j,k,l,m)]
B = [(l,m,k)]
and dictionaries
C = {(i,j,k,l,m): val}
D = {(l,m,k): other_val}
I would like to create a dictionary E, keyed by (i,j,k), such that
E = {(i,j,k): sum over (l,m) of C[(i,j,k,l,m)]*D[(l,m,k)]}
Assume that the indexing conventions match across the lists and dictionaries. Below is my non-Pythonic, extremely slow solution. Is there a Pythonic way to do this quickly for a very large A, e.g., 5 million rows?
E = {}
for i,j,k,l,m in A:
    E[i,j,k] = sum(
        C[i,j,k,l,m] * D[l2,m2,k2]
        for l2,m2,k2 in B if l2==l and m2==m and k2==k)
Below is code that generates a sample dataset close to the actual size I am dealing with.
import numpy as np
np.random.seed(1)
Irange = range(50)
Jrange = range(10)
Krange = range(80)
Lrange = range(8)
Mrange = range(18)
A = [
    (i,j,k,l,m)
    for i in Irange
    for j in Jrange
    for k in Krange
    for l in Lrange
    for m in Mrange]
B = [
    (l,m,k)
    for k in Krange
    for l in Lrange
    for m in Mrange]
C = {key: np.random.uniform(1,10) for key in A}
D = {key: np.random.uniform(0,1) for key in B}
E = {}
for i,j,k,l,m in A:
    E[i,j,k] = sum(
        C[i,j,k,l,m] * D[l2,m2,k2]
        for l2,m2,k2 in B if l2==l and m2==m and k2==k)
I am posting my own solution, which is fast enough for my purposes. If you still see room for improvement, I am happy to test it out. (I expected some library to already offer a faster solution; maybe one does, but my question and approach were not clear enough to make use of it.)
Here is the data generation code:
import numpy as np
from datetime import datetime
np.random.seed(1)
Irange = range(50)
Jrange = range(10)
Krange = range(80)
Lrange = range(8)
Mrange = range(18)
A = [
    (i,j,k,l,m)
    for i in Irange
    for j in Jrange
    for k in Krange
    for l in Lrange
    for m in Mrange]
B = [
    (l,m,k)
    for k in Krange
    for l in Lrange
    for m in Mrange]
C = {key: np.random.uniform(1,10) for key in A}
D = {key: np.random.uniform(0,1) for key in B}
First, start the timer and build a list, unique_ijk, of the distinct (i,j,k) tuples that occur in A:
start_timer = datetime.now() #Start counting time
unique_ijk = list(set([(i,j,k) for i,j,k,l,m in A]))
Then, create a dictionary called lm_given_ijk that maps each (i,j,k) tuple key to the list of (l,m) index pairs that occur with it in A.
lm_given_ijk = {(i,j,k):[] for i,j,k in unique_ijk}
for i,j,k,l,m in A:
    lm_given_ijk[i,j,k].append((l,m))
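The same grouping can also be written with collections.defaultdict, which creates the empty list the first time a key is seen, so the keys do not have to be collected beforehand. This is only a sketch on a tiny made-up A (the three tuples below are hypothetical, not the sample data):

```python
from collections import defaultdict

# Tiny stand-in for A (hypothetical values): tuples of (i, j, k, l, m).
A = [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 1, 1, 0)]

# Missing keys start out as an empty list, so no pre-initialisation is needed.
lm_given_ijk = defaultdict(list)
for i, j, k, l, m in A:
    lm_given_ijk[i, j, k].append((l, m))

# The distinct (i, j, k) keys fall out of the grouping, in first-seen order.
unique_ijk = list(lm_given_ijk)
```

On this toy input, lm_given_ijk[0, 0, 0] holds the two pairs (0, 0) and (0, 1).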
Finally, use lm_given_ijk as follows to create E.
E = {(i,j,k): sum(C[i,j,k,l,m]*D[l,m,k] for l,m in lm_given_ijk[i,j,k])
     for i,j,k in unique_ijk}
print("Elapsed time is %s seconds.\n"%(datetime.now()-start_timer).total_seconds())
Output:
Elapsed time is 6.446798 seconds.
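If the intermediate grouping is not needed for anything else, the sum can probably be accumulated in a single pass over A instead. A minimal sketch on a tiny made-up dataset (all values below are hypothetical, and I have not timed this variant on the full sample data):

```python
from collections import defaultdict

# Tiny hypothetical stand-ins for A, C and D with the same key layouts.
A = [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 1, 1, 0)]
C = {(0, 0, 0, 0, 0): 2.0, (0, 0, 0, 0, 1): 3.0, (0, 0, 1, 1, 0): 4.0}
D = {(0, 0, 0): 0.5, (0, 1, 0): 0.25, (1, 0, 1): 0.5}

# Each (i, j, k) starts at 0.0 and receives one product per (l, m) pair.
E = defaultdict(float)
for i, j, k, l, m in A:
    E[i, j, k] += C[i, j, k, l, m] * D[l, m, k]

E = dict(E)  # back to a plain dict so missing keys raise KeyError again
```

Here E[0, 0, 0] is 2.0*0.5 + 3.0*0.25 = 1.75 and E[0, 0, 1] is 4.0*0.5 = 2.0.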
Having written all this up, I agree with the comments saying this is really a job for numpy arrays. That could improve the speed further, but I am happy with 6.4 seconds.
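For reference, here is a rough sketch of what the array version could look like. It assumes A covers the full (i,j,k,l,m) grid in the nested-loop order used in the data generation code, so the values can simply be reshaped; it would not work as-is for sparse or unordered keys. The sizes are deliberately small and hypothetical, and I have not benchmarked it on the full dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
nI, nJ, nK, nL, nM = 3, 2, 4, 2, 3  # small hypothetical sizes

A = [(i, j, k, l, m)
     for i in range(nI) for j in range(nJ) for k in range(nK)
     for l in range(nL) for m in range(nM)]
B = [(l, m, k)
     for k in range(nK) for l in range(nL) for m in range(nM)]
C = {key: rng.uniform(1, 10) for key in A}
D = {key: rng.uniform(0, 1) for key in B}

# A is listed in row-major (i, j, k, l, m) order, so reshape recovers the grid.
C_arr = np.array([C[key] for key in A]).reshape(nI, nJ, nK, nL, nM)
# B is listed with k outermost, then l, then m, so the axes are (k, l, m).
D_arr = np.array([D[key] for key in B]).reshape(nK, nL, nM)

# Multiply matching entries and sum over l and m, keeping i, j, k.
E_arr = np.einsum("ijklm,klm->ijk", C_arr, D_arr)

# Convert back to the dictionary form, if that is what downstream code expects.
E = {(i, j, k): E_arr[i, j, k]
     for i in range(nI) for j in range(nJ) for k in range(nK)}
```

The einsum subscripts spell out the same sum as the dictionary version: l and m appear only on the input side, so they are summed away, while i, j and k are kept.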