
Sum using the last two elements of a dictionary's tupled key


I have lists

A = [(i,j,k,l,m)]
B = [(l,m,k)]

and dictionaries

C = {(i,j,k,l,m): val}
D = {(l,m,k): other_val}

I would like to create a dictionary of E such that

E = {(i,j,k): C[(i,j,k,l,m)]*D[(l,m,k)]}

Assume that the index conventions are consistent across the lists and dictionaries. Below is my non-Pythonic, extremely slow solution. Is there a Pythonic way to do this quickly when A is very large, e.g., 5 million rows?

E = {}
for i,j,k,l,m in A:
    # Accumulate over (l,m); a plain assignment would keep only the last term.
    E[i,j,k] = E.get((i,j,k), 0) + sum(
        C[i,j,k,l,m] * D[l2,m2,k2]
        for l2,m2,k2 in B if l2==l and m2==m and k2==k)

Below is the code to generate a sample dataset close to the actual size I am dealing with.

import numpy as np
np.random.seed(1)

Irange = range(50)
Jrange = range(10)
Krange = range(80)
Lrange = range(8)
Mrange = range(18)

A = [
    (i,j,k,l,m)
    for i in Irange
    for j in Jrange
    for k in Krange
    for l in Lrange
    for m in Mrange]
B = [
    (l,m,k)
    for k in Krange
    for l in Lrange
    for m in Mrange]

C = {key: np.random.uniform(1,10) for key in A}

D = {key: np.random.uniform(0,1) for key in B}

E = {}
for i,j,k,l,m in A:
    # Accumulate over (l,m); a plain assignment would keep only the last term.
    E[i,j,k] = E.get((i,j,k), 0) + sum(
        C[i,j,k,l,m] * D[l2,m2,k2]
        for l2,m2,k2 in B if l2==l and m2==m and k2==k)
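
To make the target concrete, here is a tiny hand-checkable instance. It assumes the intended result sums over every (l,m) pair that shares the same (i,j,k), as the title suggests. The values are made up purely for illustration and are not part of the real data.

```python
# Made-up values, small enough to verify by hand.
C = {
    (0, 0, 0, 0, 0): 2.0,   # key order is (i, j, k, l, m)
    (0, 0, 0, 0, 1): 3.0,
    (0, 0, 1, 0, 0): 5.0,
}
D = {
    (0, 0, 0): 10.0,        # key order is (l, m, k)
    (0, 1, 0): 100.0,
    (0, 0, 1): 1.0,
}

E = {}
for (i, j, k, l, m), c in C.items():
    # Add this (l, m) term to the running total for (i, j, k).
    E[i, j, k] = E.get((i, j, k), 0.0) + c * D[l, m, k]

print(E)
```

Here E[0,0,0] should be 2*10 + 3*100 = 320 and E[0,0,1] should be 5*1 = 5.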

Solution

  • I am posting my own solution, which is fast enough for my purposes. If you see room for improvement, I am happy to test it. (I expected some library to already offer a faster approach; perhaps one exists, but my question or solution may not have been clear enough to point to it.)

    Here is the data generation code:

    import numpy as np
    from datetime import datetime
    np.random.seed(1)
    
    Irange = range(50)
    Jrange = range(10)
    Krange = range(80)
    Lrange = range(8)
    Mrange = range(18)
    
    A = [
        (i,j,k,l,m)
        for i in Irange
        for j in Jrange
        for k in Krange
        for l in Lrange
        for m in Mrange]
    B = [
        (l,m,k)
        for k in Krange
        for l in Lrange
        for m in Mrange]
    
    C = {key: np.random.uniform(1,10) for key in A}
    
    D = {key: np.random.uniform(0,1) for key in B}
    

    First, start the timer and build a list, unique_ijk, of the distinct (i,j,k) triples:

    start_timer = datetime.now() #Start counting time
    unique_ijk = list(set([(i,j,k) for i,j,k,l,m in A]))
    

    Then, create a dictionary called lm_given_ijk that maps each (i,j,k) key to the list of (l,m) pairs that occur with it in A.

    lm_given_ijk = {(i,j,k):[] for i,j,k in unique_ijk}
    for i,j,k,l,m in A:
        lm_given_ijk[i,j,k].append((l,m))
    

    Finally, use lm_given_ijk as follows to create E.

    E = {(i,j,k): sum(C[i,j,k,l,m]*D[l,m,k] for l,m in lm_given_ijk[i,j,k]) 
                      for i,j,k in unique_ijk}
    print("Elapsed time is %s seconds.\n"%(datetime.now()-start_timer).total_seconds())
    

    Output:

    Elapsed time is 6.446798 seconds.
    

    Having written all this, I agree with the comments saying this is a job for NumPy arrays. That could improve the speed further, but I am happy with 6.4 seconds.
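
For readers who want to try the NumPy route mentioned above, the following is a minimal sketch, not a benchmarked replacement. It assumes every index is a dense, zero-based range (as in the sample data), so the dictionaries can be packed into arrays. The names nI, nJ, nK, nL, nM, C_arr, D_arr and E_arr are illustrative, and the sizes are kept small so the cross-check runs quickly.

```python
import numpy as np

# Small illustrative sizes; the question uses 50, 10, 80, 8, 18.
nI, nJ, nK, nL, nM = 4, 3, 5, 2, 3

rng = np.random.default_rng(1)
C = {(i, j, k, l, m): rng.uniform(1, 10)
     for i in range(nI) for j in range(nJ) for k in range(nK)
     for l in range(nL) for m in range(nM)}
D = {(l, m, k): rng.uniform(0, 1)
     for k in range(nK) for l in range(nL) for m in range(nM)}

# Pack the dictionaries into dense arrays indexed by their key tuples.
C_arr = np.empty((nI, nJ, nK, nL, nM))
for key, val in C.items():
    C_arr[key] = val
D_arr = np.empty((nL, nM, nK))
for key, val in D.items():
    D_arr[key] = val

# E[i,j,k] = sum over l,m of C[i,j,k,l,m] * D[l,m,k]
E_arr = np.einsum("ijklm,lmk->ijk", C_arr, D_arr)

# Convert back to a dictionary if that form is needed downstream.
E = {idx: E_arr[idx] for idx in np.ndindex(E_arr.shape)}

# Cross-check one entry against the dictionary definition.
i, j, k = 1, 2, 3
expected = sum(C[i, j, k, l, m] * D[l, m, k]
               for l in range(nL) for m in range(nM))
assert np.isclose(E[i, j, k], expected)
```

If the data already lives in arrays, the packing loops disappear and only the einsum call remains. If the keys are sparse or not zero-based, this packing step would need an explicit index mapping first.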