Tags: python, for-loop, dictionary, nested-loops, simplify

Sum selected elements in a dict of dicts in Python using a one-liner instead of a for-loop


I used the following dict comprehension

dimer = {(ab+cd):{"1":0,"2":0,"3":0} for cd in 'ACGT' for ab in 'ACGT'}

to generate a dict of dicts, dimer:

dimer = {"AA":{"1":0,"2":0,"3":0}, "AC":{"1":0,"2":0,"3":0}, "AG":{"1":0,"2":0,"3":0}, "AT":{"1":0,"2":0,"3":0},
         "CA":{"1":0,"2":0,"3":0}, "CC":{"1":0,"2":0,"3":0}, "CG":{"1":0,"2":0,"3":0}, "CT":{"1":0,"2":0,"3":0},
         "GA":{"1":0,"2":0,"3":0}, "GC":{"1":0,"2":0,"3":0}, "GG":{"1":0,"2":0,"3":0}, "GT":{"1":0,"2":0,"3":0},
         "TA":{"1":0,"2":0,"3":0}, "TC":{"1":0,"2":0,"3":0}, "TG":{"1":0,"2":0,"3":0}, "TT":{"1":0,"2":0,"3":0}}
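A side note on the comprehension: the rightmost for clause varies fastest, so "for cd in 'ACGT' for ab in 'ACGT'" inserts the keys as AA, CA, GA, TA, AC, ... rather than in the order of the listing above. Dict equality ignores insertion order, so both describe the same data. A small sketch (the names by_second and by_first are just illustrative):

```python
# Comprehension as written in the question: the inner clause (ab) varies fastest.
by_second = {(ab + cd): {"1": 0, "2": 0, "3": 0} for cd in 'ACGT' for ab in 'ACGT'}

# Swapping the clauses makes the first letter the outer loop: AA, AC, AG, AT, ...
by_first = {(ab + cd): {"1": 0, "2": 0, "3": 0} for ab in 'ACGT' for cd in 'ACGT'}

print(list(by_second)[:4])    # ['AA', 'CA', 'GA', 'TA']
print(list(by_first)[:4])     # ['AA', 'AC', 'AG', 'AT']
print(by_second == by_first)  # True: dict equality ignores insertion order
```

Only the iteration order of the keys differs, which matters only if you print the dict or loop over it directly.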

Now I want to sum up selected elements.

If I hardcode them, it looks like this:

total_A = dimer["AA"]["1"]+dimer["CA"]["1"]+dimer["GA"]["1"]+dimer["TA"]["1"]+dimer["AA"]["2"]+dimer["CA"]["2"]+dimer["GA"]["2"]+dimer["TA"]["2"]+dimer["AA"]["3"]+dimer["CA"]["3"]+dimer["GA"]["3"]+dimer["TA"]["3"]
total_C = dimer["AC"]["1"]+dimer["CC"]["1"]+dimer["GC"]["1"]+dimer["TC"]["1"]+dimer["AC"]["2"]+dimer["CC"]["2"]+dimer["GC"]["2"]+dimer["TC"]["2"]+dimer["AC"]["3"]+dimer["CC"]["3"]+dimer["GC"]["3"]+dimer["TC"]["3"]
total_G = dimer["AG"]["1"]+dimer["CG"]["1"]+dimer["GG"]["1"]+dimer["TG"]["1"]+dimer["AG"]["2"]+dimer["CG"]["2"]+dimer["GG"]["2"]+dimer["TG"]["2"]+dimer["AG"]["3"]+dimer["CG"]["3"]+dimer["GG"]["3"]+dimer["TG"]["3"]
total_T = dimer["AT"]["1"]+dimer["CT"]["1"]+dimer["GT"]["1"]+dimer["TT"]["1"]+dimer["AT"]["2"]+dimer["CT"]["2"]+dimer["GT"]["2"]+dimer["TT"]["2"]+dimer["AT"]["3"]+dimer["CT"]["3"]+dimer["GT"]["3"]+dimer["TT"]["3"]

The best simplification I have come up with uses nested for-loops:

total_0 = {i:0 for i in 'ACGT'}   
for i in 'ACGT':    
    for j in 'ACGT':
        for k in '123':
            total_0[i] += dimer[j+i][k]  
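One way to see that the loop matches the hardcoded sums, and that it can be flattened, is to move the three for clauses, in the same order, into a single generator expression per letter. The sketch below fills dimer with made-up random counts (an assumption, only so the sums are non-zero) and checks the loop, the generator-expression version, and the twelve terms of total_A against each other:

```python
import random

# Made-up counts (assumption): real data would come from counting dimers.
random.seed(0)
dimer = {ab + cd: {k: random.randint(0, 9) for k in '123'}
         for ab in 'ACGT' for cd in 'ACGT'}

# The nested loops from the question: i is the *second* letter of each key.
total_0 = {i: 0 for i in 'ACGT'}
for i in 'ACGT':
    for j in 'ACGT':
        for k in '123':
            total_0[i] += dimer[j + i][k]

# The same for clauses moved into one generator expression per letter.
one_liner = {i: sum(dimer[j + i][k] for j in 'ACGT' for k in '123') for i in 'ACGT'}

# The twelve terms of the hardcoded total_A: keys ending in "A", positions 1-3.
total_A = sum(dimer[key][k] for key in ("AA", "CA", "GA", "TA") for k in '123')

print(one_liner == total_0, total_0["A"] == total_A)  # True True
```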

Is there a way to compute these sums with a one-liner?

I also have another set of nested for-loops:

row_sum = {i:{"1":0,"2":0,"3":0} for i in 'ACGT'}   
for i in 'ACGT':    
    for j in 'ACGT':
        for k in '123': 
            row_sum[i][k] += float(dimer[i+j][k])

The hardcoded version looks like this:

row_sum = {"A":{"1":0,"2":0,"3":0},"C":{"1":0,"2":0,"3":0},"G":{"1":0,"2":0,"3":0},"T":{"1":0,"2":0,"3":0}} 
for i in range(1,4,1): 
    row_sum["A"][str(i)] = float(dimer["AA"][str(i)]+dimer["AC"][str(i)]+dimer["AG"][str(i)]+dimer["AT"][str(i)])
    row_sum["C"][str(i)] = float(dimer["CA"][str(i)]+dimer["CC"][str(i)]+dimer["CG"][str(i)]+dimer["CT"][str(i)])
    row_sum["G"][str(i)] = float(dimer["GA"][str(i)]+dimer["GC"][str(i)]+dimer["GG"][str(i)]+dimer["GT"][str(i)])
    row_sum["T"][str(i)] = float(dimer["TA"][str(i)]+dimer["TC"][str(i)]+dimer["TG"][str(i)]+dimer["TT"][str(i)])

Is there also a way to write this second nested loop as a one-liner?

Sorry, I am really new to Python. Any help would be appreciated!


Solution

  • First, you can collapse the three nested loops into one by iterating over their Cartesian product with itertools.product:

    from itertools import product

    row_sum = {i: {"1": 0, "2": 0, "3": 0} for i in 'ACGT'}
    # product yields every (i, j, k) combination, in the same order as the nested loops
    for i, j, k in product('ACGT', 'ACGT', '123'):
        row_sum[i][k] += float(dimer[i + j][k])
    

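    For the first set of sums (total_0), here is a one-liner, though it may be hard to follow if you are new to Python. The inner sum adds the three positions of one dimer, and the outer sum adds the four dimers that end in i:

    total_0 = {i: sum(sum(dimer[j + i].values()) for j in 'ACGT') for i in 'ACGT'}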
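The answer above does not give a one-liner for the second loop (row_sum). A nested dict comprehension is one way to write it: the outer level runs over the first letter i, the inner level over the position k, and sum() runs over the second letter j. This is a sketch that uses made-up random counts (an assumption) and compares the result with the question's nested loop. The name row_sum_1 is just illustrative:

```python
import random

# Made-up counts (assumption), only so the comparison is meaningful.
random.seed(1)
dimer = {ab + cd: {k: random.randint(0, 9) for k in '123'}
         for ab in 'ACGT' for cd in 'ACGT'}

# Nested-loop version from the question: i is the *first* letter, k the position.
row_sum = {i: {"1": 0, "2": 0, "3": 0} for i in 'ACGT'}
for i in 'ACGT':
    for j in 'ACGT':
        for k in '123':
            row_sum[i][k] += float(dimer[i + j][k])

# One-liner: outer dict over first letters, inner dict over positions,
# summing over the second letter j.
row_sum_1 = {i: {k: sum(float(dimer[i + j][k]) for j in 'ACGT') for k in '123'}
             for i in 'ACGT'}

print(row_sum_1 == row_sum)  # True
```

The float() call is kept only to match the original loop. If the counts are plain integers, you can drop it and the values stay ints.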