Tags: python, list, scala, aggregate, grouping

How to group and aggregate two lists in Python or Scala?


Given input lists:

L1 = [("A","p1",20), ("B","p2",30)]
L2 = [("A","p1",100), ("c","p3",35)]

Expected output:

[("A","p1",20,100), ("B","p2",30,"not in L2"), ("c","p3",35,"not in L1")]

I have tried using two for loops, one for L1 and the other for L2, but it does not work correctly when iterating over the elements and produces repeated output, which I don't want.


Solution

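  • First, create a dictionary keyed by every grouping key that occurs in either list (("A","p1"), ("B","p2"), etc.), collecting the numeric values for each key. Then loop over the keys of this dictionary and check whether each key is missing from either list, appending a "not in ..." marker when it is.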

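    from collections import defaultdict

    L1 = [("A","p1",20), ("B","p2",30)]
    L2 = [("A","p1",100), ("c","p3",35)]

    # Group the values by (letter, id) key, in first-seen order
    d = defaultdict(list)
    for x, y, z in L1 + L2:
        d[(x, y)].append(z)

    # Build each list's key set once, not on every loop iteration
    keys1 = {(x, y) for x, y, _ in L1}
    keys2 = {(x, y) for x, y, _ in L2}
    for k, v in d.items():
        if k not in keys1:
            v.append("not in L1")
        if k not in keys2:
            v.append("not in L2")
    L = [(*k, *v) for k, v in d.items()]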
    
    print(L)
    # [('A', 'p1', 20, 100), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
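The two "not in" checks are essentially set differences on the (letter, id) keys. A minimal sketch of that view, using the example lists from the question (the names `in_both`, `only_in_l1` and `only_in_l2` are illustrative, not from the original answer):

```python
L1 = [("A", "p1", 20), ("B", "p2", 30)]
L2 = [("A", "p1", 100), ("c", "p3", 35)]

# Keys are the (letter, id) pairs; the numeric value is ignored here
keys1 = {(x, y) for x, y, _ in L1}
keys2 = {(x, y) for x, y, _ in L2}

in_both = keys1 & keys2      # keys present in both lists
only_in_l1 = keys1 - keys2   # these keys get a "not in L2" marker
only_in_l2 = keys2 - keys1   # these keys get a "not in L1" marker

print(sorted(in_both), sorted(only_in_l1), sorted(only_in_l2))
# [('A', 'p1')] [('B', 'p2')] [('c', 'p3')]
```

Set membership tests are O(1) on average, so building each key set once (rather than rebuilding it inside the loop) keeps the approach cheap for long lists.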
    

    The grouping code also works if a key appears more than once in one of the lists:

    L1 = [("A","p1",20), ("B","p2",30)]
    L2 = [("A","p1",100), ("A","p1",200), ("c","p3",35)]
    

    Then the result would be

    # [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
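To check the duplicate-key behaviour directly, here is a sketch that wraps the two-list approach in a function. The name `group_two` is hypothetical, not part of the original answer. Values for a repeated key are simply appended in the order they are seen, so the output tuples can have different lengths.

```python
from collections import defaultdict


def group_two(l1, l2):
    """Group (letter, id, value) triples from two lists by (letter, id).

    Hypothetical helper wrapping the two-list grouping approach.
    """
    grouped = defaultdict(list)
    for x, y, z in l1 + l2:
        grouped[(x, y)].append(z)  # repeated keys just collect more values
    keys1 = {(x, y) for x, y, _ in l1}
    keys2 = {(x, y) for x, y, _ in l2}
    for key, values in grouped.items():
        if key not in keys1:
            values.append("not in L1")
        if key not in keys2:
            values.append("not in L2")
    return [(*key, *values) for key, values in grouped.items()]


L1 = [("A", "p1", 20), ("B", "p2", 30)]
L2 = [("A", "p1", 100), ("A", "p1", 200), ("c", "p3", 35)]
print(group_two(L1, L2))
# [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2'), ('c', 'p3', 35, 'not in L1')]
```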
    

    The same approach generalizes to more than two lists:

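    from collections import defaultdict

    L1 = [("A","p1",20), ("B","p2",30)]
    L2 = [("A","p1",100), ("c","p3",35)]
    L3 = [("A","p1",200)]

    # Map each name to its list instead of calling eval() on variable names
    lists = {"L1": L1, "L2": L2, "L3": L3}

    # Group the values from every list by (letter, id) key
    d = defaultdict(list)
    for lst in lists.values():
        for x, y, z in lst:
            d[(x, y)].append(z)

    # Precompute each list's key set, then mark keys missing from a list
    key_sets = {name: {(x, y) for x, y, _ in lst} for name, lst in lists.items()}
    for k, v in d.items():
        for name, keys in key_sets.items():
            if k not in keys:
                v.append(f"not in {name}")
    L = [(*k, *v) for k, v in d.items()]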
    
    print(L)
    # [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2', 'not in L3'), ('c', 'p3', 35, 'not in L1', 'not in L3')]
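The multi-list version can also be packaged as a reusable function that takes the lists as keyword arguments, so the "not in ..." labels come from the argument names rather than from evaluating variable names. This is a sketch; the function name `group_lists` is hypothetical.

```python
from collections import defaultdict


def group_lists(**named_lists):
    """Group (letter, id, value) triples across any number of named lists.

    Hypothetical helper: each keyword name (e.g. L1=...) is used in the
    "not in <name>" markers, so no eval() on variable names is needed.
    """
    grouped = defaultdict(list)
    for lst in named_lists.values():
        for x, y, z in lst:
            grouped[(x, y)].append(z)
    # One key set per named list, built once
    key_sets = {name: {(x, y) for x, y, _ in lst} for name, lst in named_lists.items()}
    for key, values in grouped.items():
        for name, keys in key_sets.items():
            if key not in keys:
                values.append(f"not in {name}")
    return [(*key, *values) for key, values in grouped.items()]


result = group_lists(
    L1=[("A", "p1", 20), ("B", "p2", 30)],
    L2=[("A", "p1", 100), ("c", "p3", 35)],
    L3=[("A", "p1", 200)],
)
print(result)
# [('A', 'p1', 20, 100, 200), ('B', 'p2', 30, 'not in L2', 'not in L3'), ('c', 'p3', 35, 'not in L1', 'not in L3')]
```

Since Python 3.7, `**kwargs` preserves the order in which the arguments were passed, so the markers appear in the same L1, L2, L3 order as in the output above.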