Casting Counter() to dict or list


I have a problem, described in more detail in another question ("Best way to compare multiple key values [lists] and return multiples?").

Short summary:

I have several lists I want to compare, finding the values that are present in more than one list.

I want to get:

  • All values that are present in more than one list
  • How often these values are present per list (for example, if a value is present 2 times in every list, I want to get 2, not the total number of occurrences across all lists)
  • And, in the end, a count of the values that are in more than one list, but not in every list

The Setup:

In a loop, I add lists of data I want to compare to a "master" list:

[
['Limerick (IRE)', 'Fairyhouse (IRE)', 'Gowran Park (IRE)', 'Galway (IRE)', 'Roscommon (IRE)', 'Ballinrobe (IRE)', 'Roscommon (IRE)', 'Downpatrick (IRE)', 'Ballinrobe (IRE)', 'Curragh (IRE)', 'Naas (IRE)', 'Curragh (IRE)', 'Galway (IRE)', 'Cork (IRE)', 'Punchestown (IRE)', 'Galway (IRE)', 'Tipperary (IRE)', 'Curragh (IRE)', 'Gowran Park (IRE)', 'Cork (IRE)', 'Galway (IRE)', 'Killarney (IRE)', 'Curragh (IRE)', 'Roscommon (IRE)', 'Limerick (IRE)', 'Newton Abbot', 'Bangor-on-Dee', 'Bangor-on-Dee'],

['Newton Abbot', 'Worcester', 'Ffos Las', 'Worcester', 'Newton Abbot', 'Hereford', 'Worcester', 'Chepstow', 'Newton Abbot', 'Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee'],

['Aintree', 'Market Rasen', 'Market Rasen', 'Newcastle', 'Stratford', 'Hexham', 'Cartmel', 'Stratford', 'Cartmel', 'Cartmel','Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee', 'Killarney (IRE)']
]

There can be only 2 lists, or 20, or more to compare.

Now I try to get the multiples with Counter() and extract the most common ones. Pseudocode:

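        from collections import Counter

        # testlist is the "master" list of lists shown above
        doubles = Counter()
        for w in testlist[1]:
            # only counts how often w occurs in the other list
            doubles[w] = testlist[2].count(w)
        result4 = doubles.most_common(2)
        result5 = [result4[0]]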

But this does not work as intended, because it counts the occurrences of the multiples in only one list. If the word/number appears once in list 1 but five times in list 2, I still want to get only 1 as output, not six or five. If it appears 2 times in List[1] and 3 times in List[2], I want to get 2 (plus the number/word itself) as output, and so on.

The second problem that I have: the number of lists varies. It can be 2, it can be 20. So testlist[1] is just a placeholder; I would have to check all 20 lists (or whatever the number is) for occurrences of the word/number that is in every list.

I can't wrap my head around how to do this. Hopefully you can help me.

Edit

  • Added a third list for the example
  • Expected output: comparing these lists, I would like to get something like:
  • Bangor-on-Dee: 3 (because it is in all 3 lists), 2 (because it appears at least two times in each of them)
  • Killarney: 2 (because it is in only 2 of those lists), 1 (because it appears at least once in each of those lists)

Solution

  • Prior to any optimization, I would break this task into three steps:

    1. Count the occurrences of each key in each row.
    2. Merge the counts based on key.
    3. Print the results based on what we find out about the counts after merging them.

    import collections
    import json
    
    data = [
        ['Limerick (IRE)', 'Fairyhouse (IRE)', 'Gowran Park (IRE)', 'Galway (IRE)', 'Roscommon (IRE)', 'Ballinrobe (IRE)', 'Roscommon (IRE)', 'Downpatrick (IRE)', 'Ballinrobe (IRE)', 'Curragh (IRE)', 'Naas (IRE)', 'Curragh (IRE)', 'Galway (IRE)', 'Cork (IRE)', 'Punchestown (IRE)', 'Galway (IRE)', 'Tipperary (IRE)', 'Curragh (IRE)', 'Gowran Park (IRE)', 'Cork (IRE)', 'Galway (IRE)', 'Killarney (IRE)', 'Curragh (IRE)', 'Roscommon (IRE)', 'Limerick (IRE)', 'Newton Abbot', 'Bangor-on-Dee', 'Bangor-on-Dee'],
        ['Newton Abbot', 'Worcester', 'Ffos Las', 'Worcester', 'Newton Abbot', 'Hereford', 'Worcester', 'Chepstow', 'Newton Abbot', 'Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee'],
        ['Aintree', 'Market Rasen', 'Market Rasen', 'Newcastle', 'Stratford', 'Hexham', 'Cartmel', 'Stratford', 'Cartmel', 'Cartmel','Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee', 'Killarney (IRE)']
    ]
    
    ## ---------------------
    ## Gather the per row counts
    ## ---------------------
    data_counted = [
        dict(collections.Counter(row))
        for row
        in data
    ]
    #print(json.dumps(data_counted, indent=4, sort_keys=True))
    ## ---------------------
    
    ## ---------------------
    ## merge the rows on name
    ## ---------------------
    data_counted_combined = {}
    for row in data_counted:
        for name, count in row.items():
            target = data_counted_combined.setdefault(name, []) ## make sure this key is initialized
            target.append(count)
    #print(json.dumps(data_counted_combined, indent=4, sort_keys=True))
    ## ---------------------
    
    ## ---------------------
    ## Generate the final result (sorted for fun)
    ## ---------------------
    for key, value in sorted(data_counted_combined.items(), key=lambda x: x[0]):
        print(f"\"{key}\" appears in { len(value) } list(s) a minimum of { min(value) } times.")
    ## ---------------------
    

    This produces the following:

    "Aintree" appears in 1 list(s) a minimum of 1 times.
    "Ballinrobe (IRE)" appears in 1 list(s) a minimum of 2 times.
    "Bangor-on-Dee" appears in 3 list(s) a minimum of 2 times.
    "Cartmel" appears in 1 list(s) a minimum of 3 times.
    "Chepstow" appears in 1 list(s) a minimum of 1 times.
    "Cork (IRE)" appears in 1 list(s) a minimum of 2 times.
    "Curragh (IRE)" appears in 1 list(s) a minimum of 4 times.
    "Downpatrick (IRE)" appears in 1 list(s) a minimum of 1 times.
    "Fairyhouse (IRE)" appears in 1 list(s) a minimum of 1 times.
    "Ffos Las" appears in 2 list(s) a minimum of 1 times.
    "Galway (IRE)" appears in 1 list(s) a minimum of 4 times.
    "Gowran Park (IRE)" appears in 1 list(s) a minimum of 2 times.
    "Hereford" appears in 1 list(s) a minimum of 1 times.
    "Hexham" appears in 1 list(s) a minimum of 1 times.
    "Huntingdon" appears in 2 list(s) a minimum of 1 times.
    "Killarney (IRE)" appears in 2 list(s) a minimum of 1 times.
    "Limerick (IRE)" appears in 1 list(s) a minimum of 2 times.
    "Market Rasen" appears in 1 list(s) a minimum of 2 times.
    "Naas (IRE)" appears in 1 list(s) a minimum of 1 times.
    "Newcastle" appears in 1 list(s) a minimum of 1 times.
    "Newton Abbot" appears in 3 list(s) a minimum of 1 times.
    "Punchestown (IRE)" appears in 1 list(s) a minimum of 1 times.
    "Roscommon (IRE)" appears in 1 list(s) a minimum of 3 times.
    "Stratford" appears in 2 list(s) a minimum of 1 times.
    "Tipperary (IRE)" appears in 1 list(s) a minimum of 1 times.
    "Worcester" appears in 1 list(s) a minimum of 3 times.
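
The question also asks for the values that are in more than one list but not in every list, which the output above does not single out. One way to get there, as a minimal sketch built on the same count-and-merge idea, is to keep a (number of lists, minimum count) pair per value and filter on the first element. The function name `summarize` and the shortened `sample` data are assumptions made for illustration; they are not part of the original answer.

```python
from collections import Counter

def summarize(lists):
    """Map each value to (number of lists containing it,
    minimum count among the lists that contain it)."""
    counts_by_value = {}
    for lst in lists:
        # Counter gives the per-list count of every value
        for value, count in Counter(lst).items():
            counts_by_value.setdefault(value, []).append(count)
    return {
        value: (len(counts), min(counts))
        for value, counts in counts_by_value.items()
    }

# Hypothetical, shortened sample data (not the full lists from the question)
sample = [
    ["Bangor-on-Dee", "Bangor-on-Dee", "Killarney (IRE)", "Naas (IRE)"],
    ["Bangor-on-Dee", "Bangor-on-Dee", "Bangor-on-Dee", "Worcester"],
    ["Bangor-on-Dee", "Bangor-on-Dee", "Killarney (IRE)", "Aintree"],
]

summary = summarize(sample)
total = len(sample)  # works for 2 lists, 20 lists, or more

# Values present in every list
in_every_list = {v: s for v, s in summary.items() if s[0] == total}

# Values present in more than one list, but not in every list
in_some_but_not_all = {v: s for v, s in summary.items() if 1 < s[0] < total}

print(in_every_list)        # {'Bangor-on-Dee': (3, 2)}
print(in_some_but_not_all)  # {'Killarney (IRE)': (2, 1)}
```

Because the filter compares against `len(sample)`, nothing has to change when the number of lists varies; `len(in_some_but_not_all)` gives the count asked for in the question.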