I have a problem (described in more detail in "Best way to compare multiple key values [lists] and return multiples?").
Short summary:
I have several lists that I want to compare, picking out the values that are present in more than one list.
I want to get:
- All values that are present in more than one list.
- How often each of these values is present (for example, if a value is present 2 times in every list, I want the output to be 2, not the total number of occurrences across all lists).
- And, in the end, a count of the values that are in more than one list, but not in every list.
The Setup:
In a loop, I add lists of data I want to compare to a "master" list:
[
['Limerick (IRE)', 'Fairyhouse (IRE)', 'Gowran Park (IRE)', 'Galway (IRE)', 'Roscommon (IRE)', 'Ballinrobe (IRE)', 'Roscommon (IRE)', 'Downpatrick (IRE)', 'Ballinrobe (IRE)', 'Curragh (IRE)', 'Naas (IRE)', 'Curragh (IRE)', 'Galway (IRE)', 'Cork (IRE)', 'Punchestown (IRE)', 'Galway (IRE)', 'Tipperary (IRE)', 'Curragh (IRE)', 'Gowran Park (IRE)', 'Cork (IRE)', 'Galway (IRE)', 'Killarney (IRE)', 'Curragh (IRE)', 'Roscommon (IRE)', 'Limerick (IRE)', 'Newton Abbot', 'Bangor-on-Dee', 'Bangor-on-Dee'],
['Newton Abbot', 'Worcester', 'Ffos Las', 'Worcester', 'Newton Abbot', 'Hereford', 'Worcester', 'Chepstow', 'Newton Abbot', 'Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee'],
['Aintree', 'Market Rasen', 'Market Rasen', 'Newcastle', 'Stratford', 'Hexham', 'Cartmel', 'Stratford', 'Cartmel', 'Cartmel','Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee', 'Killarney (IRE)']
]
There can be only 2 lists, or 20, or more to compare.
Now I try to get the multiples with Counter() and extract the most common ones. Pseudo code:
from collections import Counter

doubles = Counter()
for w in testlist[1]:
    doubles[w] = testlist[2].count(w)
result4 = doubles.most_common(2)
result5 = [result4[0]]
But this does not work as intended, because it counts the occurrences of the multiples in one list only. If the word/number appears one time in list 1 but five times in the second one, I still want to get only 1 as an output, not six or five. If it appears two times in List[1] and 3 times in List[2], I want to get 2 (plus the number/word itself) as an output, and so on.
The second problem that I have: the number of lists varies. It can be 2, it can be 20. So testlist[1] is just a placeholder; I would have to check all 20 lists (or whatever the number is) for occurrences of the word/number that is in every list.
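Both points (taking the smaller count, and coping with any number of lists) could be handled along these lines. This is only a sketch, assuming that just the values present in every list are wanted; the function name `common_min_counts` and the sample data are made up for illustration. It relies on `collections.Counter` supporting `&`, which keeps only shared keys with the smaller of the two counts, and on `functools.reduce` to apply that across however many lists there are.

```python
from collections import Counter
from functools import reduce

def common_min_counts(lists):
    """Return {value: smallest per-list count} for values found in every list."""
    if not lists:
        return {}
    counters = [Counter(lst) for lst in lists]
    # Counter & Counter keeps only shared keys, each with the smaller count.
    return dict(reduce(lambda a, b: a & b, counters))

# Hypothetical sample data, not the racecourse lists above.
lists = [
    ["a", "a", "b", "c"],
    ["a", "a", "a", "b", "d"],
    ["a", "a", "b", "b"],
]
result = common_min_counts(lists)
print(result)  # {'a': 2, 'b': 1}
```

Here "a" appears 2, 3 and 2 times, so it is reported as 2; "c" and "d" are dropped because they are missing from at least one list.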
I can't wrap my head around how to do this. Hopefully you can help me.
Edit
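Prior to any optimization, I would break this task into three steps: count the values in each list, merge those counts by name, and then report how many lists each name appears in along with its smallest per-list count.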
import collections
import json
data = [
['Limerick (IRE)', 'Fairyhouse (IRE)', 'Gowran Park (IRE)', 'Galway (IRE)', 'Roscommon (IRE)', 'Ballinrobe (IRE)', 'Roscommon (IRE)', 'Downpatrick (IRE)', 'Ballinrobe (IRE)', 'Curragh (IRE)', 'Naas (IRE)', 'Curragh (IRE)', 'Galway (IRE)', 'Cork (IRE)', 'Punchestown (IRE)', 'Galway (IRE)', 'Tipperary (IRE)', 'Curragh (IRE)', 'Gowran Park (IRE)', 'Cork (IRE)', 'Galway (IRE)', 'Killarney (IRE)', 'Curragh (IRE)', 'Roscommon (IRE)', 'Limerick (IRE)', 'Newton Abbot', 'Bangor-on-Dee', 'Bangor-on-Dee'],
['Newton Abbot', 'Worcester', 'Ffos Las', 'Worcester', 'Newton Abbot', 'Hereford', 'Worcester', 'Chepstow', 'Newton Abbot', 'Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee'],
['Aintree', 'Market Rasen', 'Market Rasen', 'Newcastle', 'Stratford', 'Hexham', 'Cartmel', 'Stratford', 'Cartmel', 'Cartmel','Bangor-on-Dee', 'Stratford', 'Ffos Las', 'Huntingdon', 'Newton Abbot', 'Bangor-on-Dee', 'Killarney (IRE)']
]
## ---------------------
## Gather the per row counts
## ---------------------
data_counted = [
    dict(collections.Counter(row))
    for row
    in data
]
#print(json.dumps(data_counted, indent=4, sort_keys=True))
## ---------------------

## ---------------------
## merge the rows on name
## ---------------------
data_counted_combined = {}
for row in data_counted:
    for name, count in row.items():
        target = data_counted_combined.setdefault(name, [])  ## make sure this key is initialized
        target.append(count)
#print(json.dumps(data_counted_combined, indent=4, sort_keys=True))
## ---------------------

## ---------------------
## Generate the final result (sorted for fun)
## ---------------------
for key, value in sorted(data_counted_combined.items(), key=lambda x: x[0]):
    print(f"\"{key}\" appears in { len(value) } list(s) a minimum of { min(value) } times.")
## ---------------------
This produces the following:
"Aintree" appears in 1 list(s) a minimum of 1 times.
"Ballinrobe (IRE)" appears in 1 list(s) a minimum of 2 times.
"Bangor-on-Dee" appears in 3 list(s) a minimum of 2 times.
"Cartmel" appears in 1 list(s) a minimum of 3 times.
"Chepstow" appears in 1 list(s) a minimum of 1 times.
"Cork (IRE)" appears in 1 list(s) a minimum of 2 times.
"Curragh (IRE)" appears in 1 list(s) a minimum of 4 times.
"Downpatrick (IRE)" appears in 1 list(s) a minimum of 1 times.
"Fairyhouse (IRE)" appears in 1 list(s) a minimum of 1 times.
"Ffos Las" appears in 2 list(s) a minimum of 1 times.
"Galway (IRE)" appears in 1 list(s) a minimum of 4 times.
"Gowran Park (IRE)" appears in 1 list(s) a minimum of 2 times.
"Hereford" appears in 1 list(s) a minimum of 1 times.
"Hexham" appears in 1 list(s) a minimum of 1 times.
"Huntingdon" appears in 2 list(s) a minimum of 1 times.
"Killarney (IRE)" appears in 2 list(s) a minimum of 1 times.
"Limerick (IRE)" appears in 1 list(s) a minimum of 2 times.
"Market Rasen" appears in 1 list(s) a minimum of 2 times.
"Naas (IRE)" appears in 1 list(s) a minimum of 1 times.
"Newcastle" appears in 1 list(s) a minimum of 1 times.
"Newton Abbot" appears in 3 list(s) a minimum of 1 times.
"Punchestown (IRE)" appears in 1 list(s) a minimum of 1 times.
"Roscommon (IRE)" appears in 1 list(s) a minimum of 3 times.
"Stratford" appears in 2 list(s) a minimum of 1 times.
"Tipperary (IRE)" appears in 1 list(s) a minimum of 1 times.
"Worcester" appears in 1 list(s) a minimum of 3 times.
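The listing above still includes values that appear in only one list. To narrow it down to what the question asks for, the merged dictionary could be filtered on the number of lists each value appears in. A minimal sketch, assuming the same shape as `data_counted_combined` (each value mapped to its per-list counts); the function name `summarize` and the sample data are made up for illustration:

```python
from collections import Counter

def summarize(lists):
    """Split values by how many of the lists they appear in."""
    per_value = {}  # value -> list of counts, one entry per list containing it
    for lst in lists:
        for value, count in Counter(lst).items():
            per_value.setdefault(value, []).append(count)

    total = len(lists)
    # Values in at least two lists, with the smallest per-list count.
    shared = {v: min(c) for v, c in per_value.items() if len(c) > 1}
    # Values in at least two lists but not in all of them.
    partial = sorted(v for v, c in per_value.items() if 1 < len(c) < total)
    return shared, partial

# Hypothetical sample data, not the racecourse lists above.
sample = [
    ["a", "a", "b", "c"],
    ["a", "a", "a", "b", "d"],
    ["a", "a", "c"],
]
shared, partial = summarize(sample)
print(shared)        # {'a': 2, 'b': 1, 'c': 1}
print(len(partial))  # 2
```

Going by the output above, the racecourse data would give Bangor-on-Dee (2) and Newton Abbot (1) as present in all three lists, and four values (Ffos Las, Huntingdon, Killarney (IRE), Stratford) as present in more than one list but not in every list.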