I'm trying to iterate over a list that contains some duplicate elements. I need the number of duplicates, so I don't want to convert the list to a set before iterating over it.
I'm trying to count how many times the element appears and then write the element (the name) and the count of how many times it appears.
The issue I am running into is that in my output CSV file, there are as many rows as there are times the element appears. I am writing the CSV to an HTML table after it's completed, so I want it to be deduplicated.
My end goal is to have it count how many times the name appears, then write a row to the CSV file that contains the name and the count, then move to the next name in the list.
I tried searching and came across itertools.groupby, but I'm not sure if that's going to be useful in this instance and, if it is, how to use it correctly.
Thanks for the help.
EDIT: I forgot to mention - Python 2.6
import csv
import sys

with open(sys.argv[1]) as infile:
    rdr = csv.DictReader(infile, dialect='excel')
    qualsin = []
    headers = ['Qualifier Name','Appointments']
    for row in rdr:
        row['Qualifier Name'] = row['Qualifier Name'].upper()
        qualsin.append(row['Qualifier Name'])
    qualsin.sort()
    #total = 0
    with open('tempwork.csv', 'w') as tempwork:
        wrtr = csv.writer(tempwork, dialect='excel')
        wrtr.writerow(headers)
        for quals in qualsin:
            d = [quals, qualsin.count(quals)]
            #a = dict((key, value) for (key, value) in d)
            #total += qualsin.count(quals)
            wrtr.writerow(d)
You can dedupe into a set with a different name, then use the original list to do the counting.
For instance, given qualsin = [0, 2, 3, 2, 3, 1, 2, 3, 5, 3, 3, 2, 4]:
set_quals = set(qualsin)  # This is set([0, 1, 2, 3, 4, 5])
for quals in set_quals:  # Iterate over the values in the set, not the list
    d = [quals, qualsin.count(quals)]  # Count the values from the list, not the set
    wrtr.writerow(d)
Or...
import collections
...
set_quals = set(qualsin) # This is set([0, 1, 2, 3, 4, 5])
counts = collections.Counter(qualsin) # This is Counter({3: 5, 2: 4, 0: 1, 1: 1, 4: 1, 5: 1}) which acts like a dictionary
for quals in set_quals:
    d = [quals, counts[quals]]  # Use the name from the set and the value from the Counter
    wrtr.writerow(d)
EDIT
Since your update says you are on Python 2.6, Counter is not available (it was added in 2.7). However, the first solution will still work.
You could make a Counter yourself by just doing:
counts = collections.defaultdict(int) # Available since 2.5
for quals in qualsin:
    counts[quals] += 1
Using the counter (either the 2.7 one or the homegrown one above) will reduce the time complexity by a factor of N, if I am not mistaken. list.count is O(N), and you are doing that in a loop, so you get O(N^2). The single iteration to create the counter is just O(N), so for larger lists this could be a big help.
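To make the comparison concrete, here is a minimal sketch of the single-pass counting wrapped in a function and checked against list.count on the sample list from above. The function name count_names is made up for illustration; everything else is from the standard library and should also run on 2.6.

```python
import collections

def count_names(names):
    """Count occurrences in one pass over the list (hypothetical helper)."""
    counts = collections.defaultdict(int)  # missing keys start at 0
    for name in names:
        counts[name] += 1
    return counts

qualsin = [0, 2, 3, 2, 3, 1, 2, 3, 5, 3, 3, 2, 4]
counts = count_names(qualsin)

# The per-value results match list.count, but the list is only walked once
# to build them instead of once per lookup.
same = all(counts[q] == qualsin.count(q) for q in set(qualsin))
```

Here counts[3] is 5 and counts[2] is 4, the same numbers the Counter comment above shows.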
EDIT 2
To get the output in sorted alphabetical order, all you do is convert the de-duped list (set) back into a sorted list.
ordered_deduped_quals = sorted(set(qualsin))
for quals in ordered_deduped_quals:
    ...
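Putting the pieces together, a rough sketch of the whole thing might look like this: count once, then emit one row per distinct name in alphabetical order. The helper name build_rows and the sample names are invented for the example; it assumes the names are already upper-cased as in your loop.

```python
import collections

def build_rows(names):
    """Return one [name, count] row per distinct name, sorted by name.

    build_rows is a hypothetical helper, not part of the csv module.
    """
    counts = collections.defaultdict(int)
    for name in names:
        counts[name] += 1
    # Iterating a dict yields its keys, so sorted(counts) is the
    # de-duplicated, alphabetically ordered list of names.
    return [[name, counts[name]] for name in sorted(counts)]

qualsin = ['BOB', 'ALICE', 'BOB', 'CAROL', 'ALICE', 'BOB']  # made-up sample data
rows = build_rows(qualsin)
# rows is [['ALICE', 2], ['BOB', 3], ['CAROL', 1]]
```

After writing the header row, you could then pass the result straight to the writer with wrtr.writerows(rows), which gives exactly one CSV row per name.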