SO, I have what I think is a difficult problem to solve, I have a script that cycles through a CSV to count the number of occurrences of data in different columns. This script works well and is contained below for referencing:
import csv
import datetime
import copy
from collections import defaultdict
with open(r"C:\Temp\test2.csv") as i, open(r"C:\Temp\results2.csv", "wb") as o:
rdr = csv.reader(i)
wrt = csv.writer(o)
data, currdate = defaultdict(lambda:[0, 0, 0, 0]), None
for line in rdr:
date = datetime.datetime.strptime(line[0], '%d/%m/%Y')
name = (line[7], line[9])
if date != currdate or not currdate:
for v in data.itervalues(): v[:2] = v[2:]
currdate = date
wrt.writerow(line + data[name][:2])
data[name][3] += 1
if line[6] == "1": data[name][2] += 1
I have edited this script to add a percentage column and I can make this script do multiple different combinations of column matches manually, e.g column 7/9 here I can make it column 7/10 etc all in one script. However what I need it to do I do not know the required function or method for. Essentially I need it to go through each calclist
contained in this script and output the numbers associated with this script for every combination of the column references in the calclists. i.e. for 6/7 6/19 6/23
Because in my real script the calc lists are much longer than in this example, it would also be nice if this edit could include some way of attaching the titles to column I don't have a suitable method or mechanism to do this. But if there was a list of titles for the calc lists, some how it might be possible to create a titles in this format (remembering there are three for each run of the script) "title1-title2-x","title1-title2-y","title1-title2-z"
import csv
import datetime
import copy
from collections import defaultdict
with open(r"dualparametermatch_test.csv") as i, open(r"dualparametermatch_test_edit.csv", "wb") as o:
rdr = csv.reader(i)
wrt = csv.writer(o)
data, currdate = defaultdict(lambda:[0, 0, 0, 0]), None
# Identical calclists
calclist = [6, 7, 19, 23, 25, 26, 35, 62, 64]
calclist2 = [6, 7, 19, 23, 25, 26, 35, 62, 64]
for counter, line in enumerate(rdr):
if counter == 0:
#Titles, there are three for each item in the calclist
titles = ["titleX", "titleY", "titleZ"] # ... etc
wrt.writerow(line + titles)
else:
extra_cols = []
for calc in calclist:
date = datetime.datetime.strptime(line[0], '%d/%m/%Y')
name = (line[calclist], line[calclist2])
if date != currdate or not currdate:
for v in data.itervalues(): v[:2] = v[2:]
currdate = date
### Adds the percentage calulation column
top,bottom = data[name][0:2]
try:
quotient = '{0:0.5f}'.format(float(top)/bottom).rstrip("0")
except ZeroDivisionError:
quotient = 0
extra_cols.extend(data[name][:2]+ [quotient])
data[name][3] += 1
if line[6] == "1": data[name][2] += 1
wrt.writerow(line + data[name][:2])
I appreciate this could be a difficult problem to solve and if anyone out there can help with this then first and foremost Kudos to you! If more detail is required or anything is unclear please get back to me. I can provide example data and output for the original script if required. Thanks in advance AEA
I'm not sure if I am understanding the question correctly so this could be completely off base but if you simply want all unique combinations of the numbers that appear in all calclists you could approach it like this:
calclists = [[1,2,3], [4,5,6], [7,8,9]] # calclists is a list of calclists
calcset = set()
for calclist in calclists:
for x in calclist:
calcset.add(x)
unique_calclist = list(calcset)
for x in unique_calclist:
for y in unique_calclist[1:]:
# in your example you didn't use combinations
# of duplicate valuesso I am skipping that here
if x != y:
print (x, y)
Here is the same thing using itertools (note this approach assumes the value in each list in calclists is unique).
import itertools
calclists = [[1,2,3], [4,5,6], [7,8,9]]
comb_itr = itertools.combinations(itertools.chain.from_iterable(calclists), 2)
for comb in comb_itr:
print comb
If you can't assume that all of the values in each list are unique you could combine the above two approaches like so:
import itertools
calclists = [[1,2,3], [4,5,6], [1, 2, 3]]
calcset = set()
for calclist in calclists:
for x in calclist:
calcset.add(x)
comb_itr = itertools.combinations(calcset, 2)
for comb in comb_itr:
print comb