Search code examples
pythonpython-2.7csvcollectionsdefaultdict

Editing script to account for every combination of two lists


SO, I have what I think is a difficult problem to solve, I have a script that cycles through a CSV to count the number of occurrences of data in different columns. This script works well and is contained below for referencing:

Original script

import csv
import datetime
import copy
from collections import defaultdict

with open(r"C:\Temp\test2.csv") as i, open(r"C:\Temp\results2.csv", "wb") as o:
    rdr = csv.reader(i)
    wrt = csv.writer(o)

    data, currdate = defaultdict(lambda:[0, 0, 0, 0]), None
    for line in rdr:
        date = datetime.datetime.strptime(line[0], '%d/%m/%Y')

        name = (line[7], line[9])

        if date != currdate or not currdate:
            for v in data.itervalues(): v[:2] = v[2:]
            currdate = date

        wrt.writerow(line + data[name][:2])

        data[name][3] += 1
        if line[6] == "1": data[name][2] += 1

I have edited this script to add a percentage column and I can make this script do multiple different combinations of column matches manually, e.g column 7/9 here I can make it column 7/10 etc all in one script. However what I need it to do I do not know the required function or method for. Essentially I need it to go through each calclist contained in this script and output the numbers associated with this script for every combination of the column references in the calclists. i.e. for 6/7 6/19 6/23

Because in my real script the calc lists are much longer than in this example, it would also be nice if this edit could include some way of attaching the titles to column I don't have a suitable method or mechanism to do this. But if there was a list of titles for the calc lists, some how it might be possible to create a titles in this format (remembering there are three for each run of the script) "title1-title2-x","title1-title2-y","title1-title2-z"

import csv
import datetime
import copy
from collections import defaultdict

with open(r"dualparametermatch_test.csv") as i, open(r"dualparametermatch_test_edit.csv", "wb") as o:
    rdr = csv.reader(i)
    wrt = csv.writer(o)

    data, currdate = defaultdict(lambda:[0, 0, 0, 0]), None

    # Identical calclists
    calclist = [6, 7, 19, 23, 25, 26, 35, 62, 64]
    calclist2 = [6, 7, 19, 23, 25, 26, 35, 62, 64]
    
    for counter, line in enumerate(rdr):
        if counter == 0:
            #Titles, there are three for each item in the calclist
            titles = ["titleX", "titleY", "titleZ"] # ... etc
            wrt.writerow(line + titles)
        else:
            extra_cols = []
            for calc in calclist:
                date = datetime.datetime.strptime(line[0], '%d/%m/%Y')
                name = (line[calclist], line[calclist2])

                if date != currdate or not currdate:
                    for v in data.itervalues(): v[:2] = v[2:]
                    currdate = date

                 ### Adds the percentage calulation column
                top,bottom = data[name][0:2]
                try:
                    quotient = '{0:0.5f}'.format(float(top)/bottom).rstrip("0")
                except ZeroDivisionError:
                    quotient = 0
                extra_cols.extend(data[name][:2]+ [quotient])

                data[name][3] += 1
                if line[6] == "1": data[name][2] += 1

            wrt.writerow(line + data[name][:2])

I appreciate this could be a difficult problem to solve and if anyone out there can help with this then first and foremost Kudos to you! If more detail is required or anything is unclear please get back to me. I can provide example data and output for the original script if required. Thanks in advance AEA


Solution

  • I'm not sure if I am understanding the question correctly so this could be completely off base but if you simply want all unique combinations of the numbers that appear in all calclists you could approach it like this:

    calclists = [[1,2,3], [4,5,6], [7,8,9]] # calclists is a list of calclists
    calcset = set()
    for calclist in calclists:
        for x in calclist:
            calcset.add(x)
    unique_calclist = list(calcset)
    for x in unique_calclist:
        for y in unique_calclist[1:]:
            # in your example you didn't use combinations 
            # of duplicate valuesso I am skipping that here
            if x != y: 
                print (x, y)
    

    Here is the same thing using itertools (note this approach assumes the value in each list in calclists is unique).

    import itertools
    calclists = [[1,2,3], [4,5,6], [7,8,9]]
    comb_itr = itertools.combinations(itertools.chain.from_iterable(calclists), 2)
    for comb in comb_itr:
        print comb
    

    If you can't assume that all of the values in each list are unique you could combine the above two approaches like so:

    import itertools
    calclists = [[1,2,3], [4,5,6], [1, 2, 3]]
    calcset = set()
    for calclist in calclists:
        for x in calclist:
            calcset.add(x)
    
    comb_itr = itertools.combinations(calcset, 2)
    for comb in comb_itr:
        print comb