python, string, casting, python-itertools

The string '10' is being broken into '1', '0' when itertools.combinations is used


I have a list of items given as strings, such as ['0', ..., '10']. When I use itertools.combinations to form pairs such as ('0', '2'), I notice that '10' isn't treated as a single item. That is, instead of ('2', '10') I get ('2', '1', '0'), and the same happens for any combination involving '10': it is split into '1' and '0' before the combination is formed. I would appreciate a fix for this. I did consider converting the items to int, but I want to keep forming larger combinations such as (1, 2, 3) after forming the length-2 combinations.

import itertools

def frequentPattern(data, minsup):
    frequentSets = []
    itemset = {}


    for line in data:
        for c in line.replace(',','').split():

            if itemset.get(c)==None:
                itemset[c]=0
            itemset[c]+=1

    k = 1
    while itemset != {}:
        prevCandidates = []
        print itemset.keys()
        for i in itemset.keys():
            print i
            if itemset[i] >= minsup:
                prevCandidates.append(i)
                if i not in frequentSets:
                    frequentSets.append(i)

        candidates = []
        for i in itertools.combinations(prevCandidates,2):
            cell = tuple(set(i[0]+i[1]))
            #print cell
            #cell = tuple(sorted(cell))
            if len(cell)<=(k+1):
                candidates.append(cell)
        candidates = list(set(candidates))

        itemset = {}
        for line in data:
            for cell in candidates:
                if set(cell) <= set(tuple(line.replace(',','').split())):
                    if itemset.get(cell)==None:
                        itemset[cell]=0
                    itemset[cell]+=1

        k = k+1
    return frequentSets

As noted, the problem is with the line cell = tuple(set(i[0]+i[1])). Do you see a way around it? The purpose of that line is to build combinations of length greater than 2.


Solution

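  • itertools.combinations is working properly. You are breaking up the result with cell = tuple(set(i[0]+i[1])). At that point i already holds the correct result, for this example ('9','10'). But i[0]+i[1] concatenates the two strings (giving '910'), and set() then iterates over that string one character at a time, so the digits are split apart into '9', '1' and '0'.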
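
One possible way around this, sketched under the assumption that every candidate is kept as a tuple of item strings from the start, so that merging two candidates never concatenates strings. The `merge` helper and the sample `items` list are illustrative names, not part of the original code.

```python
import itertools

# Illustrative sample items (an assumption, not the asker's data).
items = ['2', '9', '10']

# The problem: '+' concatenates the two strings into '910',
# and set() then iterates over that string character by character.
broken = tuple(sorted(set('9' + '10')))


def merge(a, b):
    """Union two itemsets (tuples of strings) into one sorted tuple.

    Sorting gives every itemset a single canonical form. Note that the
    order is lexicographic because the items are strings.
    """
    return tuple(sorted(set(a) | set(b)))


# Wrap each item in a 1-tuple so that set() sees whole items, not characters.
singles = [(item,) for item in items]

# Length-2 candidates: '10' stays a single item.
pairs = [merge(a, b) for a, b in itertools.combinations(singles, 2)]

# Length-3 candidates are built from the pairs in the same way,
# keeping only merges that grow the itemset by exactly one item.
triples = set()
for a, b in itertools.combinations(pairs, 2):
    cell = merge(a, b)
    if len(cell) == 3:
        triples.add(cell)
```

With this representation the same `merge` call works at every level, so the `len(cell) <= k+1` check in the question can keep its role of filtering out merges that are too large.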