I have a list of sequences given as strings, such as ['0',.....,'10']. When I use itertools.combinations to get combinations of length 2, such as ('0','2'), I notice that '10' isn't being treated as a single entity. That is, I don't see ('2','10'); instead I see ('2','1','0'), and the same happens for any combination involving 10. It is being treated as 1,0 before the combinations are formed. I would appreciate a fix for this. I did consider converting the items to int, but I want to keep forming larger combinations such as (1,2,3) after forming the length-2 combinations.
def frequentPattern(data, minsup):
    frequentSets = []
    itemset = {}
    for line in data:
        for c in line.replace(',','').split():
            if itemset.get(c)==None:
                itemset[c]=0
            itemset[c]+=1
    k = 1
    while itemset != {}:
        prevCandidates = []
        print itemset.keys()
        for i in itemset.keys():
            print i
            if itemset[i] >= minsup:
                prevCandidates.append(i)
                if i not in frequentSets:
                    frequentSets.append(i)
        candidates = []
        for i in itertools.combinations(prevCandidates,2):
            cell = tuple(set(i[0]+i[1]))
            #print cell
            #cell = tuple(sorted(cell))
            if len(cell)<=(k+1):
                candidates.append(cell)
        candidates = list(set(candidates))
        itemset = {}
        for line in data:
            for cell in candidates:
                if set(cell) <= set(tuple(line.replace(',','').split())):
                    if itemset.get(cell)==None:
                        itemset[cell]=0
                    itemset[cell]+=1
        k = k+1
    return frequentSets
As noted, the problem is with the cell = tuple(set(i[0]+i[1])) line. Do you see a way around it? The purpose of that line is to build combinations of length greater than 2.
itertools.combinations is working properly. You are breaking up the result with cell = tuple(set(i[0]+i[1])). That concatenates the two strings (e.g. into '910'), and set() then splits the concatenated string into its individual characters. i itself holds the correct result, which for this example is ('9','10').
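One possible way around it, as a minimal sketch rather than a drop-in fix: keep every itemset as a tuple of whole items and merge with a set union instead of string concatenation. The helper name merge_items is hypothetical (it is not in the original code), and the snippet is written for Python 3, whereas the question's code uses Python 2 print statements.

```python
import itertools

pair = ('9', '10')

# Concatenating the strings gives '910'; set() then iterates over its characters.
broken = set(pair[0] + pair[1])  # the characters '9', '1', '0'

def merge_items(a, b):
    """Union two itemsets, treating a bare string as a single item.

    Hypothetical helper, not part of the original code.
    """
    # Wrap bare strings in a 1-tuple so '10' is never iterated as '1', '0'.
    a = (a,) if isinstance(a, str) else a
    b = (b,) if isinstance(b, str) else b
    # Sort so the same itemset always produces the same (hashable) tuple.
    # Items are strings, so the order is lexicographic: '10' sorts before '2'.
    return tuple(sorted(set(a) | set(b)))

# Length-2 candidates built from single items.
pairs = [merge_items(a, b) for a, b in itertools.combinations(['2', '9', '10'], 2)]

# Merging two length-2 itemsets gives a length-3 candidate.
triple = merge_items(('2', '10'), ('10', '3'))
```

Because merge_items accepts both bare strings and tuples, the same call can be used on the first pass (single items) and on later passes (tuples), so the len(cell)<=(k+1) check in the original loop would still count whole items rather than characters.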