Search code examples
pythonexcelalgorithmpython-2.7xlrd

How do I edit this algorithm so that it creates combinations without replacement or reusing an element?


I'm very new to computer science - this is my first program. I wrote a python program that takes data in two columns from an excel sheet "Labels":"Values" and reconfigures them into lists of Labels whose respective values sum to 30. Each label is unique and only occurs once, but the different labels can have the same value.

However, when I first applied my program, the runtime was nearly 30 minutes because the algorithm was creating every possible combination of Labels. Obviously given 50 labels with values less that 10, that is a lot of possible combinations.

I wanted some help editing my current algorithm so that it creates unique groups. Once a Label is used, I don't want it to appear in any other group.

Currently my code looks like this:

def combinator(X): #this function pulls two lists together to create a dictionary
  from xlrd import open_workbook
  wb = open_workbook(X)
  L = []
  P = []
  for s in wb.sheets(): 
      for col in range(s.ncols):
          for row in range(s.nrows):
              if row !=0:
                  l = s.cell(row,0).value
                  L.append(l)

                  p = s.cell(row,1).value
                  P.append(p)
  Lots = L[0:(len(L)/2)]
  Pallets = P[0:(len(L)/2)]
  D = dict(zip(Lots,Pallets))
  return D


def grouper(D, N):#this finds combinations of 30 and puts them in groups
  from itertools import combinations
  groups_of_thirty = []

  for l in range(0, len(D)):
      for y in combinations(D.items(), l):
          keys = []
          sum = 0

          for key, value in y:
              keys.append(key)
              sum += value

          if sum == N:
              groups_of_thirty.append(keys)

  flattened = [v for flat in groups_of_thirty for v in flat]
  K = D.keys()
  s = set(K)
  remainder = list(s - set(flattened))
  list(groups_of_thirty)
  return groups_of_thirty, \
       remainder


def transposer(G):#this exports the final list into each cell and writes  the spreadsheet as a csv to a different directory.
    import os
    os.chdir(Q)
    import csv
    with open(O, "wb") as f:
        writer = csv.writer(f)
        str(writer.writerows(G))
    return

transposer(grouper(combinator(I),N))

Would appreciate any help with this - logic or pseudocode preferred but some pointers with uncommon syntax would be helpful since I'm a novice.

Thank you!

Edit: Here is a screenshot of sample data in an excel sheet.

Screenshot of Excel sheet with Input and Desired Output


Solution

  • Here's a linear programming approach that I mentioned in the comments:

    import pandas as pd
    from pulp import *
    import random
    random.seed(0)
    num_items = 50
    labels = ['Label{0:02d}'.format(i) for i in range(num_items)]
    values = [random.randint(5, 30) for _ in range(num_items)]
    df = pd.DataFrame({'Value': values}, index=labels)
    feasible_solutions = []
    var = LpVariable.dicts('x', labels, 0, 1, LpInteger)
    prob = pulp.LpProblem('Fixed Sum', pulp.LpMinimize)
    prob += 0
    prob += lpSum([var[label] * df.at[label, 'Value'] for label in labels]) == 30
    
    while prob.solve() == 1:
        current_solution = [label for label in labels if value(var[label])]
        feasible_solutions.append(current_solution)
        prob += lpSum([var[label] for label in current_solution]) <= len(current_solution) - 1
    

    labels is a regular list that holds the labels and values are random integers between 5 and 30. It starts with an empty set.

    One of the most important elements in this code is var. It is the decision variable that takes either value 0 or 1. When you include a specific label in the solution, it takes value 1, otherwise it is equal to zero.

    For example, assume you have this list [12, 7, 5, 13]. Here, for a possible solution var00 (12), var02 (5) and var03 (13) can take value 1.

    The next line creates a linear programming problem. We specify an arbitrary objective function (prob += 0) because we are not minimizing or maximizing any function - we are trying to find all possible solutions.

    These solutions should satisfy the constraint in the next line. Here, var[label] is a binary decision variable as I mentioned. If it's included in the solution, it takes value 1 and 1 is multiplied by its value. So we are only summing the values of the items included in the set. Their total should be equal to 30. Here, prob.solve() would generate a feasible solution but since you want all feasible solutions, you call prob.solve() in a while loop. As long as it can return a feasible solution (==1) continue the loop. But in each iteration we should exclude the current solution so that our search space is reduced. It is done by the last line. For example, if in the current solution we have var00, var04 and var07, their sum should be smaller than 3 for the subsequent problems (all three shouldn't be 1 at the same time). If you run this, it will generate all possible solutions for your problem.

    Here's the first five:

    feasible_solutions[:5]
    Out: 
    [['Label00', 'Label47'],
     ['Label17', 'Label46'],
     ['Label42', 'Label45'],
     ['Label03', 'Label13', 'Label47'],
     ['Label02', 'Label03', 'Label48']]
    

    And these are their values:

    Out: [[17, 13], [9, 21], [11, 19], [6, 11, 13], [18, 6, 6]]