I'm writing a program where I need to group jobs to minimise tool changes for a CNC punch. I asked a question about it when I was just starting out but as I've written the program the approach has changed.
I have a list of jobs:
jobs = ['5618', '5612', '5613', '5897', '5589', '5474',
'5472', '5470', '5471', '9040']
my program already sorts these into valid 'groups' of jobs that can be done together:
['5471', '9040']
['5618', '5612', '5613', '5472']
['5471', '5589', '5897']
['5618', '5612', '5471', '5613']
['5471', '5474']
['5471', '5470', '5589']
Now I need to choose the minimum number of groups that will mean all the jobs get done. At a guess it would be this combination:
['5618', '5612', '5613', '5472']
['5471', '5589', '5897']
['5470', '5589'] #this is the last group in the above lists but with 5471
#removed because it's already been done
[9040] #same as above except the first group with 5471 already been done.
The way I'm logically thinking of doing it is finding the longest valid group, assume that's the first group, remove the jobs that are in that group from the remaining jobs. Repeat. However that doesn't work when there are two groups with the same length. Maybe a weighting function based on how many times a job is in other groups or something? I don't know. Ideas? Brute force it? haha
This problem is known as the set cover problem. Essentially, you are asking for the smallest set of groups that will cover all of the jobs you want to do.
Unfortunately, the problem is NP-complete. Therefore there is not likely to be an efficient algorithm to find the solution. It may not matter if you have a small number of jobs or groups (since you can just brute-force it), though.
The greedy algorithm you've proposed is actually quite good, and is equivalent to the greedy algorithm on the linked Wikipedia page. It is in fact the "best" possible algorithm which still runs in polynomial time. To break ties, choose arbitrarily (it turns out not to matter a huge amount).
If tool changes are expensive enough to warrant additional CPU time, you can bruteforce the problem by considering all 2^n possible sets of groups (where n is the number of groups). You can probably handle up to n=20 with a typical machine easily, and up to n=30 if you are willing to wait a while. There are optimizations you can do to the bruteforce to reduce the number of solutions it will check (notably, you can breadth-first search through the set graph so that you stop when the first minimal solution is found), but again you will still end up doing an exponential amount of work to find the true optimal solution.