I'm currently calculating the unique permutations of an array of data. While the following code is working, it's not as efficient as I would like. Once I get over 6 or 8 items, it becomes very slow and I start running into memory issues.
Here is the code and an explanation
<?php
function permuteUnique($items, $count = false, $perms = [], &$return = []) {
if ($count && count($return) == $count) return $return;
if (empty($items)) {
$duplicate = false;
foreach ($return as $a) {
if ($a === $perms) {
$duplicate = true;
break;
}
}
if (!$duplicate) $return[] = $perms;
} else {
for ($i = count($items) - 1; $i >= 0; --$i) {
$newitems = $items;
$newperms = $perms;
list($tmp) = array_splice($newitems, $i, 1);
array_unshift($newperms, $tmp);
permuteUnique($newitems, $count, $newperms, $return);
}
return $return;
}
}
function factorial($n) {
$f = 1;
for ($i = 2; $i <= $n; $i++) $f *= $i;
return $f;
}
Given the input [1, 1, 2]
I receive the following output as expected
array (size=3)
0 =>
array (size=3)
0 => int 1
1 => int 1
2 => int 2
1 =>
array (size=3)
0 => int 1
1 => int 2
2 => int 1
2 =>
array (size=3)
0 => int 2
1 => int 1
2 => int 1
The $count
parameter is so I can pass the number of unique permutations I expect to the function and once it has found that many, it can stop calculating and return the data. This is calculated as the factorial of the total number of items divided by the product of the factorial of the count of all duplicates. I'm not sure I said that right so let me show you an example.
Given the set [1, 2, 2, 3, 4, 4, 4, 4]
the count of unique permutations is calculated as
8! / (2!4!) = 840
because there are 8 total items, one of them is duplicated twice, and another is duplicated 4 times.
Now if I translate that to php code...
<?php
$set = [1, 2, 2, 3, 4, 4, 4, 4];
$divisor = 1;
foreach (array_count_values($set) as $v) {
$divisor *= factorial($v);
}
$count = factorial(count($set)) / $divisor;
$permutations = permuteUnique($set, $count);
it's pretty slow. If I throw a counter into the permuteUnique
function, it runs over 100k times before it finds the 840 unique permutations.
I would like to find a way to reduce this and find the shortest possible path to the unique permutations. I appreciate any help or advice you can give.
So I spent some more time thinking about this and here is what I came up with.
<?php
function permuteUnique($items, $perms = [], &$return = []) {
if (empty($items)) {
$return[] = $perms;
} else {
sort($items);
$prev = false;
for ($i = count($items) - 1; $i >= 0; --$i) {
$newitems = $items;
$tmp = array_splice($newitems, $i, 1)[0];
if ($tmp != $prev) {
$prev = $tmp;
$newperms = $perms;
array_unshift($newperms, $tmp);
permuteUnique($newitems, $newperms, $return);
}
}
return $return;
}
}
$permutations = permuteUnique([1, 2, 2, 3, 4, 4, 4, 4]);
Previous stats
Uniques: 840
Calls to permuteUnique: 107,591
Duplicates found: 38737
Execution time (seconds): 4.898668050766
New stats
Uniques: 840
Calls to permuteUnique: 2647
Duplicates found: 0
Execution time (seconds): 0.0095300674438477
So all I really did was sort the data set, keep track of the previous item, and not calculate permutations if the current item matched the previous. I also no longer have to pre calculate the amount of uniques and iterate through the permutations to check for duplicates. That made a world of difference.