While looking for a solution to identify duplicates in an array I stumpled upon many kinds of solutions counting on array_count_values
or array_unique
. But all of these solutions doesn't care about objects in the array.
array_count_values
throws an E_WARNING
for every value which isn't a string
or an integer
.
array_unique
does take care about elements with various types if the option SORT_REGULAR
has been set. But take a look at the use case as follows.
class Foo
{
private $value;
public function __construct( $value )
{
$this->value = $value;
}
}
$f1 = new Foo( 42 );
$f2 = $f1;
$f3 = new Foo( 42 );
$f4 = new Foo( '42' );
$f5 = new Foo( 'Bar' );
$a = [ $f1, $f2, $f3, $f4, $f5 ];
After a unification with array_unqiue
I expected to get an array with 4 elements [ $f1, $f3, $f4, $f5 ]
. But it states out, that array_unqiue
is working loose-typed and I got [ $f1, $f5 ]
which isn't the result I need.
In my case I wrote a collection working like a set. I can pass some initial elements. These elements should be validated. If one element is a duplicate an exception have to be thrown. In order of the loose-typed array_unqiue
I came up with this solution (which can be adapted very easy to unify an array).
$boundN = count( $elements );
$boundM = $boundN - 1;
for ( $m = 0; $m < $boundM; $m++ )
{
for ( $n = $m + 1; $n < $boundN; $n++ )
{
if ( $elements[ $m ] === $elements[ $n ] )
{
throw new DuplicateElementException( 'The initial values contain duplicates.' );
}
}
}
At least I minified the iterations in the inner loop. One can assume, that all passed elements in the outer loop are validated and don't have to be validated again.
My question is: Is there a shorter algorithm equal to algorithms like Quick Search
or something?
In your example, it's the specific instance of each object which is unique. The spl_object_id method can get a unique identifier for each object, and you can use those as keys in an associative array to collapse for duplicates. There are a few shorthand ways to write it, but a self-contained example might be:
<?php
class Foo {
private $data;
public function __construct($data) {
$this -> data = $data;
}
}
$f1 = new Foo( 42 );
$f2 = $f1;
$f3 = new Foo( 42 );
$f4 = new Foo( '42' );
$f5 = new Foo( 'Bar' );
$a = [ $f1, $f2, $f3, $f4, $f5 ];
$b = obj_unique($a);
print_r($b);
function obj_unique(array $not_unique) {
$tmp = [];
foreach($not_unique as $value) {
$tmp[spl_object_id($value)] = $value;
}
return array_values($tmp);
}
This creates the following output, which is missing the duplicate values.
Array
(
[0] => Foo Object
(
[data:Foo:private] => 42
)
[1] => Foo Object
(
[data:Foo:private] => 42
)
[2] => Foo Object
(
[data:Foo:private] => 42
)
[3] => Foo Object
(
[data:Foo:private] => Bar
)
)
This idea could be trivially modified to throw an exception if the array already contains the key.
if(contains_duplicates($a)) {
throw new Exception("Duplicates are bad etc etc ...");
}
function contains_duplicates(array $test) {
$tmp = [];
foreach($test as $value) {
$key = spl_object_id($value);
if(array_key_exists($key, $tmp)) {
// duplicates
return true;
}
$tmp[$key] = $value;
}
// no duplicates
return false;
}
The ===
operator on an Object has the same behaviour as this. It is an instance-wise comparison, not a comparison of the contents of the object, which is something you should be aware of.