Search code examples
phparraysstrong-typingunify

PHP: Better algorithm to strict typed identify duplicates in an array


While looking for a solution to identify duplicates in an array I stumpled upon many kinds of solutions counting on array_count_values or array_unique. But all of these solutions doesn't care about objects in the array.

array_count_values throws an E_WARNING for every value which isn't a string or an integer.

array_unique does take care about elements with various types if the option SORT_REGULAR has been set. But take a look at the use case as follows.

class Foo
{
    private $value;

    public function __construct( $value )
    {
        $this->value = $value;
    }
}

$f1 = new Foo( 42 );
$f2 = $f1;
$f3 = new Foo( 42 );
$f4 = new Foo( '42' );
$f5 = new Foo( 'Bar' );
$a  = [ $f1, $f2, $f3, $f4, $f5 ];

After a unification with array_unqiue I expected to get an array with 4 elements [ $f1, $f3, $f4, $f5 ]. But it states out, that array_unqiue is working loose-typed and I got [ $f1, $f5 ] which isn't the result I need.

In my case I wrote a collection working like a set. I can pass some initial elements. These elements should be validated. If one element is a duplicate an exception have to be thrown. In order of the loose-typed array_unqiue I came up with this solution (which can be adapted very easy to unify an array).

$boundN = count( $elements );
$boundM = $boundN - 1;
for ( $m = 0; $m < $boundM; $m++ )
{
    for ( $n = $m + 1; $n < $boundN; $n++ )
    {
        if ( $elements[ $m ] === $elements[ $n ] )
        {
            throw new DuplicateElementException( 'The initial values contain duplicates.' );
        }
    }
}

At least I minified the iterations in the inner loop. One can assume, that all passed elements in the outer loop are validated and don't have to be validated again.

My question is: Is there a shorter algorithm equal to algorithms like Quick Search or something?


Solution

  • In your example, it's the specific instance of each object which is unique. The spl_object_id method can get a unique identifier for each object, and you can use those as keys in an associative array to collapse for duplicates. There are a few shorthand ways to write it, but a self-contained example might be:

    <?php
    class Foo {
        private $data;
    
        public function __construct($data) {
            $this -> data = $data;
        }
    }
    
    $f1 = new Foo( 42 );
    $f2 = $f1;
    $f3 = new Foo( 42 );
    $f4 = new Foo( '42' );
    $f5 = new Foo( 'Bar' );
    $a  = [ $f1, $f2, $f3, $f4, $f5 ];
    $b = obj_unique($a);
    
    print_r($b);
    
    function obj_unique(array $not_unique) {
        $tmp = [];
        foreach($not_unique as $value) {
          $tmp[spl_object_id($value)] = $value;
        }
        return array_values($tmp);
    }
    

    This creates the following output, which is missing the duplicate values.

    Array
    (
        [0] => Foo Object
            (
                [data:Foo:private] => 42
            )
    
        [1] => Foo Object
            (
                [data:Foo:private] => 42
            )
    
        [2] => Foo Object
            (
                [data:Foo:private] => 42
            )
    
        [3] => Foo Object
            (
                [data:Foo:private] => Bar
            )
    
    )
    

    This idea could be trivially modified to throw an exception if the array already contains the key.

    if(contains_duplicates($a)) {
        throw new Exception("Duplicates are bad etc etc ...");
    }
    
    function contains_duplicates(array $test) {
        $tmp = [];
        foreach($test as $value) {
          $key = spl_object_id($value);
          if(array_key_exists($key, $tmp)) {
              // duplicates
              return true;
          }
          $tmp[$key] = $value;
        }
        // no duplicates
        return false;
    }
    

    The === operator on an Object has the same behaviour as this. It is an instance-wise comparison, not a comparison of the contents of the object, which is something you should be aware of.