Filtering multidimensional arrays by non-equal values to exclude duplicated records


Below are two records from two different feeds, and they have different ids. Because of this, I have to rely on 'BriefTitle': I can tell from 'BriefTitle' and other data (e.g. [LocationCountry], [StartDate], [Condition]) that they are the same record. I would like to take a substr of 'BriefTitle' and compare it to the other 'BriefTitle' values to filter out duplicates, since one title is contained in the other. I am not looking for an exact match, which is what most of the solutions I have found here provide.

I like the short solution proposed by sevavietl / mickmackusa in "php remove duplicates from multidimensional array by value":

$result = array_reverse(array_values(array_column(
    array_reverse($data),
    null,
    'BriefTitle'
)));

However, my 'BriefTitle' is an array (which doesn't seem to work with array_column), and I am not sure how to apply the substr function to the solution above.

Some quick notes:

  • Fortunately, [BriefTitle][0] is always the value to compare
  • If possible, I would like to just grab the first instance in the data set, rejecting any following duplicates.

Any thoughts on how I should approach this? The arrays:

 [0] => Array
        (
            [Rank] => 422
            [id] => Array
                (
                    [0] => 152091
                )

            [Condition] => Array
                (
                    [0] => Depression
                    [1] => Ketamine
                )

            [BriefTitle] => Array
                (
                    [0] => Positron Emission Tomography Assessment of Ketamine Binding of the Serotonin Transporter
                )

            [LocationCountry] => Array
                (
                    [0] => Austria
                )

            [StartDate] => Array
                (
                    [0] => May 5, 2016
                )

            [LastUpdatePostDate] => Array
                (
                    [0] => October 15, 2018
                )

            [Entheogen] => ketamine
            [Source] => clinicaltrials.gov
        )   


    [1] => Array
        (
            [Rank] => 6673
            [id] => Array
                (
                    [0] => YSBSZ18291
                )

            [Condition] => Array
                (
                    [0] => Depressive Disorder
                    [1] => Ketamine
                )

            [BriefTitle] => Array
                (
                    [0] => Positron Emission Tomography assessment of Ketamine Binding of the Serotonin Transporter and its Relevance for Rapid Antidepressant Response
                    [1] => Die Rolle des Serotonintransporters bei der akuten antidepressiven Wirkung von Ketamin, untersucht mit Positronen-Emissions-Tomographie
                )

            [LocationCountry] => Array
                (
                    [0] => Austria
                )

            [StartDate] => Array
                (
                    [0] => 2016 05 01
                )

            [LastUpdatePostDate] => Array
                (
                    [0] => 2018 10 15
                )

            [Entheogen] => ketamine
            [Source] => clinicaltrialsregister.eu
        )

Solution

  • Unfortunately, because of the nature of your data (matching titles may be substrings of one another and may differ in case), the only real option is to brute-force this. Loop over the array, storing the titles you keep as you go, and check whether the current title contains, or is contained in, any of them (stripos makes the comparison case-insensitive):

    $result = array();
    $brieftitles = array();   // titles of the records kept so far
    foreach ($array as $arr) {
        $foundtitle = false;
        $title = $arr['BriefTitle'][0];
        foreach ($brieftitles as $btitle) {
            // case-insensitive check: does either title contain the other?
            $foundtitle = (stripos($title, $btitle) !== false) || (stripos($btitle, $title) !== false);
            if ($foundtitle) break;
        }
        if (!$foundtitle) {
            // first occurrence: keep the record and remember its title
            $result[] = $arr;
            $brieftitles[] = $title;
        }
    }
    print_r($result);
    

    Demo on 3v4l.org
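
If you need this in more than one place, the loop above could be wrapped in a function. The following is only a sketch: the function name filterByContainedTitle, the trimming of titles, and the handling of empty titles are my own assumptions, not something your data requires. The sample data is a cut-down version of your two records plus a made-up third record (id X1) to show that unrelated titles are kept.

```php
<?php
// Hypothetical helper (name and details are assumptions): keeps the first
// record of each group whose BriefTitle[0] values contain one another,
// compared case-insensitively.
function filterByContainedTitle(array $records): array
{
    $result = [];
    $seenTitles = [];
    foreach ($records as $record) {
        // Assumption: BriefTitle[0] is a string when present; trim stray whitespace.
        $title = trim($record['BriefTitle'][0] ?? '');
        if ($title === '') {
            // An empty title cannot be compared meaningfully, so keep the record.
            $result[] = $record;
            continue;
        }
        $isDuplicate = false;
        foreach ($seenTitles as $seen) {
            if (stripos($title, $seen) !== false || stripos($seen, $title) !== false) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $result[] = $record;
            $seenTitles[] = $title;
        }
    }
    return $result;
}

// Cut-down records from the question, plus one made-up unrelated record.
$data = [
    ['id' => ['152091'], 'BriefTitle' => ['Positron Emission Tomography Assessment of Ketamine Binding of the Serotonin Transporter']],
    ['id' => ['YSBSZ18291'], 'BriefTitle' => ['Positron Emission Tomography assessment of Ketamine Binding of the Serotonin Transporter and its Relevance for Rapid Antidepressant Response']],
    ['id' => ['X1'], 'BriefTitle' => ['An Unrelated Study']],
];

$filtered = filterByContainedTitle($data);
print_r(array_column($filtered, 'id'));
```

Note that this is still quadratic in the number of records, and that a very short title (for example a single word) would swallow every longer title that happens to contain it, so you may want to combine the check with [LocationCountry] or another field if that becomes a problem.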