Search code examples
cakephpcakephp-3.x

Remove duplicate find() Results in CakePHP 3 and belongsToMany


Schema

`a`
|id  |
|a1  |

`b`
|id  |
|b1  |
|b2  |


`c`
|a_id|b_id|some_data|
|a1  |b1  |lorem ipsum|
|a1  |b1  |dolor|
|a1  |b2  |abc|

A belongsToMany B through C

Query

$this->As->find()->contain(['Bs' => function (Query $q) {
    return $q->distinct();
}]);

Executing the above query in CakePHP 3 will return A with three Bs because there are three rows in the table (b1 twice). Is there a way to remove duplicate results using the query builder? I've tried distinct() but that didn't work. I quess because _joinData is different, but I'm not sure.


Solution

  • distinct() without arguments will apply to all selected columns, ie it will produce something like:

    DISTINCT c.a_id, c.b_id, c.some_data, b.id
    

    applying the distinct over all columns, which will effectively not remove any of your duplicates, as all the tuples that consists of these columns will be different.

    You'd have to apply the distinct on specific columns only, so that there's actually a difference between your duplicates, for example use Bs.id:

    $q->distinct('Bs.id');
    

    This will either create a DISTINCT ON(Bs.id) or GROUP BY Bs.id, depending on the DBMS that you're using. Also depending on the DBMS that you're using and its configuration, GROUP BY will cause an error, as the query will select non-aggregated columns that aren't in the GROUP BY clause (see for example MySQL and the ONLY_FULL_GROUP_BY mode).

    Working around that limitation will require some trickery, one way I've used in the past is to use an explicit intermediary association on the join table, and select the n:m data in a separate query, the query for the intermediary association can then safely apply grouping.

    With your example that would be As hasMany Cs and Cs belongsTo Bs, and then contain including the join table, where you can apply the grouping, it would look something like this:

    $this->As
        ->find()
        ->contain([
            'Cs' => [
                'queryBuilder' => function (\Cake\ORM\Query $query) {
                    return $query
                        ->select(['Cs.a_id', 'Cs.b_id'])
                        ->group(['Cs.a_id', 'Cs.b_id']);
                },
                'Bs' => [
                    'strategy' => \Cake\ORM\Association\BelongsTo::STRATEGY_SELECT
                ]
            ]
        ]);
    

    The result is then of course formatted differently, so you might have to reformat it in case neccesary.