Search code examples
mysqlcakephpgroup-bydistinctcakephp-3.x

Query matching() causing duplicate rows and distinct() not working


I am trying to filter one table Payments by a field on the associated table Invoices.

Using the function matching() on the query object filters correctly but causes duplicate rows. It seemed like the solution was using distinct(), but calling distinct(Payments.id) results in an invalid query. I'm doing the following in a controller action.

$conditions = [
    'Payments.is_deleted =' => false
];
$args = [
    'conditions' => $conditions,
    'contain' => ['Invoices', 'Invoices.Clients'],
];
$payments = $this->Payments->find('all', $args);
if($issuer) {
    // This causes duplicate rows
    $payments->matching('Invoices', function ($q) use ($issuer) {
        return $q->where(['Invoices.issuer_id' => $issuer['id']]);
    });
    // $payments->distinct('Payments.id'); // Causes a mysql error
}

Am I correct in thinking that distinct() is what I need, and if so any idea what's missing to make it work?

I'm getting the following mysql error when uncommenting the line above:

Error: SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #8 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'InvoicesPayments.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Full query:

SELECT
    PAYMENTS.ID AS `PAYMENTS__ID`,
    PAYMENTS.CREATED AS `PAYMENTS__CREATED`,
    PAYMENTS.MODIFIED AS `PAYMENTS__MODIFIED`,
    PAYMENTS.DATE_REGISTERED AS `PAYMENTS__DATE_REGISTERED`,
    PAYMENTS.USER_ID AS `PAYMENTS__USER_ID`,
    PAYMENTS.AMOUNT AS `PAYMENTS__AMOUNT`,
    PAYMENTS.IS_DELETED AS `PAYMENTS__IS_DELETED`,
    INVOICESPAYMENTS.ID AS `INVOICESPAYMENTS__ID`,
    INVOICESPAYMENTS.INVOICE_ID AS `INVOICESPAYMENTS__INVOICE_ID`,
    INVOICESPAYMENTS.PAYMENT_ID AS `INVOICESPAYMENTS__PAYMENT_ID`,
    INVOICESPAYMENTS.PART_AMOUNT AS `INVOICESPAYMENTS__PART_AMOUNT`,
    INVOICES.ID AS `INVOICES__ID`,
    INVOICES.CREATED AS `INVOICES__CREATED`,
    INVOICES.MODIFIED AS `INVOICES__MODIFIED`,
    INVOICES.IS_PAID AS `INVOICES__IS_PAID`,
    INVOICES.IS_DELETED AS `INVOICES__IS_DELETED`,
    INVOICES.CLIENT_ID AS `INVOICES__CLIENT_ID`,
    INVOICES.ISSUER_ID AS `INVOICES__ISSUER_ID`,
    INVOICES.NUMBER AS `INVOICES__NUMBER`,
    INVOICES.SUBTOTAL AS `INVOICES__SUBTOTAL`,
    INVOICES.TOTAL AS `INVOICES__TOTAL`,
    INVOICES.DATE_REGISTERED AS `INVOICES__DATE_REGISTERED`,
    INVOICES.CURRENCY AS `INVOICES__CURRENCY`,
    INVOICES.RECEIVER_NAME AS `INVOICES__RECEIVER_NAME`,
    INVOICES.RECEIVER_RFC AS `INVOICES__RECEIVER_RFC`,
    INVOICES.EMAIL_SENDER AS `INVOICES__EMAIL_SENDER`,
    INVOICES.PDF_PATH AS `INVOICES__PDF_PATH` 
FROM
    PAYMENTS PAYMENTS 
    INNER JOIN
        INVOICES_PAYMENTS INVOICESPAYMENTS 
        ON PAYMENTS.ID = (
            INVOICESPAYMENTS.PAYMENT_ID
        )
    INNER JOIN
        INVOICES INVOICES 
        ON (
            INVOICES.ISSUER_ID = :C0 
            AND INVOICES.ID = (
                INVOICESPAYMENTS.INVOICE_ID
            )
        ) 
WHERE
    (
        PAYMENTS.IS_DELETED = :C1 
        AND PAYMENTS.DATE_REGISTERED >= :C2 
        AND PAYMENTS.DATE_REGISTERED <= :C3
    )
GROUP BY
    PAYMENT_ID 
ORDER BY
    PAYMENTS.DATE_REGISTERED ASC

Solution

  • That behavior is expected, as matching will use an INNER join, and yes, grouping is how you avoid duplicates:

    As this function will create an INNER JOIN, you might want to consider calling distinct on the find query as you might get duplicate rows if your conditions don’t exclude them already. This might be the case, for example, when the same users comments more than once on a single article.

    Cookbook > Database Access & ORM > Query Builder > Loading Associations > Filtering by Associated Data

    As the error message states, your MySQL server is configured to use the strict only_full_group_by mode, where your query is invalid. You can either disable that strict mode as mentioned by Akash prajapati (which can come with its own problems, as MySQL is then allowed to pretty much pick values of a group at random), or you could change how you query things in order to conform to the strict mode.

    In your case where you need to group on the primary key, you could simply switch to using innerJoinWith() instead, unlike matching() this will not add any fields of that association to the SELECT list, and things should be fine in strict mode, as everything else is functionally dependent:

    In cases where you would group on a key that would break functional dependency detection, one way to solve that could for example be to use a subquery for filtering, one that only selects that key, something along the lines of this:

    $conditions = [
        'Payments.is_deleted =' => false
    ];
    
    $payments = $this->Payments
        ->find()
        ->contain(['Invoices.Clients']);
    
    if($issuer) {
        $matcherQuery = $this->Payments
            ->find()
            ->select(['Payments.some_other_field'])
            ->where($conditions)
            ->matching('Invoices', function ($q) use ($issuer) {
                return $q->where(['Invoices.issuer_id' => $issuer['id']]);
            })
            ->distinct('Payments.some_other_field');
    
        $payments->where([
            'Payments.some_other_field IN' => $matcherQuery
        ]);
    } else {
        $payments->where($conditions);
    }
    

    This will result in a query similar to this, where the outer query can then select all the fields you want:

    SELECT
        ...
    FROM
        payments
    WHERE
        payments.some_other_field IN (
            SELECT
                payments.some_other_field
            FROM
                payments
            INNER JOIN
                invoices_payments ON
                    payments.id = invoices_payments.payment_id
            INNER JOIN
                invoices ON
                    invoices.issuer_id = ...
                    AND
                    invoices.id = invoices_payments.invoice_id
            WHERE
                payments.is_deleted = ...
            GROUP BY
                payments.some_other_field
        )