
SUM for 3rd party table in many to many relationship


I have a situation where I am bridging a 3rd party table between 2 other tables:

client (one-to-many) -> containers

containers -> containers_invoices (many-to-many) -> invoices

What I want is to get the SUM of the paid invoices for each client. The client is related to the containers, thus I have to connect the containers to the invoices and the clients to the containers to make the bridge. I used the following query to do so:

        $sql = "
        SELECT 
            SUM(invoices.invoice_eur) AS invoice_eur,
            SUM(invoices.invoice_usd) AS invoice_usd,
            invoices.status_id
        FROM containers_invoices
        LEFT JOIN invoices 
            ON containers_invoices.invoice_id = invoices.invoice_id
        LEFT JOIN containers 
            ON containers_invoices.container_id = containers.container_id
        WHERE containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
        GROUP BY containers.client_id
        ";
        
        $x = $this->fetch_query($sql);
        // Start with empty parts so the return below never reads an undefined variable.
        $eur = $usd = $spacer = '';
        if (isset($x[0]->invoice_eur)) $eur = $x[0]->invoice_eur . ' EUR';
        if (isset($x[0]->invoice_usd)) $usd = $x[0]->invoice_usd . ' USD';
        if (isset($x[0]->invoice_eur) && isset($x[0]->invoice_usd)) $spacer = ' | ';

        return $eur . $spacer . $usd;

Here is an example of what the invoices look like:

invoice 1 -> cont A, cont B -> 100 USD
invoice 2 -> cont A, cont B -> 7000 USD
invoice 3 -> cont A, cont B -> 75 USD
invoice 4 -> cont A, cont B -> 7000 USD

invoice 5 -> cont C -> 1000 USD
invoice 6 -> cont D -> 1000 USD

The issue is that when one invoice covers 2 or more containers, its amount is added once per container. Invoice 2, for example, is counted as 14000 USD because it is linked to 2 containers. Adding DISTINCT before invoices.invoice_usd fixes the doubling, but it is too aggressive: DISTINCT compares only the amounts, so it treats invoice 2 and invoice 4 (both 7000 USD) as duplicates and skips invoice 4. The same happens with invoices 5 and 6 (both 1000 USD).
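
The two failure modes described above can be reproduced with a small sketch. It is not the original PHP code: it assumes an in-memory SQLite database driven from Python, that containers A to D all belong to a single client (client_id 1), and that status_id 1 means "paid". Only the USD column is modelled.

```python
import sqlite3

# In-memory SQLite stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (container_id TEXT PRIMARY KEY, client_id INTEGER);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_usd INTEGER, status_id INTEGER);
CREATE TABLE containers_invoices (container_id TEXT, invoice_id INTEGER);
""")

# Hypothetical ownership: containers A-D all belong to client 1.
conn.executemany("INSERT INTO containers VALUES (?, 1)",
                 [("A",), ("B",), ("C",), ("D",)])
# Amounts from the example above; status_id 1 is assumed to mean "paid".
conn.executemany("INSERT INTO invoices VALUES (?, ?, 1)",
                 [(1, 100), (2, 7000), (3, 75), (4, 7000), (5, 1000), (6, 1000)])
# Invoices 1-4 each cover containers A and B; invoices 5 and 6 cover C and D.
links = [(c, i) for i in (1, 2, 3, 4) for c in ("A", "B")] + [("C", 5), ("D", 6)]
conn.executemany("INSERT INTO containers_invoices VALUES (?, ?)", links)

# Same joins and filter as the query in the question, with bound parameters.
joined = """
FROM containers_invoices
LEFT JOIN invoices ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN containers ON containers_invoices.container_id = containers.container_id
WHERE containers.client_id = ? AND invoices.status_id = ?
"""

# Plain SUM: every invoice is counted once per linked container.
naive_usd = conn.execute(
    "SELECT SUM(invoices.invoice_usd) " + joined, (1, 1)).fetchone()[0]
# SUM(DISTINCT ...): equal amounts on different invoices collapse into one.
distinct_usd = conn.execute(
    "SELECT SUM(DISTINCT invoices.invoice_usd) " + joined, (1, 1)).fetchone()[0]
# Reference value: each paid invoice counted exactly once.
expected_usd = conn.execute(
    "SELECT SUM(invoice_usd) FROM invoices WHERE status_id = ?", (1,)).fetchone()[0]

print(naive_usd, distinct_usd, expected_usd)  # 30350 8175 16175
```

The plain SUM overshoots (30350) and SUM(DISTINCT ...) undershoots (8175), while the wanted total is 16175.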

Is there a possible solution to this? Thanks in advance!


Solution

  • Let's create the tables that are in the question here. For the sake of brevity, we'll use these abbreviations:

    clid -> client_id
    cid  -> container_id
    inv  -> invoice_id
    eur  -> invoice_eur
    usd  -> invoice_usd
    

    Let's create sample rows now:

    containers
    clid1, cid1
    clid1, cid2
    clid2, cid3
    
    containers_invoices
    cid1, inv1
    cid1, inv2
    cid1, inv3
    cid2, inv1
    cid2, inv2
    cid3, inv3
    
    invoices
    inv1, eur1, usd1
    inv2, eur2, usd2
    inv3, eur3, usd3
    

    If we join these three tables on the appropriate columns, the resulting table looks like this:

    containers_container_invoices_invoices
    clid1, cid1, inv1, eur1, usd1
    clid1, cid1, inv2, eur2, usd2
    clid1, cid1, inv3, eur3, usd3
    clid1, cid2, inv1, eur1, usd1
    clid1, cid2, inv2, eur2, usd2
    clid2, cid3, inv3, eur3, usd3
    

    As the table above shows, rows 1 and 4 refer to the same invoice (inv1) for the same client, as do rows 2 and 5 (inv2); they differ only in cid. If we sum while grouping only by clid, inv1 and inv2 are counted twice for clid1. To count each invoice once we can use DISTINCT, not on the amounts but on the tuple (clid, inv). Since eur and usd are columns of invoices and are therefore determined by inv, requiring (clid, inv, eur, usd) to be distinct is equivalent. So, the query would look like this:

    select sum(invoice_eur) as invoice_eur, sum(invoice_usd) as invoice_usd
    from (
        -- one row per (client, invoice), however many containers the invoice covers
        select distinct containers.client_id, invoices.invoice_id, invoices.invoice_eur, invoices.invoice_usd
        from containers_invoices
        left join invoices 
            on containers_invoices.invoice_id = invoices.invoice_id
        left join containers 
            on containers_invoices.container_id = containers.container_id
        where containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
    ) client_invoices;
    

    Notice that you don't need to do a GROUP BY here since you already have a WHERE clause on client_id. See this DB Fiddle for more clarity.
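
As a rough check of the approach, the DISTINCT subquery can be run against the invoice amounts from the question. This is only a sketch under stated assumptions: an in-memory SQLite database driven from Python rather than the original PHP, containers A to D owned by client 1, a made-up second client (container E, invoice 7) added for contrast, status_id 1 meaning "paid", and only the USD column modelled. It also shows a variant for all clients at once, which does need a GROUP BY in the outer query.

```python
import sqlite3

# In-memory SQLite stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (container_id TEXT PRIMARY KEY, client_id INTEGER);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_usd INTEGER, status_id INTEGER);
CREATE TABLE containers_invoices (container_id TEXT, invoice_id INTEGER);
""")

# Containers A-D belong to client 1; container E / client 2 is hypothetical.
conn.executemany("INSERT INTO containers VALUES (?, ?)",
                 [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("E", 2)])
# Invoices 1-6 come from the question; invoice 7 is made up for client 2.
conn.executemany("INSERT INTO invoices VALUES (?, ?, 1)",
                 [(1, 100), (2, 7000), (3, 75), (4, 7000),
                  (5, 1000), (6, 1000), (7, 500)])
links = ([(c, i) for i in (1, 2, 3, 4) for c in ("A", "B")]
         + [("C", 5), ("D", 6), ("E", 7)])
conn.executemany("INSERT INTO containers_invoices VALUES (?, ?)", links)

# De-duplicate on (client_id, invoice_id) first, then sum the amounts.
one_client_sql = """
SELECT SUM(invoice_usd)
FROM (
    SELECT DISTINCT containers.client_id, invoices.invoice_id, invoices.invoice_usd
    FROM containers_invoices
    JOIN invoices ON containers_invoices.invoice_id = invoices.invoice_id
    JOIN containers ON containers_invoices.container_id = containers.container_id
    WHERE containers.client_id = ? AND invoices.status_id = ?
) client_invoices
"""
# Bound parameters replace the string concatenation used in the PHP version.
client_total = conn.execute(one_client_sql, (1, 1)).fetchone()[0]

# Variant for every client: no client filter inside, GROUP BY outside.
all_clients_sql = """
SELECT client_id, SUM(invoice_usd)
FROM (
    SELECT DISTINCT containers.client_id, invoices.invoice_id, invoices.invoice_usd
    FROM containers_invoices
    JOIN invoices ON containers_invoices.invoice_id = invoices.invoice_id
    JOIN containers ON containers_invoices.container_id = containers.container_id
    WHERE invoices.status_id = ?
) client_invoices
GROUP BY client_id
ORDER BY client_id
"""
per_client = conn.execute(all_clients_sql, (1,)).fetchall()

print(client_total)  # 16175
print(per_client)    # [(1, 16175), (2, 500)]
```

Invoices 2 and 4 (both 7000 USD) are now both counted, because the rows are de-duplicated by invoice_id rather than by amount.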