I have a situation where a junction (bridge) table connects two other tables:
client (one-to-many) -> containers
containers -> containers_invoices (many-to-many) -> invoices
What I want is to get the SUM of the paid invoices for each client. The client is related to the containers, thus I have to connect the containers to the invoices and the clients to the containers to make the bridge. I used the following query to do so:
$sql = "
SELECT
SUM(invoices.invoice_eur) AS invoice_eur,
SUM(invoices.invoice_usd) AS invoice_usd,
invoices.status_id
FROM containers_invoices
LEFT JOIN invoices
ON containers_invoices.invoice_id = invoices.invoice_id
LEFT JOIN containers
ON containers_invoices.container_id = containers.container_id
WHERE containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
GROUP BY containers.client_id
";
$x = $this->fetch_query($sql);
// Initialize the parts so the return below never reads an undefined variable.
$eur = $usd = $spacer = '';
if (isset($x[0]->invoice_eur)) $eur = $x[0]->invoice_eur . ' EUR';
if (isset($x[0]->invoice_usd)) $usd = $x[0]->invoice_usd . ' USD';
if (isset($x[0]->invoice_eur) && isset($x[0]->invoice_usd)) $spacer = ' | ';
return $eur . $spacer . $usd;
Here is an example of what the invoices look like:
invoice 1 -> cont A, cont B -> 100 USD
invoice 2 -> cont A, cont B -> 7000 USD
invoice 3 -> cont A, cont B -> 75 USD
invoice 4 -> cont A, cont B -> 7000 USD
invoice 5 -> cont C -> 1000 USD
invoice 6 -> cont D -> 1000 USD
The issue is that when one invoice covers 2 or more containers, its amount is summed once per container. In the case of invoice 2, the query counts it as 14000 USD because there are 2 containers. Adding DISTINCT before invoices.invoice_usd fixes the doubling, but it is too aggressive: DISTINCT then treats invoice 2 and invoice 4 (both 7000 USD) as duplicates and skips invoice 4. The same happens with invoices 5 and 6 (both 1000 USD).
Is there a possible solution to this? Thanks in advance!
Let's recreate the tables from the question here. For the sake of brevity, we'll use these abbreviations:
clid -> client_id
cid -> container_id
inv -> invoice_id
eur -> invoice_eur
usd -> invoice_usd
Let's create sample rows now:
containers
clid1, cid1
clid1, cid2
clid2, cid3
containers_invoices
cid1, inv1
cid1, inv2
cid1, inv3
cid2, inv1
cid2, inv2
cid3, inv3
invoices
inv1, eur1, usd1
inv2, eur2, usd2
inv3, eur3, usd3
If we take the JOIN of these three tables on the appropriate columns, the resulting table would look like this:
containers_container_invoices_invoices
clid1, cid1, inv1, eur1, usd1
clid1, cid1, inv2, eur2, usd2
clid1, cid1, inv3, eur3, usd3
clid1, cid2, inv1, eur1, usd1
clid1, cid2, inv2, eur2, usd2
clid2, cid3, inv3, eur3, usd3
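For anyone who wants to reproduce the joined table, here is a minimal sketch. The column types, the numeric ids and the amounts (10/100, 20/200, 30/300) are assumptions made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical sample schema: types, ids and amounts are assumptions.
CREATE TABLE containers (
    container_id INT PRIMARY KEY,
    client_id    INT NOT NULL
);
CREATE TABLE invoices (
    invoice_id  INT PRIMARY KEY,
    invoice_eur DECIMAL(10,2),
    invoice_usd DECIMAL(10,2),
    status_id   INT
);
CREATE TABLE containers_invoices (
    container_id INT NOT NULL,
    invoice_id   INT NOT NULL,
    PRIMARY KEY (container_id, invoice_id)
);

-- clid1 owns cid1 and cid2, clid2 owns cid3.
INSERT INTO containers (container_id, client_id) VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO invoices (invoice_id, invoice_eur, invoice_usd, status_id) VALUES
    (1, 10, 100, 1), (2, 20, 200, 1), (3, 30, 300, 1);
-- Same container/invoice pairs as in the sample rows above.
INSERT INTO containers_invoices (container_id, invoice_id) VALUES
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);

-- One output row per (container, invoice) link: six rows in total.
SELECT c.client_id, c.container_id, i.invoice_id, i.invoice_eur, i.invoice_usd
FROM containers_invoices ci
JOIN invoices i ON ci.invoice_id = i.invoice_id
JOIN containers c ON ci.container_id = c.container_id
ORDER BY c.client_id, c.container_id, i.invoice_id;
```

Invoices 1 and 2 each appear twice for client 1, once per container, which is where the double counting comes from.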
As you can see in the table above, row 1 and row 4 are identical apart from cid, and so are row 2 and row 5. If we take the sum grouping only by clid, this would double count the inv1 and inv2 entries for clid1. To pick each invoice only once per client we can use DISTINCT. Here we want the tuple (clid, inv) to be distinct (non-identical). Since eur and usd are columns of invoices, and therefore determined by inv, it is safe to require (clid, inv, eur, usd) to be distinct. So, the query would look like this:
select sum(invoice_eur) as invoice_eur, sum(invoice_usd) as invoice_usd
from (
select distinct containers.client_id, invoices.invoice_id, invoices.invoice_eur, invoices.invoice_usd
from containers_invoices
left join invoices
on containers_invoices.invoice_id = invoices.invoice_id
left join containers
on containers_invoices.container_id = containers.container_id
where containers.client_id = ".$client_id." AND invoices.status_id = ".$invoice_status."
) client_invoices;
Notice that you don't need a GROUP BY here since you already have a WHERE clause on client_id. See this DB Fiddle for more clarity.
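As a quick sanity check, the three behaviours can be compared on the amounts listed in the question. This is only a sketch: the CTE named joined is a hypothetical stand-in for the result of the three-table join, all six invoices are assumed to belong to one client, and the WITH syntax assumes MySQL 8.0+ (it also runs on PostgreSQL and SQLite).

```sql
-- Hypothetical inline data: one row per (container, invoice) link,
-- using the USD amounts from the question's invoice list.
WITH joined (client_id, container, invoice_id, invoice_usd) AS (
    SELECT 1, 'A', 1, 100  UNION ALL
    SELECT 1, 'B', 1, 100  UNION ALL
    SELECT 1, 'A', 2, 7000 UNION ALL
    SELECT 1, 'B', 2, 7000 UNION ALL
    SELECT 1, 'A', 3, 75   UNION ALL
    SELECT 1, 'B', 3, 75   UNION ALL
    SELECT 1, 'A', 4, 7000 UNION ALL
    SELECT 1, 'B', 4, 7000 UNION ALL
    SELECT 1, 'C', 5, 1000 UNION ALL
    SELECT 1, 'D', 6, 1000
)
SELECT
    -- Plain SUM counts invoices 1 to 4 once per container.
    (SELECT SUM(invoice_usd) FROM joined) AS naive_sum,
    -- SUM(DISTINCT ...) collapses equal amounts, dropping invoices 4 and 6.
    (SELECT SUM(DISTINCT invoice_usd) FROM joined) AS sum_distinct,
    -- Deduplicate on (client, invoice) first, then sum.
    (SELECT SUM(invoice_usd)
     FROM (SELECT DISTINCT client_id, invoice_id, invoice_usd FROM joined) d
    ) AS deduplicated_sum;
-- naive_sum = 30350, sum_distinct = 8175, deduplicated_sum = 16175
```

Only the last column matches the real total of the six invoices (100 + 7000 + 75 + 7000 + 1000 + 1000 = 16175 USD).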