I am using HSQLDB for the database and have the following condition in which I have to avoid duplicate entries while joining 2 tables.
Table1
HMEXPENSE
+--------+---------------+-------------+
| USERID | EXPENSEAMOUNT | EXPENSEDATE |
+--------+---------------+-------------+
| a | 100 | 2018-10-10 |
| a | 200 | 2018-10-11 |
| a | 100 | 2018-10-11 |
| a | 200 | 2018-10-13 |
+--------+---------------+-------------+
Table2
HMINCOME
+--------+---------------+-------------+
| USERID | EXPENSEAMOUNT | EXPENSEDATE |
+--------+---------------+-------------+
| a | 200 | 2018-10-10 |
| a | 100 | 2018-10-11 |
| a | 200 | 2018-10-11 |
| a | 100 | 2018-10-12 |
+--------+---------------+-------------+
The current query which gives me the duplicate entries is as follows
SELECT e.expenseDate ,i.incomeDate , SUM(e.expenseAmount), SUM(i.incomeAmount)
FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate
GROUP BY i.incomeDate,e.expenseDate, i.incomeAmount, e.expenseAmount
OUTPUT
+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE | C3 | C4 |
+-------------+------------+-------+-------+
| 2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
| 2018-10-11 | 2018-10-11 | 200.0 | 100.0 |
| 2018-10-11 | 2018-10-11 | 100.0 | 100.0 |
| 2018-10-11 | 2018-10-11 | 200.0 | 200.0 |
| 2018-10-11 | 2018-10-11 | 100.0 | 200.0 |
| <null> | 2018-10-12 | <null>| 100.0 |
| 2018-10-13 | <null> | 200.0 | <null>|
+-------------+------------+-------+-------+
And if I use this above-mentioned query to get the actual output which is required in my actual scenario is as follows
SELECT e.expenseDate, i.incomeDate , SUM(e.expenseAmount),SUM(i.incomeAmount)
FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate
GROUP BY i.incomeDate,e.expenseDate
OUTPUT
+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE | C3 | C4 |
+-------------+------------+-------+-------+
| 2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
| 2018-10-11 | 2018-10-11 | 600.0 | 600.0 |
| <null> | 2018-10-12 | <null>| 100.0 |
| 2018-10-13 | <null> | 200.0 | <null>|
+-------------+------------+-------+-------+
The requirement is to get the sum of amount for a single day and null entry for the date which is not present in another table.
Expected output is as follows
+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE | C3 | C4 |
+-------------+------------+-------+-------+
| 2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
| 2018-10-11 | 2018-10-11 | 300.0 | 300.0 |
| <null> | 2018-10-12 | <null>| 100.0 |
| 2018-10-13 | <null> | 200.0 | <null>|
+-------------+------------+-------+-------+
C3 and C4 column values are not calculated correctly due to the duplicate entries.
Help...
One method to solve this uses union all
and group by
:
select dte, sum(incomeamount) as incomeamount, sum(expenseamount) as expenseamount
from ((select incomedate as dte, incomeamount, 0 as expenseamount
from hmincome
) union all
(select expensedate, 0, expenseAmount
from hmexpense
)
) ie
group by dte
order by dte;