Search code examples
sqlhsqldb

Avoid duplicate entries using full JOIN with SUM and GROUP BY


I am using HSQLDB for the database and have the following condition in which I have to avoid duplicate entries while joining 2 tables.

Table1

HMEXPENSE
+--------+---------------+-------------+
| USERID | EXPENSEAMOUNT | EXPENSEDATE |
+--------+---------------+-------------+
|      a |      100      | 2018-10-10  |
|      a |      200      | 2018-10-11  |
|      a |      100      | 2018-10-11  |
|      a |      200      | 2018-10-13  |
+--------+---------------+-------------+

Table2

HMINCOME
+--------+---------------+-------------+
| USERID | EXPENSEAMOUNT | EXPENSEDATE |
+--------+---------------+-------------+
|      a |      200      | 2018-10-10  |
|      a |      100      | 2018-10-11  |
|      a |      200      | 2018-10-11  |
|      a |      100      | 2018-10-12  |
+--------+---------------+-------------+

The current query which gives me the duplicate entries is as follows

SELECT e.expenseDate ,i.incomeDate , SUM(e.expenseAmount), SUM(i.incomeAmount)
FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate 
GROUP BY i.incomeDate,e.expenseDate, i.incomeAmount, e.expenseAmount

OUTPUT

+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE |   C3  |   C4  |
+-------------+------------+-------+-------+
|  2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
|  2018-10-11 | 2018-10-11 | 200.0 | 100.0 |
|  2018-10-11 | 2018-10-11 | 100.0 | 100.0 |
|  2018-10-11 | 2018-10-11 | 200.0 | 200.0 |
|  2018-10-11 | 2018-10-11 | 100.0 | 200.0 |
|   <null>    | 2018-10-12 | <null>| 100.0 |
|  2018-10-13 |   <null>   | 200.0 | <null>|
+-------------+------------+-------+-------+

And if I use this above-mentioned query to get the actual output which is required in my actual scenario is as follows

SELECT e.expenseDate, i.incomeDate , SUM(e.expenseAmount),SUM(i.incomeAmount)
FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate 
GROUP BY i.incomeDate,e.expenseDate

OUTPUT

+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE |   C3  |   C4  |
+-------------+------------+-------+-------+
|  2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
|  2018-10-11 | 2018-10-11 | 600.0 | 600.0 |
|   <null>    | 2018-10-12 | <null>| 100.0 |
|  2018-10-13 |   <null>   | 200.0 | <null>|
+-------------+------------+-------+-------+

The requirement is to get the sum of amount for a single day and null entry for the date which is not present in another table.

Expected output is as follows

+-------------+------------+-------+-------+
| EXPENSEDATE | INCOMEDATE |   C3  |   C4  |
+-------------+------------+-------+-------+
|  2018-10-10 | 2018-10-10 | 100.0 | 200.0 |
|  2018-10-11 | 2018-10-11 | 300.0 | 300.0 |
|   <null>    | 2018-10-12 | <null>| 100.0 |
|  2018-10-13 |   <null>   | 200.0 | <null>|
+-------------+------------+-------+-------+

C3 and C4 column values are not calculated correctly due to the duplicate entries.

Help...


Solution

  • One method to solve this uses union all and group by:

    select dte, sum(incomeamount) as incomeamount, sum(expenseamount) as expenseamount
    from ((select incomedate as dte, incomeamount, 0 as expenseamount
           from hmincome
          ) union all
          (select expensedate, 0, expenseAmount
           from hmexpense
          )
         ) ie
    group by dte
    order by dte;