Search code examples
mysqlreplacecoalesce

Mysql replace column value with other column value


I have 2 tables:

table: transaction:
====================
id  billed_date   amount
 1  2016-09-30      5
 2  2016-10-04      15
 3  2016-10-06      10

table: report_date
====================
transaction_id    report_date
      1            2016-10-01

I want:

  • Create a report which sum all transactions's amount in October 2016
  • Base on report date, not billed date
  • When report date is not set, it's base on billed_date
  • In above example, I want result is 30 (not 25)

Then I write:

The First:

SELECT 
   sum(t.amount), 
   CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS new_date
FROM 
   transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
WHERE new_date BETWEEN '2016-10-01' AND '2016-10-30'

The Second:

SELECT sum(amount) FROM 
 (SELECT t.amount,
    CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END AS date
    FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
 ) t
WHERE t.date BETWEEN '2016-10-01' AND '2016-10-30'

Result:

The First:

  • Unknown column 'new_date' in 'where clause'
  • If I replace 'new_date' by 'date': result = 25 (exclude id=1)

The Second:

  • result = 30 => Correct, but in my case, when transaction table have about 30k records, the process is too slow.

Anybody can help me?


Solution

  • First of all - the part

    CASE WHEN d.report_date IS NOT NULL THEN d.report_date ELSE t.billed_date END
    

    can be written shorter as

    COALESCE(d.report_date, t.billed_date)
    

    or as

    IFNULL(d.report_date, t.billed_date)
    

    In your first query you are using a column alias in the WHERE clause, wich is not allowed. You can fix it by moving the expression behind the alias to the WHERE clause:

    SELECT sum(t.amount)
    FROM transaction t LEFT JOIN report_date d ON t.id = d.transaction_id
    WHERE COALESCE(d.report_date, t.billed_date) BETWEEN '2016-10-01' AND '2016-10-30'
    

    This is almost the same as your own solution.

    Your second query is slow because MySQL has to store the subquery result (30K rows) into a temporary table. Trying to optimize it, you will end up with the same solution above.

    However if you have indexes on transaction.billed_date and report_date.report_date this query still can not use them. In order to use the indexes, you can split the query into two parts:

    Entries with a report (will use report_date.report_date index):

    SELECT sum(amount)
    FROM transaction t JOIN report_date d ON id = transaction_id 
    WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
    

    Entries without a report (will use transaction.billed_date index):

    SELECT sum(amount)
    FROM transaction t LEFT JOIN report_date d ON id = transaction_id 
    WHERE d.report_date IS NULL AND t.billed_dateBETWEEN '2016-10-01' AND '2016-10-30'
    

    Both queries can use an index. You just need to sum the results, wich can also be done combining the two queries:

    SELECT (
        SELECT sum(amount)
        FROM transaction t JOIN report_date d ON id = transaction_id 
        WHERE d.report_date BETWEEN '2016-10-01' AND '2016-10-30'
    ) + (
        SELECT sum(amount)
        FROM transaction t LEFT JOIN report_date d ON id = transaction_id 
        WHERE d.report_date IS NULL AND t.billed_dateBETWEEN '2016-10-01' AND '2016-10-30'
    ) AS sum_amount