Search code examples
mysqlgroup-byunionsql-timestamp

MySQL union on two tables, one with timestamp and one with date


I am trying to put together a report for the performance of one advert showing the views and clicks that it's had on one day. The views and clicks are stored in separate tables with different structures so I think I have to perform a union.

I have read and understand this fantastic piece. It has helped me but I think this is one level more complex than the examples explained there. Would like some help from the community.

This is my views table, which stores a counter of the amount of views that an advert has had on one day.

+-------------+--------------+
| COLUMN_NAME | COLUMN_TYPE  |
+-------------+--------------+
| ad_day_id   | bigint(13)   |
| advert_id   | bigint(20)   |
| date        | date         |
| views       | mediumint(6) |
+-------------+--------------+

This is my clicks table, which stores every click individually. (some columns left out as they are not relevant to the question)

+-------------+---------------------+
| COLUMN_NAME | COLUMN_TYPE         |
+-------------+---------------------+
| id          | bigint(20) unsigned |
| advert_id   | bigint(20)          |
| timestamp   | timestamp           |
+-------------+---------------------+

The result should look like (no real numbers used, just to show the format):

+------------+-------+--------+
| event_date | views | clicks |
+------------+-------+--------+
| 2016-05-09 |    25 |      4 |
| 2016-05-10 |     2 |        |
| 2016-05-11 |   105 |     10 |
| 2016-05-13 |    96 |      7 |
| 2016-05-14 |       |      1 |
+------------+-------+--------+

Regarding the results:

  • Not every date will have clicks or views
  • Some date may have views and no clicks
  • Some date may have clicks and no views

Off to the code... this is what I currently have:

SELECT
    $views_table.date AS event_date,
    $views_table.views,
    '' AS clicks
FROM
    $views_table
WHERE
    ( $views_table.date BETWEEN '$from_date' AND '$to_date' )
    AND $views_table.advert_id=$advert_id
UNION
SELECT
    CAST($clicks_table.timestamp AS DATE) AS event_date,
    '' AS views,
    COUNT($clicks_table.advert_id) AS clicks
FROM
    $clicks_table
WHERE
    ( CAST($clicks_table.timestamp AS DATE) BETWEEN '$from_date' AND '$to_date' )
    AND $clicks_table.advert_id=$advert_id
GROUP BY
    event_date
ORDER BY
    event_date ASC;

Some notes on the code:

  • The clicks are stored individually on timestamp and therefore must be cast to dates and then grouped by date (or at least that is how I got valid results for a different report).
  • The report will feature a date range and is for one specific advert. This is explains the where clauses.

While composing this question I formatted the code a bit better and for reading ease I changed the order of the select statements which resolved my initial problem. Apparently both selects must feature the same columns AND be in the same order.

I think I am nearly there because this is my current result:

+------------+-------+--------+
| event_date | views | clicks |
+------------+-------+--------+
| 2016-05-09 | 1     |        |
| 2016-05-09 |       | 1      |
| 2016-05-10 | 2     |        |
| 2016-05-11 | 105   |        |
| 2016-05-11 |       | 7      |
| 2016-05-13 | 96    |        |
| 2016-05-13 |       | 16     |
| 2016-05-14 | 2     |        |
| 2016-05-14 |       | 1      |
| 2016-05-15 | 2     |        |
| 2016-05-15 |       | 2      |
+------------+-------+--------+

My remaining problem is with the duplicate dates. How can I resolve this?
Many thanks to those who are kind enough to answer!


Solution

  • I modified your query a little bit (see inline comments) and wrapped it in a subquery to use GROUP BY event_date in the outer query.

    SELECT event_date, MAX(views) AS views, MAX(clicks) AS clicks
    FROM (
        SELECT
            views.date AS event_date,
            views.views,
            0 AS clicks -- '' causes strange results on sqlfiddle
        FROM
            views
        WHERE
            ( views.date BETWEEN '2016-05-09' AND '2016-05-15' )
            AND views.advert_id=1
        UNION
        SELECT
            CAST(clicks.timestamp AS DATE) AS event_date,
            0 AS views, -- '' causes strange results on sqlfiddle
            COUNT(clicks.advert_id) AS clicks
        FROM
            clicks
        WHERE
            ( CAST(clicks.timestamp AS DATE) BETWEEN '2016-05-09' AND '2016-05-15' )
            AND clicks.advert_id=1
        GROUP BY
            event_date
        -- ORDER BY is useless here
    ) sub
    GROUP BY event_date
    ORDER BY event_date
    

    Demo

    Instead of CAST(clicks.timestamp AS DATE) you can also use DATE(clicks.timestamp) and hope that MySQL will use indexes in the future.