Search code examples
mysqlloggingunioncombinationsdata-analysis

Combining and merging data on different MySQL tables with the same columns into unique rows and running query to it


Here is the code that I run to analyse server logs on MySQL database:

SELECT YEAR(datetime), MONTH( datetime ), MIN(DATE(datetime)), MAX(DATE(datetime)), COUNT(DISTINCT (ip)), COUNT(ip), (COUNT(ip) / COUNT(DISTINCT (ip))) AS Ratio
FROM  `server_log_1` 
WHERE  `state` LIKE  'action'
AND  `user_id` LIKE  '9'
GROUP BY MONTH( datetime )
UNION
SELECT YEAR(datetime), MONTH( datetime ), MIN(DATE(datetime)), MAX(DATE(datetime)), COUNT(DISTINCT (ip)), COUNT(ip), (COUNT(ip) / COUNT(DISTINCT (ip))) AS Ratio
FROM  `server_log_2` 
WHERE  `state` LIKE  'action'
AND  `user_id` LIKE  '9'
GROUP BY MONTH( datetime )
UNION
SELECT YEAR(datetime), MONTH( datetime ), MIN(DATE(datetime)), MAX(DATE(datetime)), COUNT(DISTINCT (ip)), COUNT(ip), (COUNT(ip) / COUNT(DISTINCT (ip))) AS Ratio
FROM  `server_log_3` 
WHERE  `state` LIKE  'action'
AND  `user_id` LIKE  '9'
GROUP BY MONTH( datetime )

This gives me the result:

YEAR(datetime)  MONTH( datetime )   MIN(DATE(datetime)) MAX(DATE(datetime)) COUNT(DISTINCT (ip))    COUNT(ip)   Ratio
2015                12              2015-12-14              2015-12-30              16              20              1.2500
2016                1               2016-01-05              2016-01-27              15              20              1.3333
2016                2               2016-02-02              2016-02-29              27              36              1.3333
2016                3               2016-03-04              2016-03-29              24              32              1.3333
2016                4               2016-04-01              2016-04-08              5               8               1.6000
2016                4               2016-04-09              2016-04-29              19              27              1.4211
2016                5               2016-05-02              2016-05-28              21              31              1.4762
2016                6               2016-06-01              2016-06-30              28              34              1.2143
2016                7               2016-07-01              2016-07-20              14              16              1.1429
2016                7               2016-07-21              2016-07-21              1               1               1.0000

These are accurate results for each database however you see when a month is split into 2 different databases, (like 2016-4 and 2016-7) this causes 2 different rows to be generated for that month.

I want the these rows to be generated as a single row which has the sum of the values of the corresponding month. (only one row per month)

Also, simplify the query if possible.

And I'll be in trouble after 2016-12 where grouping by month will merge data from 2015-12 and 2016-12. How can I avoid that problem as well?

Could you write the correct SQL statement, please?


Solution

  • How about doing the union all before the group by:

    SELECT YEAR(datetime), MONTH(datetime), MIN(DATE(datetime)), MAX(DATE(datetime)), COUNT(DISTINCT (ip)), COUNT(ip), (COUNT(ip) / COUNT(DISTINCT (ip))) AS Ratio
    FROM (
        (SELECT datetime, ip FROM server_log_1 WHERE state = 'action' AND user_id = 9) UNION ALL
        (SELECT datetime, ip FROM server_log_2 WHERE state = 'action' AND user_id = 9) UNION ALL
        (SELECT datetime, ip FROM server_log_3 WHERE state = 'action' AND user_id = 9) 
    ) AS table_all
    GROUP BY YEAR(datetime), MONTH(datetime);
    

    In terms of performance, you want an index for each table on state, user_id (and perhaps adding datetime and ip).