I have this data:
+-------------+--------------------+-----------------------+
| employee_id | assignment_started | assignment_terminated |
+-------------+--------------------+-----------------------+
| 1 | 2018-07-01 | (NULL) |
| 2 | 2018-09-01 | (NULL) |
| 3 | 2018-10-13 | (NULL) |
| 4 | 2018-10-13 | (NULL) |
| 5 | 2018-10-15 | 2019-07-17 |
| 6 | 2018-11-01 | (NULL) |
| 7 | 2019-01-14 | (NULL) |
| 8 | 2019-01-24 | (NULL) |
| 9 | 2019-07-01 | 2019-07-30 |
+-------------+--------------------+-----------------------+
I want to count employees by month that are under assignment. To determine if the employee is under assignment, I need to check if the date I am looking for is between assigment_started and assignment terminated. But if assignmen_termiated is null I set it on NOW().
Also, I have a date range I need to check. So if I have a date range from 2018-01-01 until 2019-07-30 I need to count employees by every month and if there are no employees under assignment on some months I should have 0 value as count.
To create DATE RANGE MONTHS I use this code:
select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59'
What I get from this is:
+-------------+
| month_name |
+-------------+
|2018 January |
|2018 February|
|2018 March |
|2018 April |
| ... |
| ... |
| ... |
|2019 August |
+-------------+
From the data above you will see that until 2018 July I would have employee count 0, and for 2018 July I would have 1 employee, for example on 2018 September I should have count 5 because there are 5 employees that were working on that month.
To shorten the question, I use this code to achieve what I need, but for some reason count results are incorrect... I am trying to figure this out, but no idea why am I getting this results that you can find in fiddle below.
SELECT calendar.month_name, COUNT(employee_id) AS emp_count
FROM job_order_employees
RIGHT JOIN (select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59') as calendar
ON STR_TO_DATE(CONCAT(calendar.month_name,'01'),'%Y %M %d') BETWEEN job_order_employees.assignment_started AND IFNULL(job_order_employees.assignment_terminated,NOW())
GROUP BY calendar.month_name
ORDER BY STR_TO_DATE(calendar.month_name,'%Y %M')
Here is some sample data:
-- Dumping structure for table d-works-test.job_order_employees
CREATE TABLE IF NOT EXISTS `job_order_employees` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`employee_id` int(10) unsigned NOT NULL,
`assignment_started` date NOT NULL,
`assignment_terminated` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Dumping data for table d-works-test.job_order_employees: ~14 rows (approximately)
/*!40000 ALTER TABLE `job_order_employees` DISABLE KEYS */;
INSERT INTO `job_order_employees`
(`id`
, `employee_id`
,`assignment_started`
, `assignment_terminated`) VALUES
(1, 1,'2019-05-29', NULL),
(2, 2,'2018-09-19', NULL),
(3, 3,'2018-07-01', NULL),
(4, 4, '2018-10-13', NULL),
(5, 5, '2018-10-13', NULL),
(6, 6, '2019-02-01', NULL),
(7, 7, '2019-01-14', NULL),
(8, 8, '2018-11-01', NULL),
(9, 8, '2019-01-01', NULL),
(10, 9, '2019-02-01', NULL),
(11, 9, '2019-01-24', NULL),
(12, 9, '2018-12-31', NULL),
(13, 10, '2018-10-13', '2019-07-17'),
(14, 10, '2019-07-01', '2019-07-17');
And DB Fiddle of same: https://www.db-fiddle.com/f/8dUFx1DWiyypbkx9s2cYyG/1
Thank you in advance for your help!
You could probably simplify your logic quite a bit by holding off formatting the month strings until the final step; you're doing a lot of work back converting for something only relevant to the final formatting.
This will also help because then you can define the inclusive start and exclusive end of each month, like so:
SELECT adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM (stuff) v
Then, use that like so:
SELECT [format rangestart here], COUNT(employee_id) AS emp_count
FROM (
SELECT DISTINCT gen_date AS rangestart, gen_date + INTERVAL 1 MONTH AS rangeend
FROM v
WHERE gen_date BETWEEN '2018-01-01 00:00:00' AND'2019-08-31 23:59:59'
) as calendar
LEFT JOIN job_order_employees AS joe
ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.rangestart
AND joe.assignment_started <= calendar.rangeend
GROUP BY calendar.rangestart
ORDER BY calendar.rangestart
;
The join logic (overlap checking condition) looks a bit weird until you realize where it comes from. It's the simplification of "not ones that don't overlap".
NOT (ended < range_start || started > range_end)
simplifies to
ended >= range_start && started <= range_end
Edit: The above wrongly assumed the subquery was generating every month; the following should work
calendar query (this will cover about 83 years, you can add another t# table with a multiplier of 1000 to get 833 years worth):
SELECT '1970-01-01' + INTERVAL t0 + t1 * 10 + t2 * 100 MONTH AS start_date
, '1970-01-01' + INTERVAL 1 + t0 + t1 * 10 + t2 * 100 MONTH AS end_date
FROM (SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0
, (SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1
, (SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2
Final query:
SELECT [format calendar.start_date here]
, COUNT(employee_id) AS emp_count
FROM (
*calendar query above goes here*
) as calendar
LEFT JOIN job_order_employees AS joe
ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.start_date
AND joe.assignment_started < calendar.end_date
WHERE calendar.start_date BETWEEN '2018-01-01 00:00:00' AND '2019-08-31 23:59:59'
GROUP BY calendar.start_date
ORDER BY calendar.start_date
;
Note: I also changed the operators on the overlap comparison; since the generated end_date is non-inclusive, it should have been
NOT (ended < range_start || started >= range_end)
which simplifies to ended >= range_start && started < range_end