A blog allows comments on its entries. For our purpose here, all we need is two tables, writer
and comment
,
and all the fields we need are writer.id, writer.name, comment.id, comment.author_id (a foreign key) and comment.created (a datetime).
Question. Is there a MySQL request that will output the list of all three most prolific comment writers for each year, in the following format : three columns, the first is a year, the second is an author name, and the third is the number of comments by the given author in the given year. Thus, assuming there are enough writers and comments and unequal results, the number of rows in the output should be 3 times the number of years the blog has been in existence (three rows for each year).
If one only asks for statistics over a single year, this is easily done as follows :
SELECT YEAR(c.created), w.name, COUNT(c.id) as nbrOfComments
FROM comment AS c
INNER JOIN writer AS w
ON c.author_id = w.id
WHERE YEAR(c.created)='2021'
GROUP BY w.id
For testing purposes I include below some MySQL code to create and initialize the tables.
CREATE TABLE `writer` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(40) COLLATE utf8mb3_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=13 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `comment` (
`id` int NOT NULL AUTO_INCREMENT,
`content` TEXT COLLATE utf8mb3_unicode_ci NOT NULL,
`author_id` int NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY(author_id) REFERENCES writer(id)
) ENGINE=MyISAM AUTO_INCREMENT=13 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `writer` (`id`, `name`) VALUES
(1, 'Alf'),(2, 'Bob'),(3, 'Cathy'),(4, 'David'),
(5, 'Eric'),(6, 'Fanny'),(7, 'Gabriel'),(8, 'Hans'),
(9, 'Ibrahim'),(10, 'James'),(11, 'Kevin'),(12, 'Lena');
INSERT INTO `comment` (`id`, `author_id`, `content`, `created`) VALUES
(NULL, '1', 'some text here', '2021-01-22 06:40:31.000000'),
(NULL, '1', 'some text here', '2021-02-22 06:40:31.000000'),
(NULL, '1', 'some text here', '2021-03-22 06:40:31.000000'),
(NULL, '1', 'some text here', '2021-04-22 06:40:31.000000'),
(NULL, '2', 'some text here', '2021-01-22 06:40:31.000000'),
(NULL, '2', 'some text here', '2021-02-22 06:40:31.000000'),
(NULL, '2', 'some text here', '2021-03-22 06:40:31.000000'),
(NULL, '3', 'some text here', '2021-01-22 06:40:31.000000'),
(NULL, '3', 'some text here', '2021-02-22 06:40:31.000000'),
(NULL, '4', 'some text here', '2021-01-22 06:40:31.000000'),
(NULL, '5', 'some text here', '2022-01-22 06:40:31.000000'),
(NULL, '5', 'some text here', '2022-02-22 06:40:31.000000'),
(NULL, '5', 'some text here', '2022-03-22 06:40:31.000000'),
(NULL, '5', 'some text here', '2022-04-22 06:40:31.000000'),
(NULL, '6', 'some text here', '2022-01-22 06:40:31.000000'),
(NULL, '6', 'some text here', '2022-02-22 06:40:31.000000'),
(NULL, '6', 'some text here', '2022-03-22 06:40:31.000000'),
(NULL, '7', 'some text here', '2022-01-22 06:40:31.000000'),
(NULL, '7', 'some text here', '2022-02-22 06:40:31.000000'),
(NULL, '8', 'some text here', '2022-01-22 06:40:31.000000'),
(NULL, '9', 'some text here', '2023-01-22 06:40:31.000000'),
(NULL, '9', 'some text here', '2023-02-22 06:40:31.000000'),
(NULL, '9', 'some text here', '2023-03-22 06:40:31.000000'),
(NULL, '9', 'some text here', '2023-04-22 06:40:31.000000'),
(NULL, '10', 'some text here', '2023-01-22 06:40:31.000000'),
(NULL, '10', 'some text here', '2023-02-22 06:40:31.000000'),
(NULL, '10', 'some text here', '2023-03-22 06:40:31.000000'),
(NULL, '11', 'some text here', '2023-01-22 06:40:31.000000'),
(NULL, '11', 'some text here', '2023-02-22 06:40:31.000000'),
(NULL, '12', 'some text here', '2023-01-22 06:40:31.000000');
If you want the top 3 commenters per year over the last 3 years, I would recommend aggregation and window functions:
select *
from (
select year(c.created) year_created, w.name, count(*) as cnt_comments,
rank() over(partition by year(c.created) order by count(*) desc) rn
from comment as c
inner join writer as w
on c.author_id = w.id
where year(c.created) >= year(current_date) - 2
group by year(c.created), w.id
) c
where rn <= 3
order by year_created, cnt_comments desc
rank()
assigns the same index to ties, so this might yield more than 3 rows per year.
For your sample data, this produces:
year_created | name | cnt_comments | rn |
---|---|---|---|
2021 | Alf | 4 | 1 |
2021 | Bob | 3 | 2 |
2021 | Cathy | 2 | 3 |
2022 | Eric | 4 | 1 |
2022 | Fanny | 3 | 2 |
2022 | Gabriel | 2 | 3 |
2023 | Ibrahim | 4 | 1 |
2023 | James | 3 | 2 |
2023 | Kevin | 2 | 3 |