Search code examples
mysqlnestedsubqueryviewgreatest-n-per-group

Nested VIEW ignores ORDER BY


Question similar to this one: MySQL: View with Subquery in the FROM Clause Limitation

I have the following shows table:

DROP TABLE IF EXISTS `shows`;
CREATE TABLE `shows` (
  `show_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `show_type` int(11) unsigned DEFAULT NULL,
  `show_year` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`show_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `shows` VALUES
(NULL, 1, 2014), -- 1
(NULL, 1, 2015), -- 2
(NULL, 2, 2015), -- 3
(NULL, 2, 2014); -- 4

I want to create a VIEW that will return show_id for the highest show_year for each show_type. Here's a nested query that works - returns 2 and 3:

SELECT s.show_id, s.show_year
FROM ( -- subquery for applying ORDER BY before GROUP BY
    SELECT *
    FROM shows
    ORDER BY show_year DESC
) s
GROUP BY show_type;
/*
+---------+-----------+
| show_id | show_year |
+---------+-----------+
|       2 |      2015 |
|       3 |      2015 |
+---------+-----------+
*/

Just for the reference I also tried the following query that seemed natural to me at first, but it ended up being no good in my case as shown below:

SELECT s.show_id, MAX(s.show_year)
FROM shows s
GROUP BY show_type;
/*
+---------+------------------+
| show_id | MAX(s.show_year) |
+---------+------------------+
|       1 |             2015 | <== show_id=1 does NOT have show_year=2015
|       3 |             2015 |
+---------+------------------+
*/

Now creating a VIEW - based on the nested query above (the first SELECT) the problem is that a view will not accept a subquery.

So I am using two views instead. One inside of another.

The first one just sorts the table by show_year DESC:

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`%` SQL SECURITY DEFINER VIEW `shows_desc` AS
SELECT `s1`.`show_id` AS `show_id`,
    `s1`.`show_type` AS `show_type`,
    `s1`.`show_year` AS `show_year`
FROM `shows` `s1`
ORDER BY `s1`.`show_year` DESC;

The second one is supposed to do the GROUP BY on the first one:

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`%` SQL SECURITY DEFINER VIEW `shows_grouped` AS
SELECT `s2`.`show_id` AS `show_id`,
    `s2`.`show_year` AS `show_year`
FROM `shows_desc` `s2`
GROUP BY `s2`.`show_type`;

However, to my surprise it returns incorrect rows. As if it GROUPed ignoring the ORDER of the subview:

+---------+-----------+
| show_id | show_year |
+---------+-----------+
|       3 |      2015 |
|       1 |      2014 | <== why?
+---------+-----------+

Clues?

P.S.: SQL fiddle to poke: http://sqlfiddle.com/#!2/e506d4/5


Solution

  • Lew me focus on this:

    I want to create a VIEW that will return show_id for the highest show_year for each show_type. Here's a nested query that works - returns 2 and 3:

    Here is one way, assuming that the show_id increments so the largest show_id is in the latest year:

    select show_type, max(show_year) as show_year, max(show_id)
    from shows
    group by show_type;
    

    If not, try this:

    select show_type, max(show_year) as show_year,
           substring_index(group_concat(show_id order by show_year desc), ',', 1) as show_id
    from shows
    group by show_type;
    

    The reason your other queries don't work is because you have "wishful" understanding of how MySQL works. You are using an extension to MySQL that is explicitly said not to work in the documentation. That is, the columns in the select are not in aggregation functions and are not in the group by clause (and are not functionally dependent, but that is a very advanced concept that isn't relevant for this discussion):

    In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.

    EDIT:

    The following should also work for a view:

    select s.*
    from shows s
    where s.show_year = (select max(s2.show_year) from shows s2 where s2.show_type = s.show_type);
    

    EDIT II:

    If you want only one row for each show_type and show_id is unique, then this should work:

    select s.*
    from shows
    where not exists (select 1
                      from shows s2
                      where s2.show_type = s.show_type and
                            (s2.show_year > s.show_year or
                             s2.show_year = s.show_year and s2.show_id > s.show_id
                            )
                     );