Search code examples
sqlpostgresqlcasesql-order-by

How does CASE in SELECT affects ORDER BY result


Consider the following schema:

    create table cities (
      id            serial  primary key,
      name  text    not null
    );

    insert into cities
      (name)
    values
      ('NY'),
      ('SF'),
      ('LA');

    create table users (
      id        serial  primary key,
      name      text    not null,
      city_id   int     not null    references cities (id)
    );

    insert into users
      (name, city_id)
    values
      ('Tom', 1),
      ('Anna', 1),
      ('Frank', 2);

And these queries:

--- A ---
with foo as (
    select c.name, count(u.id) as count from cities c
    left join users u on u.city_id=c.id
    group by c.name
)
select name, case when count > 0 then count else null end from foo
order by count;

--- B ---
with foo as (
    select c.name, count(u.id) as count from cities c
    left join users u on u.city_id=c.id
    group by c.name
)
select name, case count > 0 when true then count else null end from foo
order by count;

--- C ---
with foo as (
    select c.name, count(u.id) as count from cities c
    left join users u on u.city_id=c.id
    group by c.name
)
select name, case count when 0 then null else count end from foo
order by count;

Queries A and B (general and simple case forms) return null as a first element, while query C returns null last.

NULLS LAST also won't work on A and B while NULLS FIRST will work on C.

Why?


Solution

  • This has nothing to do with NULLS FIRST/LAST. NULLS FIRST is the default and you haven't specified it.

    None of your case columns are named. The docs say:

    If no output column name is specified using AS, the system assigns a default column name. For simple column references, this is the name of the referenced column. For function calls, this is the name of the function. For complex expressions, the system will generate a generic name.

    So in the first two, the column is called case, whereas in the latter it is called count. Exactly why that is is unclear as they are both expressions, but the name is implementation-specific and you should not rely on it.

    Since the first two queries have a case column, it is instead sorting by the base count column. Whereas in the third query it's sorting by the actual calculated case expression.

    You can see this in action in this fiddle.

    Note that since COUNT() never returns less than 0, you can simplify into nullif(count(u.id), 0) and only a single level rather than a CTE.