Search code examples
pythonsqlcasepython-polarsduckdb

Polars SQL CASE


Is this a bug, non-conformant behavior, or standardized behavior? A Polars SQL statement is calculating the average of values based on a condition. The CASE WHEN doesn't include an ELSE because those values should be ignored. Polars complains that an ELSE is required. If I include an ELSE, with no value, it's a syntax error. The solution is to use ELSE NULL. For comparison, duckdb doesn't required an ELSE. Should I open an issue on github? Is ELSE NULL the conforming solution? Or is duckdb giving me a break?

SELECT AVG(CASE WHEN A <= B OR A <= C THEN D END) FROM df

Solution

  • Is this a bug, non-conformant behavior, or standardized behavior?

    Fun-fact: zero RDBMS today implement the ISO SQL specification de-jure. In my mind the spec is both aspirational but also something that no-one should actually conform to (because ISO SQL is just so horribly unergonomic, and the ISO itself can be better-described as an economic rent-seeking operation disguised as a standards organization.

    Editorials aside... I can cite the actual ISO SQL spec to answer your question because I actually spent my own money buying a criminally overpriced copy of it (and yes, alcohol was involved).


    Under US copyright's fair-use exception, I think I can get-away with sharing a screenshot of the relevant spec pages:

    Egregious copyright violation that will spell the end of civilization

    Yet more egregious copyright violations that threatens the
livelihoods of rent-seeking institutional gatekeepers who actually add zero value to the economy


    The CASE WHEN doesn't include an ELSE because those values should be ignored.

    Omiting an ELSE does not mean anything gets "ignore"; it's just shorthand for ELSE NULL.

    Quoteth the Holy Word of the Sacred ISO:

    6.12 <case expression>, Syntax Rules: Point 4 (page 252 of the 2023 edition):
    If an <else clause> is not specified, then ELSE NULL is implicit


    Polars complains that an ELSE is required

    I found the bit in Polars that complains. and git blame located this commit which embued Polars with support for CASE WHEN ELSE only recently in 2023, and it looks like the requirement for an ELSE branch is indeed a shortcoming given it's a hard, absolute requirement - but Polars has loads of other similar limitations: right in the same commit we also see errors like "CASE operand is not yet supported" (which rejects CASE operands classified as "some") and more besides seen in this related commit, such as an admission that their JOIN clauses support only = and AND (i.e. equijoins, but not other kinds of joins).

    ...so Polars' SQL support (as of 2023) is somewhat lacking.

    If I include an ELSE, with no value, it's a syntax error. The solution is to use ELSE NULL

    Correct. You should use an explicit CASE WHEN [...] ELSE NULL END. The ISO SQL spec says it's equivalent to a CASE expression that omits the ELSE NULL so you have nothing to worry about, especially because AVG will ignore NULLs:

    Quoteth the Ferengi robber-barons of the ISO:

    (emphasis mine; in the context of an expression like SELECT AVG( <value expression> ) FROM T1)

    10.9 <aggregate function>, General Rules: Point 7 (page 814 of the 2023 edition):
    Let TX be the single-column table that is the result of applying the <value expression> to each row of T1 and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning — null value eliminated in set function (01003).


    In conclusion: Polars' current SQL implementation is a fraction of a subset of a minority of a portion of the ISO SQL specification, with noticable (but not unduly bothersome) deviations from the spec.

    Should I open an issue on github?

    Probably not: it's only a bug if the behaviour violates Polars' own project specifications or if they claim to closely or strictly implement ISO SQL (which they are not claiming, btw).

    Is ELSE NULL the conforming solution?

    Yes, and you should continue to use ELSE NULL and not worry about it. It's good-practice, imo anyway (I always explicitly specify ELSE NULL myself, unless it's unnecessarily verbose).

    Or is duckdb giving me a break?

    DuckDB is hardly conforming to the ISO SQL spec either; DuckDB explicitly says their implementation is based on Postgres' dialect and therefore is not based on ISO SQL.