Search code examples
mysqlsqlsqlitesumset-theory

SQL discrepancy between addition of columns and summing of a columns with NULL values


When I add two columns v and w together the results of the rows give NULL if one or both summand are NULL. This makes sense. But if I sum over the column with SUM(v) there will be a numeric result even there are NULLs. Is this due to practical convience? Or is there a theoretical justification?

Here is an example to clarify (MySQL and sqlite):

CREATE TABLE x0 (
  id    INTEGER 
, v     DOUBLE
, w     DOUBLE
);

INSERT INTO x0 VALUES
  (1,    1,     1)
, (2,    1,     1)
, (3, NULL,     1)
, (4,    1,  NULL)
, (5, NULL,  NULL)
;

-- NULL if summand is NULL
SELECT v+w
FROM x0
;
-- v+w
-- 2
-- 2
-- \N
-- \N
-- \N

-- NULL if summand is NULL
SELECT id, v+w, SUM(v+w)
FROM x0
GROUP BY id
;
-- id   v+w SUM(v+w)
-- 1    2   2
-- 2    2   2
-- 3    \N  \N
-- 4    \N  \N
-- 5    \N  \N


-- There is numeric results even if some summands are NULL    
SELECT SUM(v), SUM(w)
FROM x0
;
-- SUM(v)   SUM(w)
-- 3        3

Solution

  • ISO/IEC 9075:1992 Database Language SQL says in section 6.5 (<set function specification>), General Rules, 1) b):

    let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values.

    And that's all it says. So aggregate functions must ignore NULLs, but, officially, there is no justification.

    This behaviour goes back to the first SQL implementations (IBM System R, Oracle V2), where the designers probably thought this would be a good idea. And later, nobody wanted to break backwards compatibility.

    In this aspect, as in others, SQL's handling of NULL values is inconsistent, and sometimes not portable.