Search code examples
sqlsql-serverpostgresqllateral-join

GROUP BY in CROSS APPLY


Let us have two tables

create table A (
  fkb int,
  groupby int
);

create table B (
  id int,
  search int
);

insert into A values (1, 1);
insert into B values (1, 1);
insert into B values (2, 1);

then the following query

select B.id, t.max_groupby - B.search diff
from B
cross apply ( 
  select max(A.groupby) max_groupby
  from A 
  where A.fkb = B.id 
) t

return the expected result as follows

id  diff
---------
1   0
2   NULL

However, when I add the group by A.fkb into the cross apply, the B row where the corresponding A.fkb does not exist, disappear.

select B.id, t.max_groupby - B.search diff
from B
cross apply ( 
  select max(A.groupby) max_groupby
  from A 
  where A.fkb = B.id 
  group by A.fkb
) t

I was testing on SQL Server as well as on PostgreSQL (with cross join lateral instead of cross apply). Why the group by makes the row disappear? It seems that the cross apply behaves as an outer join in the first case and as an inner join in the latter case. However, it is not clear to me why.


Solution

  • You can see this when you look at the result of the inner query separately:

    select max(A.groupby) max_groupby
    from A 
    where A.fkb = 2;
    

    returns a single row with max_groupby = null:

    max_groupby
    -----------
         (null)
    

    However as there is no row with A.fkb = 2 grouping by it yields an empty result which you can see when you run:

    select max(A.groupby) max_groupby
    from A 
    where A.fkb = 2
    group by A.fkb
    

    and thus the cross join does not return return rows for fkb = 2

    You need to use an outer join in order to include the row from B.

    In Postgres you would have to write this as:

    select B.id, t.max_groupby - B.search diff
    from B
      left join lateral ( 
        select max(A.groupby) max_groupby
        from A 
        where A.fkb = B.id 
        group by A.fkb
      ) t on true
    

    I don't know what the equivalent to left join lateral would be in SQL Server.
    on true would need to be written as on 1=1.