
Compare result of two table functions using one column from each

Following the instructions here, I have created two functions that use EXECUTE format() and return the same table type of (int, smallint).

Sample definitions:

CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass, 
IN field1 integer) 
RETURNS TABLE(id integer, dist smallint)

Both functions return exactly the same number of rows. Sample result (always ordered by dist):

(49,0)
(206022,3)
(206041,3)
(92233,4)

Is there a way to compare the values of the second field between the two functions, row by row, to ensure that both results are the same?

For example:

SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)

Returns something like:

(49,0)      (49,0)
(206022,3)  (206022,3)
(206041,3)  (206041,3)
(92233,4)   (133,4)

I do not expect identical results: each function is a top-K query, and ties are broken arbitrarily (the second function also uses some optimizations for faster performance). However, I can be sure that both functions return correct results if the second numbers match in every row. In the example above the results are correct, because:

1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4

even though the ids in the 4th row differ (92233 != 133).

Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:

SELECT COUNT(*)
FROM 
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE  function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;

I am using PostgreSQL 9.3.

Solution

Is there a way to get only the 2nd field of each function result, to batch compare them?

All of the following queries assume that both functions return their rows in matching order.

Postgres 9.3

This uses a quirky behavior of set-returning functions (SRFs) in the SELECT list: when they return the same number of rows, the rows are expanded in parallel:

SELECT count(*) AS mismatches
FROM  (
   SELECT function1('tblp1','tblp2',49) AS f1
        , function2('tblp1_v2','tblp2_v2',49) AS f2
   ) sub
WHERE  (f1).dist <> (f2).dist;  -- note the parentheses!

The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.

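If the numbers of returned rows differ, this breaks completely for you: Postgres (before version 10) keeps cycling through the shorter set until all sets finish together, so you get the least common multiple of the row counts, which amounts to a Cartesian product when the counts share no common factor.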
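If relying on parallel SRF expansion feels too fragile on 9.3, one possible fallback (not part of the approach above, just a sketch) is to number the rows of each result with row_number() and join on that number. The two CTEs below are stand-ins holding the sample rows from the question; in practice you would select from function1(...) and function2(...) instead. Numbering by dist works here only because both results are ordered by dist and only dist is compared, so the order among ties does not matter.

```sql
-- f1 and f2 are hypothetical stand-ins for the two function calls,
-- filled with the sample rows from the question.
WITH f1(id, dist) AS (VALUES (49, 0), (206022, 3), (206041, 3), (92233, 4))
   , f2(id, dist) AS (VALUES (49, 0), (206022, 3), (206041, 3), (133, 4))
SELECT count(*) AS mismatches
FROM  (SELECT dist, row_number() OVER (ORDER BY dist) AS rn FROM f1) a
FULL   JOIN
      (SELECT dist, row_number() OVER (ORDER BY dist) AS rn FROM f2) b USING (rn)
WHERE  a.dist IS DISTINCT FROM b.dist;  -- also counts rows missing on one side
```

With the sample data this returns 0 mismatches, because the ids in the last row differ but the dist values do not.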

Postgres 9.4

`WITH ORDINALITY` to generate row numbers on the fly

You can use WITH ORDINALITY to generate a row number on the fly, so you don't need to depend on pairing the results of set-returning functions in the SELECT list:

SELECT count(*) AS mismatches
FROM      function1('tblp1','tblp2',49)       WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist;

This works when both functions return the same number of rows as well as when the numbers differ (the unmatched rows are counted as mismatches).
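To see which rows disagree rather than just how many, the same join can return the offending pairs. The sketch below is self-contained: unnest() over literal arrays is only a stand-in for the dist columns of the two functions, and the array values are made up for illustration.

```sql
-- unnest() over literal arrays stands in for the dist values
-- returned by the two functions (hypothetical data).
SELECT rn, f1.dist AS dist1, f2.dist AS dist2
FROM      unnest(ARRAY[0, 3, 3, 4])    WITH ORDINALITY AS f1(dist, rn)
FULL JOIN unnest(ARRAY[0, 3, 3, 5, 6]) WITH ORDINALITY AS f2(dist, rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist
ORDER  BY rn;
```

This returns two rows: row 4 with dist 4 versus 5, and row 5 with NULL versus 6, since the first set has no fifth row.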

Related: PostgreSQL unnest() with element number

`ROWS FROM` to join sets row-by-row

SELECT count(*) AS mismatches
FROM   ROWS FROM (function1('tblp1','tblp2',49)
                , function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE  t.dist1 IS DISTINCT FROM t.dist2;
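ROWS FROM pairs the sets row by row and pads the shorter one with NULL values, which is why IS DISTINCT FROM also flags surplus rows. A small illustration, using generate_series() purely as a stand-in for the two functions:

```sql
-- generate_series() is only a placeholder for two set-returning functions
-- that return different numbers of rows.
SELECT *
FROM   ROWS FROM (generate_series(1, 3), generate_series(11, 12)) AS t(a, b);
```

The third row comes back as (3, NULL): the second set is exhausted, so its column is filled with NULL instead of being repeated.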
