According to the instructions here, I have created two functions that use EXECUTE FORMAT
and return the same table of (int,smallint).
Sample definitions:
CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
Both functions return exactly the same number of rows. Sample result (always ordered by dist):
(49,0)
(206022,3)
(206041,3)
(92233,4)
Is there a way to compare the values of the second field between the two functions for the same rows, to ensure that both results are the same?
For example:
SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)
Returns something like:
(49,0) (49,0)
(206022,3) (206022,3)
(206041,3) (206041,3)
(92233,4) (133,4)
Although I am not expecting identical results (each function is a top-K query, and ties are broken arbitrarily, with some optimizations in the second function for faster performance), I can verify that both functions return correct results if, for each row, the second numbers in the results are the same. In the example above, I know the results are correct because:
1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4
despite the fact that for the 4th row, 92233 != 133.
Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:
SELECT COUNT(*)
FROM
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;
I am using PostgreSQL 9.3.
Is there a way to get only the 2nd field of each function result, to batch compare them?
All of the following answers assume that rows are returned in matching order.
With the quirky feature of exploding rows from set-returning functions (SRFs) in parallel when they return the same number of rows:
SELECT count(*) AS mismatches
FROM (
SELECT function1('tblp1','tblp2',49) AS f1
, function2('tblp1_v2','tblp2_v2',49) AS f2
) sub
WHERE (f1).dist <> (f2).dist; -- note the parentheses!
The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.
If the functions return different numbers of rows, PostgreSQL versions before 10 (including 9.3) produce the least common multiple of the row counts, a Cartesian product in the worst case, which would break this completely for you.
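To see the parallel stepping in isolation, here is a minimal sketch that uses the built-in generate_series() as a stand-in for function1 and function2. That substitution is an assumption for illustration only; the real functions return composite rows.

```sql
-- Equal row counts: the two set-returning functions are stepped in parallel.
SELECT generate_series(1, 3)   AS a
     , generate_series(11, 13) AS b;
-- Returns 3 rows: (1,11), (2,12), (3,13)

-- Unequal row counts (2 vs. 3): the pairing is no longer row-by-row.
-- PostgreSQL 9.x returns 6 rows here (least common multiple of 2 and 3);
-- PostgreSQL 10 or later returns 3 rows, padding the shorter set with NULL.
SELECT generate_series(1, 2)   AS a
     , generate_series(11, 13) AS b;
```

The second query shows why this approach is only safe when both functions are guaranteed to return the same number of rows.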
WITH ORDINALITY to generate row numbers on the fly
In PostgreSQL 9.4 or later, you can use WITH ORDINALITY to generate a row number on the fly, so you don't need to depend on pairing the results of set-returning functions in the SELECT list:
SELECT count(*) AS mismatches
FROM function1('tblp1','tblp2',49) WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE f1.dist IS DISTINCT FROM f2.dist;
This works whether the functions return the same number of rows or differing numbers (unpaired rows would be counted as mismatches).
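A self-contained sketch of the same pattern, using unnest() over literal arrays as hypothetical stand-ins for the dist values of the two functions (the arrays and their contents are made up for illustration). It requires PostgreSQL 9.4 or later, where WITH ORDINALITY is available.

```sql
-- f1 has 4 rows, f2 has 5 rows.
-- Row 3 differs (3 vs. 4), and row 5 exists only in f2 (NULL vs. 5).
SELECT count(*) AS mismatches
FROM   unnest('{0,3,3,4}'::smallint[])   WITH ORDINALITY AS f1(dist, rn)
FULL   JOIN unnest('{0,3,4,4,5}'::smallint[]) WITH ORDINALITY AS f2(dist, rn) USING (rn)
WHERE  f1.dist IS DISTINCT FROM f2.dist;
-- mismatches = 2
```

IS DISTINCT FROM treats NULL as a comparable value, which is why the unpaired fifth row is counted; a plain <> would silently drop it.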
ROWS FROM to join sets row-by-row
SELECT count(*) AS mismatches
FROM ROWS FROM (function1('tblp1','tblp2',49)
, function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE t.dist1 IS DISTINCT FROM t.dist2;
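As a rough illustration of how ROWS FROM pairs rows, the following sketch again uses unnest() over made-up arrays in place of the two functions. ROWS FROM is available in PostgreSQL 9.4 or later; the shorter set is padded with NULL.

```sql
-- Inspect the row-by-row pairing: 4 rows, the last one is (4, NULL).
SELECT *
FROM   ROWS FROM (unnest('{0,3,3,4}'::smallint[])
                , unnest('{0,3,4}'::smallint[])) AS t(dist1, dist2);

-- Count the mismatches: row 3 (3 vs. 4) and row 4 (4 vs. NULL).
SELECT count(*) AS mismatches
FROM   ROWS FROM (unnest('{0,3,3,4}'::smallint[])
                , unnest('{0,3,4}'::smallint[])) AS t(dist1, dist2)
WHERE  t.dist1 IS DISTINCT FROM t.dist2;
-- mismatches = 2
```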
Aside: EXECUTE FORMAT is not a set plpgsql functionality. RETURN QUERY is. format() is just a convenient function for building a query string; it can be used anywhere in SQL or plpgsql.
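To make the distinction concrete, here is a minimal sketch. The function name demo_topk, its parameters, and the assumption that the target table has columns id integer and dist smallint are all hypothetical; the body is not the asker's actual function.

```sql
-- format() is an ordinary function and works in plain SQL:
SELECT format('SELECT * FROM %I', 'my table');
-- Result: SELECT * FROM "my table"

-- Inside plpgsql, RETURN QUERY EXECUTE runs a dynamic query string;
-- format() merely builds that string.
CREATE OR REPLACE FUNCTION demo_topk(_tbl regclass, _limit integer)
  RETURNS TABLE (id integer, dist smallint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Assumes _tbl has columns id (integer) and dist (smallint).
   -- A regclass value is quoted as needed when converted to text, so %s is safe here.
   RETURN QUERY EXECUTE format(
      'SELECT id, dist FROM %s ORDER BY dist LIMIT $1', _tbl)
   USING _limit;
END
$func$;
```

Passing the limit with USING keeps the value out of the query string, so only the identifier is interpolated by format().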