We have an interesting phenomenon with a sql and the oracle database that we could not reproduce. The example was simplified. We believe not, but possibly oversimplified.
Main question: Given a nested loop, where the inner (not driving) table has an analytic function, whose result is ambiguous (multiple rows could be the first row of the order by), would it be feasible that said analytic function can return different results for different outer loops?
Secondary Question: If yes, how can we reproduce this behaviour? If no, have you any other ideas why this query would produce multiple rows for the same company. Not the question: Should the assumption on what is wrong be correct, correcting the sql would be easy. Just make the order by in the analytic function unambiguous e.g. by adding the id column as second criteria.
Problem:
Company has a n:m relation to owner and a 1:n relation to address.
The SQL joins all tables while reading only a single address per company making use of the analytic function row_number(), groups by company AND address and accumulates the owner name.
We use the query for multiple purposes, other purposes involve reading the “best” address, the problematic one does not. We got multiple error reports with results like this:
Company A has owners N1, N2, N3.
Result was
Company | Owner list |
---|---|
A | N1 |
A | N2, N3 |
All cases that were reported involve companies with multiple “best” addresses, hence the theory, that somehow the subquery that should deliver a single address is broken. But we could not reproduce the result.
Full Details: (for smaller numbers the listagg() is the original function used, but it fails for bigger numbers. count(*) should be a suitable replacement)
--cleanup
DROP TABLE rau_companyowner;
DROP TABLE rau_owner;
DROP TABLE rau_address;
DROP TABLE rau_company;
--create structure
CREATE TABLE rau_company (
id NUMBER CONSTRAINT pk_rau_company PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_company_p ON rau_company(id))
);
CREATE TABLE rau_owner (
id NUMBER CONSTRAINT pk_rau_owner PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_owner_p ON rau_owner(id)),
name varchar2(1000)
);
CREATE TABLE rau_companyowner (
company_id NUMBER,
owner_id NUMBER,
CONSTRAINT pk_rau_companyowner PRIMARY KEY (company_id, owner_id) USING INDEX (CREATE UNIQUE INDEX idx_rau_companyowner_p ON rau_companyowner(company_id, owner_id)),
CONSTRAINT fk_companyowner_company FOREIGN KEY (company_id) REFERENCES rau_company(id),
CONSTRAINT fk_companyowner_owner FOREIGN KEY (owner_id) REFERENCES rau_owner(id)
);
CREATE TABLE rau_address (
id NUMBER CONSTRAINT pk_rau_address PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_address_p ON rau_address(id)),
company_id NUMBER,
prio NUMBER NOT NULL,
street varchar2(1000),
CONSTRAINT fk_address_company FOREIGN KEY (company_id) REFERENCES rau_company(id)
);
--create testdata
DECLARE
TYPE t_address IS TABLE OF rau_address%rowtype INDEX BY pls_integer;
address t_address;
TYPE t_owner IS TABLE OF rau_owner%rowtype INDEX BY pls_integer;
owner t_owner;
TYPE t_companyowner IS TABLE OF rau_companyowner%rowtype INDEX BY pls_integer;
companyowner t_companyowner;
ii pls_integer;
company_id pls_integer := 1;
test_count PLS_INTEGER := 10000;
--test_count PLS_INTEGER := 50;
BEGIN
--rau_company
INSERT INTO rau_company VALUES (company_id);
--rau_owner,rau_companyowner
FOR ii IN 1 .. test_count
LOOP
owner(ii).id:=ii;
owner(ii).name:='N'||to_char(ii);
companyowner(ii).company_id:=company_id;
companyowner(ii).owner_id:=ii;
END LOOP;
forall ii IN owner.FIRST .. owner.LAST
INSERT INTO rau_owner VALUES (owner(ii).id, owner(ii).name);
forall ii IN companyowner.FIRST .. companyowner.LAST
INSERT INTO rau_companyowner VALUES (companyowner(ii).company_id, companyowner(ii).owner_id);
--rau_address
FOR ii IN 1 .. test_count
LOOP
address(ii).id:=ii;
address(ii).company_id:=company_id;
address(ii).prio:=1;
address(ii).street:='S'||to_char(ii);
END LOOP;
forall ii IN address.FIRST .. address.LAST
INSERT INTO rau_address VALUES (address(ii).id, address(ii).company_id, address(ii).prio, address(ii).street);
COMMIT;
END;
-- check testdata
SELECT 'rau_company' tab, COUNT(*) count FROM rau_company
UNION all
SELECT 'rau_owner', COUNT(*) FROM rau_owner
UNION all
SELECT 'rau_companyowner', COUNT(*) FROM rau_companyowner
UNION all
SELECT 'rau_address', COUNT(*) FROM rau_address;
-- the sql: NL with address as inner loop enforced
-- ‘order BY prio’ is ambiguous because all addresses have the same prio
-- => the single row in ad could be any row
SELECT /*+ leading(hh hhoo oo ad) use_hash(hhoo oo) USE_NL(hh ad) */
hh.id company,
ad.street,
-- LISTAGG(oo.name || ', ') within group (order by oo.name) owner_list,
count(oo.id) owner_count
FROM rau_company hh
LEFT JOIN rau_companyowner hhoo ON hh.id = hhoo.company_id
LEFT JOIN rau_owner oo ON hhoo.owner_id = oo.id
LEFT JOIN (
SELECT *
FROM (
SELECT company_id, street,
row_number() over ( partition by company_id order BY prio asc ) as row_num
FROM rau_address
)
WHERE row_num = 1
) ad ON hh.id = ad.company_id
GROUP BY hh.id,
ad.street;
Cris Saxon was so nice to answer my question: https://asktom.oracle.com/pls/apex/f?p=100:11:::::P11_QUESTION_ID:9546263400346154452
In short: As long as the order by is ambiguous (non-deterministic), there will always be a chance for different results even within the same sql.
To reproduce add this to my test data: ALTER TABLE rau_address PARALLEL 8; and try the select at the bottom, it should deliver multiple rows.