My database contains a table with unique id and JSON blob. JSON blob contains, among other fields, a list of "identifiers" that I have an index on. Identifiers are of a given type (ISIN, VALOR, etc.).
The identifier is held in a JSON subobject, with possibly a priority class (primary, or linked): {<type>: <identifier value>, <class>: true}
.
By having each identifier in its own subobject, the array of identifiers can hold multiple identifiers of the same type,
for example:
{
"identifiers": [
{"isin": "XS1", "primary": true},
{"valor": 1234, "linked": true},
{"isin": "EXTRA"}
],
"workflows": {},
"slots": {}
}
Previously I had a query that provided a single array element, to be searched among all JSON blobs "identifiers" lists and return the Ids of those elements. For example the search criteria: [ {"isin": "XS1"} ]
would return true
for the aforementioned example.
Now I want the priority class to filter more precisely.
Basically the conditions are as follows:
Return a record if it contains an identifier with the given type and value (of the search criteria):
primary = true
,linked = true
but the object holds no primary
identifier of that typeprimary
or linked
class (either having no class at all, or a different class), and there exists no primary
or linked
identifier of the searched type in the objectThus the search criteria [ {"isin": "EXTRA"} ]
would return false for below identifiers
'{
"identifiers": [
{"isin": "XS1", "primary": true},
{"valor": 1234},
{"isin": "EXTRA", "linked":true}
],
"workflows": {},
"slots": {}
}'
because isin
and EXTRA
match but there is already an isin
which is primary
, thus masking all other isin
s.
Query plan for this single item query was good, because i could easly hardcode certain values like "type" in the query itself. Now I would like to make the query resolve a batch of identifiers the same way, i.e.:
'[
[{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
[{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
[{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
]'
The query is: Fiddle
WITH scheme_and_id AS (
-- Convert the input JSON object into rows of key-value pairs
SELECT jsonb_array_elements('[
[{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}],
[{"valor": 1234}, {"id": 1234}, {"schema" : "valor"}],
[{"isin": "EXTRA"}, {"id": "EXTRA"}, {"schema" : "isin"}]
]'::jsonb) as batch_json
),
-- first build the JSON to be used to match the index
resolve_id as (select -- select the JSON objects
"id",
(blob -> 'identifiers') as "ids",
(scheme_and_id.batch_json[1] -> 'id')::text as identifier,
trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"') as selected_schema
from blobstable, scheme_and_id
where (blob -> 'identifiers') @> jsonb_build_array( scheme_and_id.batch_json[0])),
candidates as (
SELECT
"id",
selected_schema as "schema",
identifier as "identifier",
(single_identifier -> 'linked')::boolean as "linked",
(single_identifier -> 'primary')::boolean as "primary",
(single_identifier -> selected_schema)::TEXT as selected_id,
single_identifier
FROM resolve_id ,
LATERAL jsonb_array_elements("ids") single_identifier
where (single_identifier -> selected_schema)::TEXT is not null
)
SELECT *
from candidates
where ("primary" and identifier = selected_id) -- first rule
or ("linked" and identifier = selected_id and
not exists(select "id" from candidates where "primary")) -- second rule
or (("primary" is Null or False) and ("linked" is Null or False) and identifier = selected_id and
not exists(select "id" from candidates where "linked")) -- third rule
;
And the explanation: I provide arrays of params like so: [{"isin": "XS12"}, {"id": "XS12"}, {"schema" : "isin"}]
{"isin": "XS12"} is the identifier to be matched in:
where (blob -> 'identifiers') @> jsonb_build_array( scheme_and_id.batch_json[0]))
{"id": "XS12"} and {"schema" : "isin"} parameters are needed to easily match it in the candidates conditions (extracting this dynamically was challenging because the type name can be isin, valor etc, as seen here: (trim is used because somehow schema was being returned with quotation marks
(scheme_and_id.batch_json[1] -> 'id')::text as identifier,
trim((scheme_and_id.batch_json[2] -> 'schema')::text, '"')
After those operations "resolve_id" contains the "id" and "identifiers" blob of all arrays that contain identifiers from the "scheme_and_id"
And finally in "candidates" I extract needed fields from the blob for the final "where" logic. Addition of this "candidates" ads another nested loop to the explain plan. I am looking for a way to optimize this query. I was even thinking about making or the "where" logic using @> operator, to basically check if contains "type":"value" primary - true, and not contains others, but I see no way to do it like "not contains "isin": any, primary: true
Working example: Fiddle
Basic create table with test data, expected in the repsonse are all 3 selected ids
CREATE TABLE blobstable
(
id integer primary key,
blob jsonb
);
INSERT INTO blobstable
VALUES
(1, '{
"identifiers": [
{"isin": "XS1", "primary": true},
{"valor": 1234},
{"isin": "EXTRA"}
],
"workflows": {},
"slots": {}
}'),
(2, '{
"identifiers": [
{"isin": "XS3", "primary": true},
{"valor": 456, "linked": true},
{"cusip": "TRGYN"}
],
"workflows": {},
"slots": {}
}'),
(3, '{
"identifiers": [
{"isin": "XS12", "primary": true},
{"valor": 678},
{"isin": "XS5"}
],
"workflows": {},
"slots": {}
}')
;
I went with final query, as seen here: https://dbfiddle.uk/gwzGZX8j
WITH scheme_and_id as (
select (id_json -> 'identifier') AS queried_id,
trim((id_json-> 'scheme')::text, '"')AS scheme,
ordinal
from jsonb_array_elements('[
{"identifier": "XS12", "scheme" : "isin"},
{"identifier": 1234, "scheme" : "valor"},
{"identifier": "EXTRA", "scheme" : "isin"}
]'::jsonb) WITH ORDINALITY as f(id_json, ordinal)),
-- first build the JSON to be used to match the index
resolve_id as (select -- select the JSON objects
id,
(blob -> 'identifiers') as "ids",
ordinal,
queried_id,
scheme
from blobstable, scheme_and_id
where (blob -> 'identifiers') @> jsonb_build_array( jsonb_build_object(scheme, queried_id))),
candidates as (
SELECT
id,
ordinal,
scheme,
queried_id,
(identifier_row -> scheme) as candidate_id,
(identifier_row -> 'primary')::boolean as "primary",
(identifier_row -> 'linked')::boolean as "linked"
FROM resolve_id ,
LATERAL jsonb_array_elements("ids") identifier_row
)
SELECT id
from candidates c
right join scheme_and_id on c.ordinal = scheme_and_id.ordinal
and ((c."primary" and c.queried_id = c.candidate_id) -- first rule
or (c."linked" and c.queried_id = c.candidate_id
and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "primary")) -- second rule
or (("primary" is Null or False) and ("linked" is Null or False) and c.queried_id = c.candidate_id
and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "linked"))) -- third rule
order by scheme_and_id.ordinal;
So I added sorting, since I need to return NULL for not found identifiers. I also improved the way I provide the input into the query to drop the nested arrays and value duplication. I also decided to go with this where clause, because it moves all the conditional logic to the final step, and I believe is easier to read.
from candidates c
right join scheme_and_id on c.ordinal = scheme_and_id.ordinal
and ((c."primary" and c.queried_id = c.candidate_id) -- first rule
or (c."linked" and c.queried_id = c.candidate_id
and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "primary")) -- second rule
or (("primary" is Null or False) and ("linked" is Null or False) and c.queried_id = c.candidate_id
and not exists(select "id" from candidates where c.ordinal = candidates.ordinal and candidate_id is not null and "linked"))) -- third rule
order by scheme_and_id.ordinal;
Thank you for help