Search code examples
sqlpostgresqlpostgresql-12

filter emails and names and then de-duplicate in two columns using JSON on PostgreSQL 12


I have emails table that has sender and reporter columns. I want to search given parameter in those columns and return unique values.

Let me explain with sample. This is my table and records:

CREATE TABLE public.emails (
  id                bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
    (MAXVALUE 9223372036854775807),
  sender            jsonb NOT NULL,
  reporter      jsonb not null
);

insert into emails (sender, reporter) VALUES ('[{"email": "[email protected]", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "[email protected]", "name": "dennis1"}]', '[{"email": "[email protected]", "name": "john"}, {"email": "[email protected]", "name": "dennis1"}, {"email": "[email protected]", "name": "dennis2"}]');
insert into emails (sender, reporter) VALUES ('[{"email": "[email protected]", "name": "dennis1"}]', '[]');
insert into emails (sender, reporter) VALUES ('[{"email": "[email protected]", "name": "dennis1"}]', '[]');

I want to get email addresses and names. I also want to avoid dupes. Only one email and one name. I also don't want to get it as array, instead, one email and name per one row.

Searching john

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* 'john' or "e"."name" ~* 'john'))
);

Expected Result for john:

email                 name
[email protected]      john

Searching for `` (empty):

SELECT
* /* i don't know what to put here pr merge with reporters */
FROM "emails" AS "e"
WHERE (EXISTS (SELECT
    *
  FROM JSONB_ARRAY_ELEMENTS_TEXT("e"."sender") AS "e" ("email")
  WHERE ("e"."email" ~* '' or "e"."name" ~* ''))
);

Expected Result for `` (empty):

email                 name
[email protected]      john
[email protected]   dennis1
[email protected]   dennis2

dennis2 is in both sender and reporter, therefore, only one of them is required. No dupes.

In fact, in here there is one catch. If sender or reporter columns has at least one json object (not json array) then this query also fails.

Error: cannot extract elements from an object

Which is an another story, tho.

How can i achieve my goal in this case?

Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=1bf9c5f83f5104e2392c31984cb4e939


Solution

  • Normalize your data before searching, then remove duplicates using distinct on () clause:

    with cte as (select x ->> 'name' as name, x ->> 'email' as email
    from emails as e, jsonb_array_elements(e.sender || e.reporter) as x)
    select distinct on (email) * from cte where 
      name ~* '' or email ~* ''
      --name ~* 'john' or email ~* 'john'
    order by email;
    

    Demo

    Note that it will always scan the whole table, no indexes applicable in this case. Think about schema normalization.