Search code examples
postgresqlcomposite

How to complete missing rows from a table with rows from another table in postgres?


I have 2 action tables, one specific, one general, based on a status and related actions. In the first table, some rows (based on status) are missing. I am trying to return a global table that would pick the missing rows from the second table (default_action) whenever there would be a row missing in the first one.

Job table: job_actions
Default table: default_actions

I am using the following set for testing:

CREATE TYPE STATUS AS ENUM('In Progress', 'Failed', 'Completed');

CREATE TYPE EXPIRATION_ACTION AS ENUM('Expire', 'Delete');

CREATE TYPE BASIC_ACTION AS (status STATUS,
    operation EXPIRATION_ACTION, expiration_time TIMESTAMP);
   
  
CREATE TYPE ACTION AS (partition VARCHAR(40), job VARCHAR(48), b_action BASIC_ACTION);


CREATE TABLE IF NOT EXISTS job_actions (
  partition VARCHAR(40),
  job VARCHAR(48),
  status STATUS,
  operation EXPIRATION_ACTION,
  expiration_time TIMESTAMP
  );

CREATE TABLE IF NOT EXISTS default_actions OF BASIC_ACTION;
INSERT INTO default_actions (
       status,
       operation,
       expiration_time
       )
VALUES ('In Progress', 'Expire', 'infinity'::timestamp),
      ('Failed', 'Expire', 'infinity'::timestamp),
      ('Completed', 'Expire', 'infinity'::timestamp);

INSERT INTO job_actions (
   partition ,
   job ,
   status,
   operation,
   expiration_time
    )   
  VALUES 
     ('part1', 'job1','Failed', 'Expire', NOW() + INTERVAL '1 hour'),
     ('part1', 'job2','In Progress', 'Expire', NOW() + INTERVAL '1     hour'),
     ('part1', 'job2','Failed', 'Expire', NOW() + INTERVAL '1 hour'),
     ('part1', 'job3','In Progress', 'Expire', NOW() + INTERVAL '1 hour'),
     ('part1', 'job3','Failed', 'Expire', NOW() + INTERVAL '1 hour');  

I am trying to use something like

SELECT ja.partition, ja.job, ja.status, ja.operation, ja.expiration_time
FROM job_actions ja 
WHERE NOT EXISTS (
SELECT da.status, da.operation, da.expiration_time
FROM default_actions da );

But at the moment, it returns an empty table. Here is the expected result:

enter image description here
Would anyone know what I am doing wrong?


Solution

  • First, get all partitions and jobs from job_actions. Then cross join with default_actions to get all possible combinations. Left join that with job_actions and take the expiration_time from there unless it is a NULL value (no matching row was found).

    Translated into SQL:

    SELECT partition, job, status, operation,
           coalesce(ja.expiration_time, da.expiration_time) AS expiration_time
    FROM (SELECT DISTINCT partition, job
          FROM job_actions) AS jobs
       CROSS JOIN default_actions AS da
       LEFT JOIN job_actions AS ja USING (partition, job, status, operation)
    
    ORDER BY partition, job, status;
     partition │ job  │   status    │ operation │      expiration_time       
    ═══════════╪══════╪═════════════╪═══════════╪════════════════════════════
     part1     │ job1 │ In Progress │ Expire    │ infinity
     part1     │ job1 │ Failed      │ Expire    │ 2021-06-18 14:57:23.912874
     part1     │ job1 │ Completed   │ Expire    │ infinity
     part1     │ job2 │ In Progress │ Expire    │ 2021-06-18 14:57:23.912874
     part1     │ job2 │ Failed      │ Expire    │ 2021-06-18 14:57:23.912874
     part1     │ job2 │ Completed   │ Expire    │ infinity
     part1     │ job3 │ In Progress │ Expire    │ 2021-06-18 14:57:23.912874
     part1     │ job3 │ Failed      │ Expire    │ 2021-06-18 14:57:23.912874
     part1     │ job3 │ Completed   │ Expire    │ infinity
    (9 rows)