Search code examples
sqlpostgresqlwindow-functionsgaps-in-data

Postgresql Fill Gaps - Matrix of latest available information for a given branch


I have a table with indexes of imported files, with dates and branches of each imported files.

Now I need to do a consolidation of multiple branches, so, that I have to duplicate the information from some branches when we have holidays, so that the data is consistent, basically I need to fill these gaps with the latest available information.

I tried doing some self-joins with ranking in order to shift between them and get the previous data, but it didn't work.

What I have is a table:

rundate, branch, imported 
2015-04-01, PL1, TRUE 
2015-04-01, ES1, TRUE 
2015-04-01, CZ4, TRUE 
2015-04-02, PL1, TRUE 
2015-04-02, ES1, TRUE 
2015-04-02, CZ4, TRUE   
2015-04-03, ES1, TRUE 
2015-04-03, CZ4, TRUE

In this example, I would like to make a query that returns:

gap_date, branch, real_date
2015-04-03, PL1, 2015-04-02

This table is quite small (couple thousand lines), so, performance shouldn't be a big issue.

Any idea on how can I achieve that?

Now I am using a function that receives the rundate and branch of the gap dates as parameters, and answers the latest before the date passed as parameter (using max(rundate) where rundate <= '$1')

Thanks!


Solution

  • you can use outer join, subquery and cross join:

    Schema:

    create table tbl(rundate date, 
                     branch varchar(10), 
                     imported bool);
    insert into tbl values('2015-04-01', 'PL1', TRUE),
    ('2015-04-01', 'ES1', TRUE), 
    ('2015-04-01', 'CZ4', TRUE), 
    ('2015-04-02', 'PL1', TRUE), 
    ('2015-04-02', 'ES1', TRUE), 
    ('2015-04-02', 'CZ4', TRUE),  
    ('2015-04-03', 'ES1', TRUE), 
    ('2015-04-03', 'CZ4', TRUE); 
    

    Query:

    select q.rundate as gap_date,q.branch,
           (select max(tt.rundate) 
            from tbl tt
            where tt.rundate<q.rundate and tt.branch=q.branch)
            as real_date
    from tbl t
    right outer join(
             select rundate,branch from (
                    select distinct rundate from tbl) t1
                    cross join (
                    select distinct branch from tbl)t2
              )q 
    on t.rundate=q.rundate and t.branch=q.branch
    where t.branch is null
    

    Result:

    gap_date    branch  real_date
    2015-04-03  PL1     2015-04-02