Search code examples
postgresqlintervals

postgreSQL select interval and fill blanks


I'm working on a system to manage the problems in different projects.

I have the following tables:

Projects

id Description Country
1 3D experience Brazil
2 Lorem Epsum Chile

Problems

id idProject Description
1 1 Not loading
2 1 Breaking down

Problems_status

id idProblem Status Start_date End_date
1 1 Red 2020-10-17 2020-10-25
2 1 Yellow 2020-10-25 2020-11-20
3 1 Red 2020-11-20
4 2 Red 2020-11-01 2020-11-25
5 2 Yellow 2020-11-25 2020-12-22
6 2 Red 2020-12-22 2020-12-23
7 2 Green 2020-12-23

In the above examples, the problem 1 is still red, and the problem 2 is green (no end date).

I need to create a chart when the user selects an specific project, where the status of the problems along the weeks (starting by the week of the first registered problem) will be shown. The chart of the project 1 should look like this: enter image description here

I'm trying to write a code in postgreSQL to return a table like this, so that I can populate this chart:

Week Green Yellow Red
42/20 0 0 1
43/20 0 0 1
44/20 0 1 0
... ... ... ...
04/21 1 0 1

I've been trying multiple ways but just can't figure out how to do that, could someone help me please? Bellow a db-fiddle to help:

CREATE TABLE projects (
  id serial NOT NULL,
  description character varying(50) NOT NULL,
  country character varying(50) NOT NULL,
  CONSTRAINT projects_pkey PRIMARY KEY (id)
);

CREATE TABLE problems (
  id serial NOT NULL,
  id_project integer NOT NULL,
  description character varying(50) NOT NULL,
  CONSTRAINT problems_pkey PRIMARY KEY (id),
  CONSTRAINT problems_id_project_fkey FOREIGN KEY (id_project)
      REFERENCES projects (id) MATCH SIMPLE
);

CREATE TABLE problems_status (
  id serial NOT NULL,
  id_problem integer NOT NULL,
  status character varying(50) NOT NULL,
  start_date date NOT NULL,
  end_date date,
  CONSTRAINT problems_status_pkey PRIMARY KEY (id),
  CONSTRAINT problems_status_id_problem_fkey FOREIGN KEY (id_problem)
      REFERENCES problems (id) MATCH SIMPLE
);

INSERT INTO projects (description, country) VALUES ('3D experience','Brazil');
INSERT INTO projects (description, country) VALUES ('Lorem Epsum','Chile');
INSERT INTO problems (id_project ,description) VALUES (1,'Not loading');
INSERT INTO problems (id_project ,description) VALUES (1,'Breaking down');
INSERT INTO problems_status (id_problem, status, start_date, end_date) VALUES
(1, 'Red', '2020-10-17', '2020-10-25'),(1, 'Yellow', '2020-10-25', '2020-11-20'),
(1, 'Red', '2020-11-20', NULL),(2, 'Red', '2020-11-01', '2020-11-25'),
(2, 'Yellow', '2020-11-25', '2020-12-22'),(2, 'Red', '2020-12-22', '2020-12-23'),
(2, 'Green', '2020-12-23', NULL);

Solution

  • If I understood correctly your goal is to produce a weekly tally by problem status for a particular project for a specific time period (Min db date to current date). Further if a problem status spans week then is should be included in each weeks tally. That involve 2 time periods, the report period against the status start/end dates and checking for overlap of those dates. Now there ate 5 overlaps scenarios that need checking; lets call the ranges let A the any week in the report period and B. the start/end of status. Now, allowing that A must end within the reporting period. but B does not we have the following.

    • A starts, B starts, A ends, B ends. B overlaps end of A.
    • A starts, B starts, B ends, A ends. B totally contained within A.
    • B starts, A starts, B ends, A ends. B overlaps start of A.
    • B starts, A starts, A ends, B ends. A totally enclosed within B. Fortunately, Postgres provides functionally to handle all the above meaning the query does not have to handle the individual validations. This is DATERANGEs and the Overlap operator. The difficult work then becomes defining each week with in A. Then employ the Overlap operator on daterange for each week in A against the daterange for B (start_date, end_date). Then do conditional aggregation. for each overlap detected. See full example here.
    with  problem_list( problem_id ) as 
           -- identify the specific problem_ids desirded
           (select ps.id 
              from projects p
              join problems ps on(ps.id_project =  p.id)
             where p.id  = &selected_project
           )  --select * from problem_list;
          
      , report_period(srange, erange) as 
           -- generate the first day of week (Mon) for the
           -- oldest start date through day of week of Current_Date   
           (select min(first_of_week(ps.start_date))  
                 , first_of_week(current_date)
              from problem_status ps
              join problem_list pl 
                on (pl.problem_id = ps.id_problem)
           )  --select * from report_period; 
          
      , weekly_calendar(wk,yr, week_dates) as 
           -- expand the start, end date ranges to week dates (Mon-Sun) 
           -- and identify the week number with year
           (select extract( week from mon)::integer wk
                 , extract( isoyear from mon)::integer yr
                 , daterange(mon, mon+6, '[]'::text) wk_dates
              from (select generate_series(srange,erange, interval '7 days')::date mon
                      from  report_period
                   ) d
           )  -- select * from weekly_calendar; 
       , status_by_week(yr,wk,status) as   
         -- determine where problem start_date, end_date overlaps each calendar week
         -- then where multiple statuses exist for any week keep only the lat               
            ( select yr,wk,status  
                from (select  wc.yr,wc.wk,ps.status 
                     --   ,  ps.start_date, wc.week_dates,id_problem 
                          , row_number() over (partition by ps.id_problem,yr,wk order by yr, wk, start_date desc)  rn
                       from problem_status  ps 
                        join problem_list   pl on (pl.problem_id = ps.id_problem)
                        join weekly_calendar wc on (wc.week_dates && daterange(ps.start_date,ps.end_date))  -- actual overlap test  
                     ) ac
               where rn=1
            ) -- select * from status_by_week order by wk;
    select 'Project ' || p.id || ': ' || p.description Project
        , to_char(wk,'fm09') || '/' || substr(to_char(yr,'fm0000'),3) "WK"
        , "Red", "Yellow", "Green"
     from projects p
    cross join (select sbw.yr,sbw.wk 
                     , count(*) filter (where sbw.status = 'Red')    "Red"
                     , count(*) filter (where sbw.status = 'Yellow') "Yellow"
                     , count(*) filter (where sbw.status = 'Green')  "Green" 
                  from status_by_week sbw 
                 group by sbw.yr, sbw.wk
               ) sr
    where p.id  = &selected_project
    order by yr,wk;
    
    

    The CTEs and main operate as follows:

    • problem_list: Identifies the Problems (id_problem) related the specified project.

    • report_period: Identifies the full reporting period start to end.

    • weekly_calendar: Generates the beginning date (Mon) and ending date (Sun) for each week within the reporting period (A above). Along the way it also gathers week of the year and the ISO year.

    • status_by_week: This is the real work horse preforming two tasks. First is passes each problem by each of the week in the calendar. It builds row for each overlap detected. Then it enforces the "one status" rule.

    • Finally, the main select aggregates the status into the appropriate buckets and adds the syntactic sugar getting the Program Name.

    Note the function first_of_week(). This is a user defined function and available in the example and below. I created it some time ago and have found it useful. You are free to use it. But you do so without any claim of suitability or guaranty.

    create or replace
    function first_of_week(date_in date)
     returns date
    language sql
    immutable strict
    /*
     * Given a date return the first day of the week according to ISO-8601
     * 
     *    ISO-8601 Standard (in short) 
     *    1 All weeks begin on Monday.
     *    2 All Weeks have exactly 7 days.
     *    3 First week of any year is the Monday on or before 4-Jan.
     *      This implies that the last few days on Dec may be in the 
     *      first week of the following year and that the first few 
     *      days of Jan may be in week 53 (53) of the prior year.
     *      (Not at the same time obviously.)  
     *  
     */ 
    as $$
       with wk_adj(l_days) as (values  (array[0,1,2,3,4,5,6]))
       select date_in - l_days[ extract (isodow from date_in)::integer ]
         from wk_adj;
    $$;
    

    In the example I have implemented the query as a SQL function as it seems db<>fiddle has issues with bound variables and substitution variables, Besides it gave the ability to parameterize it. (Hate hard coded values). For the example I added additional data fro extra testing, Mostly as data that will not be selected. And an additional Status (what happens if it encounters something other than those 3 status values (in this case Pink). This easy to remove, just get rid on OTHER.


    Your notice that "the daterange is covering mon-mon, instead of mon-sun" is incorrect, although it would appear that way for someone not use to looking at them. Lets take week 43. If you queried the date range it would show [2020-10-19,2020-10-26) and yes both those dates are Monday. However, the bracketing characters have meaning. The leading character [ says the date is to included and the trailing character ) says the date is not to be included. A standard condition:

    somedate && [2020-10-19,2020-10-26) 
    is the same as
    somedate >= 2020-10-19 and somedate < 2020-10-26 
    

    This is why when you change the increment from "mon+6" to "mon+5" you fixed week 43, but introduced errors into other weeks.