Search code examples
sqlpostgresqlcasegreatest-n-per-group

postgres - select the "highest scored" column_name from left to right


(please be patient with a self-taught but fascinated beginner in postgres :)

In POSTGRES 10 I have come up with a table to track the progress in editing and archiving photo folders. This progress is measured in sequential steps, which link via foreign_key to another table that basically states 0 = "open", 1 = "work", 2 = "done". Hence I use numeric values to reference these states.

To keep the question simple here are the most important columns with some demo data:

folder_name  |  archive_status  |  step_01  |  step_02  |  step_03  |  ...
----------------------------------------------------------------------------
john         |         *        |     0     |     2     |     0     |  ...
paul         |         *        |     0     |     0     |     2     |  ...
george       |         *        |     2     |     1     |     0     |  ...
ringo        |         *        |     0     |     2     |     0     |  ...

The desired result would be something like:

folder_name  |  archive_status  |  step_01  |  step_02  |  step_03  |  ...
----------------------------------------------------------------------------
john         |     step_02      |     0     |     2     |     0     |  ...
paul         |     step_03      |     0     |     0     |     2     |  ...
george       |     step_01      |     2     |     1     |     0     |  ...
ringo        |     step_02      |     0     |     2     |     0     |  ...

My two problems are:

  1. Considering my "steps" sequentially I would like to filter the "highest" ranked one (from left to right). So in the example above "john" has reached step_02", "paul has reached step_03" and so on. (Note that other values such as "george ... step_02 = 1" are irrelevant for this.)
  2. Can't wrap my head around the way to accomplish this within the same table upon update on any of the values - would I have to use FUNCTION or VIEW or TRIGGER (or a combination of them)?

I tried approaching it with the Aggregate Functions https://www.postgresql.org/docs/10/tutorial-agg.html but I am kind of stuck, since I don't need multiple input rows, but filtering multiple columns within a row.

--- UPDATE: New problem ---

Sorry for that, but my initial demo data was not clear enough, these columns act more as a progress tracker, where in fact EACH cell within these columns could show any value. (This is where the original solution in the answer fails, since if there are two same "highest" values in a row, it will use the first occurence.) So the SQL query should find the column at the farthest to the right so to speak. Of cource I could come up with some clever "score calculations" over here, but in fact things would be so much easier with this current "matrix style" design.

So again, the desired result (archive_status) according to the values (step_xx) would be:

folder_name  |  archive_status  |  step_01  |  step_02  |  step_03 
--------------------------------------------------------------------
john         |     step_02      |     2     |     2     |     0    
paul         |     step_03      |     1     |     1     |     2   
george       |     step_01      |     2     |     1     |     1   
ringo        |     step_02      |     2     |     2     |     1   

Solution

  • One option uses greatest() and a case expression:

    select t.*,
        case greatest(step_01, step_02, step_03)
            when step_01 then 'step_01'
            when step_02 then 'step_02'
            when step_03 then 'step_03'
        end as archive_status
    from mytable t
    

    While this would work for your current problem, I would recommend normalizing your design. Each step should be stored in a separate row rather than as a column, in a structure like (folder_name, step, status). Then you would use distinct on:

    select distinct on (folder_name) t.*
    from newtable t
    order by folder_name, status desc, step