Search code examples
sqlpostgresqldbeaver

Is there a way to manipulate string value that contains similar consecutive strings in SQL?


I have a dataset that looks like this:

enter image description here

Is there a way that I can manipulate the field 'Transition' to remove those label that made similar consecutive movement?

Objective:

For example, ID A123 made a movement of 'Import Review -> Evidence Collection -> Legal -> Legal -> Resolved'

The consecutive repetition was in Legal -> Legal, so goal is to remove the extra, so that the output would be 'Import Review -> Evidence Collection -> Legal -> Resolved'

Similiar to ID A125, the original movement was

 'Import Review -> Evidence Collection -> PDR -> IRO -> PDR -> PLS Analysis -> IRO -> IRO -> Legal -> Legal -> Import Review'

Expected output would be:

'Import Review -> Evidence Collection -> PDR -> IRO -> PDR -> PLS Analysis -> IRO -> Legal -> Import Review'

I am using these line of code currently:

select 
     ac."id_Parent_Record"
    ,ac."Field_Changed"
    ,STRING_AGG(t."group_title", ' -> ' ORDER BY ac."Changed_Date" asc) AS "Transitions"
    
from public.api_changelog ac

    left join public.task_workflow_group_title t 
    on t."task_workflow" = ac."Value_New"

where ac."Field_Changed" IN ('Task_Workflow')
    and ac."id_Parent_Record" IN ('A123', 'A124' , 'A125')
    
GROUP BY 
     ac."id_Parent_Record"
    ,ac."Field_Changed";

Solution

  • I would break each transition step into a separate row, eliminate the duplicates with a window function, and then reassemble the transitions column thusly:

    with breakout as (
      select *
        from api_changelog ac
             cross join lateral regexp_split_to_table(ac.transition, '\s*->\s*')
               with ordinality as t(stage, n)
    ), mark_dups as (
      select *, 
             stage = lag(stage) over (partition by id order by n) as dropme
        from breakout
    )
    select id, type, string_agg(stage, ' -> ' order by n) as transitions
      from mark_dups 
     where coalesce(dropme, false) != true
     group by id, type;
    

    Working fiddle here.