I have a dataset that looks like this:
Is there a way that I can manipulate the field 'Transition' to remove those label that made similar consecutive movement?
Objective:
For example, ID A123 made a movement of 'Import Review -> Evidence Collection -> Legal -> Legal -> Resolved'
The consecutive repetition was in Legal -> Legal, so goal is to remove the extra, so that the output would be 'Import Review -> Evidence Collection -> Legal -> Resolved'
Similiar to ID A125, the original movement was
'Import Review -> Evidence Collection -> PDR -> IRO -> PDR -> PLS Analysis -> IRO -> IRO -> Legal -> Legal -> Import Review'
Expected output would be:
'Import Review -> Evidence Collection -> PDR -> IRO -> PDR -> PLS Analysis -> IRO -> Legal -> Import Review'
I am using these line of code currently:
select
ac."id_Parent_Record"
,ac."Field_Changed"
,STRING_AGG(t."group_title", ' -> ' ORDER BY ac."Changed_Date" asc) AS "Transitions"
from public.api_changelog ac
left join public.task_workflow_group_title t
on t."task_workflow" = ac."Value_New"
where ac."Field_Changed" IN ('Task_Workflow')
and ac."id_Parent_Record" IN ('A123', 'A124' , 'A125')
GROUP BY
ac."id_Parent_Record"
,ac."Field_Changed";
I would break each transition step into a separate row, eliminate the duplicates with a window function, and then reassemble the transitions
column thusly:
with breakout as (
select *
from api_changelog ac
cross join lateral regexp_split_to_table(ac.transition, '\s*->\s*')
with ordinality as t(stage, n)
), mark_dups as (
select *,
stage = lag(stage) over (partition by id order by n) as dropme
from breakout
)
select id, type, string_agg(stage, ' -> ' order by n) as transitions
from mark_dups
where coalesce(dropme, false) != true
group by id, type;
Working fiddle here.