Search code examples
oracle-databasematch-recognize

How to use match_recognize to solve the N threshold requirement


I want to know if there is a way we can do the below task in SQL. Can we do this with match_ recognize?

enter image description here

Above is the input table. I want to group the role for a specific region where No. of employees is less than 6. So, In Africa, we have to combine Analyst, S.Analyst and manager (2+3+9) to overcome the threshold value of 6. Partner and S. Manager must be combined in India (2+7). S. Analyst can be left as it is. We have to drop the USA or add some flag that we can identify using a where clause. So the output should look like below.

enter image description here


Solution

  • So, I think I've got it now. As mentioned in my original answer, my new approach is to get all group combinations. I am working with IDs, and as your table doesn't seem to have them, my first step is to genrate them. In the same step I am numbering the rows per region, so as to start building combinations with only one row per region in the recursive part to save time. (I will get duplicate combinations, like (1,2,3) and (1,3,2), but at least avoid the additional (2,1,3), (2,3,1), (3,1,2), (3,2,1)).

    I am using an ID string to get the combinations. Once a group is complete with at least 6 employees, I start a new group after a semicolon. So, with IDs 1, 2 and 3, I might end up with groups '1,2,3' (all IDs in one group), '1,2;3' (1 and 2 in one group, 3 in another) and many more, because with the recursive query I also generate incomplete groups like '1,2'. I will later find the complete groups by looking at the digits: '1,2,3' and '1,2;3' have three digits, while '1,2' only has two, so it cannot be complete. As a separator I am not using the comma, but a #, which helped me a bit when looking at the generated rows, as ',' and ';' look very similar. I also not only separate the IDs with #, but also have an # at the beginning and the end. This helps with finding an ID in the string. If I look for 1, I look for '#1#' actually, because I don't want to find the 1 in '12' or '21'. And I will find it in a string like '#1#2#;#3#', because the 1 is surrounded by #.

    Once I have all combinations, I detect the best combination per region with a rather complicated ranking. At last I join the original rows to the groups and aggregate.

    with
      data (id, region, role, employees, rn) as
      (
        select
          row_number() over (order by region, role, employees),
          region, role, employees,
          row_number() over (partition by region order by employees, role)
        from mytable
      ),
      combinations (region, ids, employees) as
      (
        select region, cast('#' || id || '#' as varchar2(400 byte)), employees
        from data
        where rn = 1
        union all
        select 
          g.region,
          case when g.employees >= 6 
              then g.ids || ';#' || d.id || '#'
              else g.ids || d.id || '#'
          end,
          case when g.employees >= 6 
               then d.employees
               else d.employees + g.employees
          end
        from combinations g
        join data d on d.region = g.region
                   and g.ids not like '%#' || d.id || '#%'
      ),
      best_combinations (region, ids, employees) as
      (
        select region, ids, employees
        from combinations
        order by rank() over (partition by region -- per region
                              order by regexp_count(ids, '\d') desc, -- all IDs in combinations
                                       case when employees >= 6 then 1 else 2 end, -- no incomplete combinations left if possible 
                                       regexp_count(ids, ';') desc, -- as many combinations as possible
                                       ids) -- in order to pick only one row per region in case of ties
        fetch first row with ties
      ),
      final_groups(region, ids, rest) as
      (
        select region, regexp_substr(ids, '^[^;]+'), regexp_substr(ids, ';(.*$)', 1, 1, 'i', 1)
        from best_combinations
        union all
        select region, regexp_substr(rest, '^[^;]+'), regexp_substr(rest, ';(.*$)', 1, 1, 'i', 1)
        from final_groups
        where rest is not null
      )
    select 
      bgs.region,
      listagg(d.role, ', ') within group (order by d.role) as roles,
      sum(d.employees) as employees
    from data d
    join final_groups bgs on bgs.ids like '%#' || d.id || '#%'
    group by bgs.region, bgs.ids
    order by bgs.region, bgs.ids;
    

    Demo: https://dbfiddle.uk/a0x1Z9rd