sql oracle algorithm range cellular-network

number range format conversion

Given a range of cellular phone numbers in a form of ['7810000000', '7819999999'] I need an algorithm that would generate multiple rows of smallest possible length which would entierely cover the given range if postfixed with '%'. For instance, the range from above would be represented as a single row '781'. In other words any number from the range can be represented as 781%. This representaton can be usefull for storing tarrifs, for instance. A single row in a database could be used to rate the entire range. There are many other tasks for which it's preferrable to have this format. An alghorithm for the range of ['526251630000','526251634999'] would yeild

For the range ['12300345','12367000'] we should get

We need this conversion to be done in Oracle(SQL/PLQSL). Any information or links would be highly appreciated. Thank you in advance.

Solution

Brute-force approach, with range start/end as bind variables:

var start_val varchar2(14);
var end_val varchar2(14);

exec :start_val := '526251630000';
exec :end_val := '526251634999';

with dns (dn) as (
  select to_number(:start_val) + level - 1
  from dual
  connect by level <= to_number(:end_val) - to_number(:start_val) + 1
),
exponents (ex) as (
  select level - 1 from dual
  connect by level <= length(:start_val)
),
tmp (dn, ex, pow, root, cnt ) as (
  select d.dn,
    e.ex,
    power(10, e.ex),
    trunc(d.dn / power(10, e.ex)),
    count(d.dn) over (partition by trunc(d.dn / power(10, e.ex)))
  from dns d
  cross join exponents e
)
select distinct to_char(min(root) keep (dense_rank first order by ex desc)
    over (partition by dn)) as result
from tmp
where pow = cnt
order by result;

which gets

RESULT                                  
----------------------------------------
526251630
526251631
526251632
526251633
526251634

The dns CTE expands the range into all the values it contains. The exponents CTE generates all possible powers of 10 you might want to use (i.e. how many digits to lop off the right of the range value). The tmp CTE then cross-joins those and counts how many individual numbers appear in each 'sub-range' calculated by dividing by a power of 10. The final query then filters on whole sub-ranges - where the number of values in that range is the same as the size of the sub-range, so there are no missed splits - and finds the smallest (i.e. shortest) sub-range root for each DN. And then finally gets the distinct values of those.

For the second range:

exec :start_val := '12300345';
exec :end_val := '12367000';

the same query gets:

RESULT                                  
----------------------------------------
12300345
12300346
12300347
12300348
12300349
1230035
1230036
1230037
1230038
1230039
123004
123005
123006
123007
123008
123009
12301
12302
12303
12304
12305
12306
12307
12308
12309
1231
1232
1233
1234
1235
12360
12361
12362
12363
12364
12365
12366
12367000

The CTEs and cross-join mean this is doing a lot of work, and it will slow down as the ranges get bigger - possibly to the point it fails.

I'm sure there are more efficient ways to do this and hopefully someone will come up with one, but this might get you started.