I have no control over the format of the data I am trying to process. I could, of course, use a scripting language to deal with the following problem outside of the database, but I would like to avoid that because of the amount of data I am dealing with and because I'd like to eliminate the necessity of manual steps.
In short, I have a table of lists. A list might consist of a single 3-digit string, more than one 3-digit strings, a range of 3-digit strings, e.g. 012-018
, or a number of 3-digit strings and ranges of 3-digit strings. For example:
drop table list;
drop table lists;
create table lists (id varchar, vals varchar);
insert into lists values('A', '001,003-005');
insert into lists values('B', '008-007');
insert into lists values('C', '010, 011, 012');
insert into lists values('D', '011-013, 016-018, 020');
I know, I know.
I would like to turn this into the following table:
create table list (id varchar, val varchar);
A 001 A 003 A 004 A 005 B 008 B 007 C 010 C 011 C 012 D 011 D 012 D 013 D 016 D 017 D 018 D 020
Is there any way to do this in SQL?
Since you haven't tagged your question with a specific RDBMS, I'll have to answer generally.
SQL itself doesn't provide the basic operation that you're looking for, which is basically a string split. This means that you'll have to write your own, or use one of the many that have been published online.
You've complicated matters a bit, though, with the ranges that you have in your data. This means that your procedure is going to look something like this:
','
.'-'
(which, for non-range elements, should return you a single result).'-'
) yields one result, it's a single record that you can insert into your final destination. If it yields two results, then it's a range and you'll have to iterate from the start to the finish (using elements 1 and 2 of that split) and insert records into your final destinationEdit after comment
Unfortunately, I don't have any familiarity with PROC SQL or SAS, so I can't provide a specific solution for that. I can post something below in SQL Server T-SQL, which should hopefully get you started.
declare @results table (idx int identity(1, 1), id varchar(5), data varchar(max))
declare @elements table (idx int identity(1, 1), element varchar(25))
declare @range table (idx int identity(1, 1), element varchar(25))
insert into @results (id, data)
select
your_id,
your_data
from your_source
declare @i int
declare @cnt int
declare @j int
declare @cnt2 int
declare @element varchar(25)
declare @first int
declare @second int
declare @start int
declare @end int
declare @id varchar(5)
declare @data varchar(max)
select @i = min(idx) - 1, @cnt = max(idx) from @results
while @i < @cnt
begin
select @i = @i + 1
select @id = id, @data = data from @results where idx = @i
delete @elements
insert into @elements (element)
select
element
from split(@data, ',')
select @j = min(idx) - 1, @cnt2 = max(idx) from @elements
while @j < @cnt2
begin
select @j = @j + 1
select @element = element from @elements where idx = @j
delete @range
insert into @range (element)
select
element
from split(@element, '-')
select @first = min(idx), @second = max(idx) from @range
if @first = @second --single element
insert into final_destination (id, value)
select
@id,
element
from @range
else if @second - @first = 1 -- two elements, as desired
begin
select @start = convert(int, element) - 1 from @range where idx = @first
select @end = convert(int, element) from @range where idx = @second
while @start < @end
begin
select @start = @start + 1
insert into final_destination (id, value)
values (@id, @start)
end
end
else -- error condition, bad input
end
end