Search code examples
regexoracle12cregexp-substr

Using regexp_substr on tab delimited record with spaces in fields


Using Oracle 12c, how can I use regexp_substr to separate a tab delimited record whose fields may contain spaces? The record has four fields. The third field contains words with spaces.

I'm using this as my reference:Oracle Regex

Here's my query

with tab_delimited_record as
 (select 'Field1    Field2  This is field3 and contains spaces  Field4' as record_with_fields from dual) 
 select record_with_fields,
        regexp_substr('\S+',1,3) as field3a, -- Expect ==>This is field3...
        regexp_substr('\t+',1,3) as field3b, -- Expect==>This is field3...
        regexp_substr('[[::space::]]+',1,3) as field_3c -- Another version
  from  tab_delimited_record

Desired Results

RECORD_WITH_FIELDS

Field1 Field2 This is field3 and contains spaces Field4

FIELD3

This is field3 and contains spaces


Solution

  • I believe you are looking for something like this. Note this example returns all fields for example's sake but of course you can just select field3 if that's all you need. The CTE builds string with tab-delimited fields. The query then uses regex_substr to get the nth (4th argument) string of characters followed by a TAB or the end of the line.

    with tab_delimited_record(record_with_fields) as (
      select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' from dual
    ) 
    select record_with_fields,
           regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 1, null, 1) as field_1, 
           regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 2, null, 1) as field_2, 
           regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 3, null, 1) as field_3,
           regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 4, null, 1) as field_4
    from  tab_delimited_record;