Search code examples
sqloracle-databasesubstringinstr

Oracle SQL - how to split up a string into parts but not break up words


I need to split a long description into 4 parts of up to 60 characters each, but I need each part to end with a space (not break up any words). What is the best method to do this? Below is the snippet of the code, however I need it to not break up words.

SELECT emplid as SHRQPNM_PIDM,
       substr(listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr), 1, 60),
       substr(listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr), 61, 60),
       substr(listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr), 121, 60),
       substr(listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr), 181, 60),
       to_char(sysdate, 'YYYYMMDD') SHRQPNM_ACTIVITY_D,
FROM   PS_TSCRPT_TEXT
GROUP BY emplid, print_loc_SEQ) 
order by emplid, print_loc_SEQ;

example of current output:

DOCTORAL THESIS -A MEASUREMENT OF DISTORTION PRODUCT **OTOACOU|STIC** EMISSIONS WHILE ATTENDINGTO FAMILIAR AND UNFAMILIAR **VIS|UAL** STIMULATION|

Solution

  • If you want to strip the trailing space from the output then you can use:

    SELECT emplid as SHRQPNM_PIDM,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,60})( |$)',
             1,
             1,
             NULL,
             1
           ) AS part1,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,60})( |$)',
             1,
             2,
             NULL,
             1
           ) AS part2,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,60})( |$)',
             1,
             3,
             NULL,
             1
           ) AS part3,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,60})( |$)',
             1,
             4,
             NULL,
             1
           ) AS part4,
           to_char(sysdate, 'YYYYMMDD') SHRQPNM_ACTIVITY_D
    FROM   PS_TSCRPT_TEXT
    GROUP BY emplid, print_loc_SEQ
    order by emplid, print_loc_SEQ;
    

    Which, for the sample data:

    CREATE TABLE ps_tscrpt_text (emplid, ssr_transcript_txt, text_seq_nbr, print_loc_seq) AS
    SELECT 1,
           'DOCTORAL THESIS',
           1,
           1
    FROM   DUAL UNION ALL
    SELECT 1,
           ' -A MEASUREMENT OF DISTORTION PRODUCT OTOACOUSTIC EMISSIONS WHILE ATTENDINGTO FAMILIAR AND UNFAMILIAR VISUAL STIMULATION|',
           2,
           1
    FROM   DUAL;
    

    Outputs:

    SHRQPNM_PIDM PART1 PART2 PART3 PART4 SHRQPNM_ACTIVITY_D
    1 DOCTORAL THESIS -A MEASUREMENT OF DISTORTION PRODUCT OTOACOUSTIC EMISSIONS WHILE ATTENDINGTO FAMILIAR AND UNFAMILIAR VISUAL STIMULATION null

    If you want to include the trailing space in the output (and have a maximum of 60 characters) then:

    SELECT emplid as SHRQPNM_PIDM,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,59})( |$)',
             1,
             1
           ) AS part1,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,59})( |$)',
             1,
             2
           ) AS part2,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,59})( |$)',
             1,
             3
           ) AS part3,
           REGEXP_SUBSTR(
             listagg(SSR_transcript_txt) WITHIN GROUP (ORDER BY text_seq_nbr),
             '(.{1,59})( |$)',
             1,
             4
           ) AS part4,
           to_char(sysdate, 'YYYYMMDD') SHRQPNM_ACTIVITY_D
    FROM   PS_TSCRPT_TEXT
    GROUP BY emplid, print_loc_SEQ
    order by emplid, print_loc_SEQ;
    

    db<>fiddle here