Search code examples
sqloracle-databasesplitstring-parsing

split column value of a table and skip some words


Hil All,

I have a table , count is about 200M. It has a column which contains data separated by '~'. I want to parse it.

e.g:

Column1
A~B~C~D~E~F

Result :
Column_new1
A~C~E

I just want to skip 2,4,6,n th. words. I don't want plsql. I need sql query. And table is very big,I also need performance.

I use substr,instr functions and I can parse. But it runs really slowly..

Thanks for help.


Solution

  • If you are after performance then use the INSTR and SUBSTR simple string functions:

    SELECT SUBSTR(column1, 1,      p1 - 1 ) || '~' ||
           SUBSTR(column1, p2 + 1, p3 - p2 - 1) || '~' ||
           SUBSTR(column1, p4 + 1, p5 - p4 - 1) AS column1_new
    FROM   (
      SELECT column1,
             INSTR(column1, '~', 1, 1) AS p1,
             INSTR(column1, '~', 1, 2) AS p2,
             INSTR(column1, '~', 1, 3) AS p3,
             INSTR(column1, '~', 1, 4) AS p4,
             INSTR(column1, '~', 1, 5) AS p5
      FROM   table_name
    );
    

    Which, for the sample data:

    CREATE TABLE table_name (column1) AS
    SELECT 'A~B~C~D~E~F' FROM DUAL;
    

    Outputs:

    COLUMN1_NEW
    A~C~E

    If you want a shorter query then you can use regular expressions:

    SELECT REGEXP_REPLACE(column1, '([^~]+)~[^~]+~([^~]+)~[^~]+~([^~]+).*', '\1~\2~\3' )
             AS column1_new
    FROM   table_name;
    

    However, you will find that performance is likely to be an order of magnitude worse than simple string functions.


    Another alternative would be to generate a materialized view.

    db<>fiddle here