Search code examples
postgresqlplpgsqlset-returning-functions

Split string with two delimiters and convert type


I have a PL/pgSQL function like this (thanks to the guy who made this possible):

 CREATE OR REPLACE FUNCTION public.split_string(text, text)
    RETURNS SETOF text
    LANGUAGE plpgsql
    AS $function$
    DECLARE 
      pos int; 
      delim_length int := length($2);
    BEGIN
      WHILE $1 <> ''
      LOOP
        pos := strpos($1,$2);
        IF pos > 0 THEN
          RETURN NEXT substring($1 FROM 1 FOR pos - 1);
          $1 := substring($1 FROM pos + delim_length);
        ELSE
          RETURN NEXT $1;
          EXIT;
        END IF; 
      END LOOP;
      RETURN;
    END;
    $function$

It splits a string with a delimiter. Like this:

select * from split_string('3.584731 60.739211,3.590472 60.738030,3.592740 60.736220', ' ');

"3.584731"
"60.739211,3.590472"
"60.738030,3.592740"
"60.736220"

How can I save the results in a temp_array or temp_table. So I can get the the results in temp_x and split up these points again. Like:

"3.584731"
"60.739211"
"3.590472"
"60.738030"
"3.592740"
"60.736220"

and return the values as double precision. And all of this should be done in the function.


Solution

  • If you need the intermediary step:

    SELECT string_to_table(a, ' ')::float8
           -- or do something else with the derived table
    FROM   string_to_table('3.584731 60.739211,3.590472 60.738030', ',') a;
    

    This is faster than regexp_split_to_table(). Regular expressions are powerful but expensive. Test with EXPLAIN ANALYZE.

    I first split at ',', and next at ' ' - the reversed sequence of what you describe seems adequate.

    If need be, you can wrap this into a PL/pgSQL function:

    CREATE OR REPLACE FUNCTION public.split_string(_str text
                                                 , _delim1 text = ','
                                                 , _delim2 text = ' ')
      RETURNS SETOF float8
      LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE AS
    $func$
    BEGIN
       RETURN QUERY
       SELECT string_to_table(a, _delim2)::float8
       FROM   string_to_table(_str, _delim1) a;
    END
    $func$;
    

    Or as plain SQL function:

    CREATE OR REPLACE FUNCTION public.split_string(_str text
                                                 , _delim1 text = ','
                                                 , _delim2 text = ' ')
      RETURNS SETOF float8
      LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
    $func$
    SELECT string_to_table(a, _delim2)::float8
    FROM   string_to_table(_str, _delim1) a;
    $func$;
    

    Declare it IMMUTABLE (which it is) to allow performance optimization and other uses.
    Declare it PARALLEL SAFE (which it is) to not disable parallel query plans.

    Call (using the provided defaults for _delim1 and _delim2):

    SELECT * FROM split_string('3.584731 60.739211,3.590472 60.738030');
    

    Or with custom delimiters:

    SELECT * FROM split_string('3.584731 60.739211,3.590472 60.738030', ',', ' ');
    

    Fastest

    For top performance, combine translate() with string_to_table(...):

    SELECT string_to_table(
              translate('3.584731 60.739211,3.590472 60.738030', ' ', ',')
            , ','
              ))::float8;
    

    In Postgres 13 or older use unnest(string_to_array()) instead of string_to_table().