I am currently using this method to sort letters in a string alphabetically in PostgreSQL. Are there any other efficient methods?
select string_agg(c, '') as s
from (select unnest(regexp_split_to_array('ijsAafhareDbv', '')) as c
order by c) as t;
s
--------------
ADaabefhijrsv
A function implemented in C
is substantially faster than anything we can achieve with LANGUAGE sql
or plpgsql
. So your plpythonu
function wins the performance contest by a long shot.
But plpythonu
is an untrusted procedural language. It's not installed by default and only superusers can create functions in untrusted languages. You need to be aware of security implications. And untrusted languages are not available at all on most cloud services.
The current manual (quote from pg 10):
PL/Python is only available as an “untrusted” language, meaning it does not offer any way of restricting what users can do in it and is therefore named
plpythonu
. A trusted variantplpython
might become available in the future if a secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such asplpythonu
.
The SQL functions you tested are not well optimized. There are a thousand and one ways to improve performance, yet:
-- func to create random strings CREATE OR REPLACE FUNCTION f_random_string(int) RETURNS text AS $func$ SELECT array_to_string(ARRAY( SELECT substr('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', (ceil(random()*62))::int, 1) FROM generate_series(1, $1) ), '') $func$ LANGUAGE sql VOLATILE; -- test tbl with 100K rows CREATE TABLE tbl(str text); INSERT INTO tbl SELECT f_random_string(15) FROM generate_series(1, 100000) g;
VACUUM ANALYZE tbl;
-- 1: your test function 1 (inefficient) CREATE OR REPLACE FUNCTION sort1(text) RETURNS text AS $func$ -- your test function 1 (very inefficient) SELECT string_agg(c, '') FROM (SELECT unnest(regexp_split_to_array($1, '')) AS c ORDER BY c) t; $func$ LANGUAGE sql IMMUTABLE; -- 2: your test function 2 ( inefficient) CREATE OR REPLACE FUNCTION sort2(text) RETURNS text AS $func$ WITH t(s) AS (VALUES ($1)) SELECT string_agg(substr(t.s, g.g, 1), '' ORDER BY substr(t.s, g.g, 1)) FROM t CROSS JOIN LATERAL generate_series(1, length(t.s)) g; $func$ LANGUAGE sql IMMUTABLE; -- 3: remove pointless CTE from sort2 CREATE OR REPLACE FUNCTION sort3(text) RETURNS text AS $func$ SELECT string_agg(substr($1, g, 1), '' ORDER BY substr($1, g, 1)) FROM generate_series(1, length($1)) g; $func$ LANGUAGE sql IMMUTABLE; -- 4: use unnest instead of calling substr N times CREATE OR REPLACE FUNCTION sort4(text) RETURNS text AS $func$ SELECT string_agg(c, '' ORDER BY c) FROM unnest(string_to_array($1, NULL)) c $func$ LANGUAGE sql IMMUTABLE; -- 5: ORDER BY in subquery CREATE OR REPLACE FUNCTION sort5(text) RETURNS text AS $func$ SELECT string_agg(c, '') FROM ( SELECT c FROM unnest(string_to_array($1, NULL)) c ORDER BY c ) sub $func$ LANGUAGE sql IMMUTABLE; -- 6: SRF in SELECT list CREATE OR REPLACE FUNCTION sort6(text) RETURNS text AS $func$ SELECT string_agg(c, '') FROM (SELECT unnest(string_to_array($1, NULL)) c ORDER BY 1) sub $func$ LANGUAGE sql IMMUTABLE; -- 7: ARRAY constructor instead of aggregate func CREATE OR REPLACE FUNCTION sort7(text) RETURNS text AS $func$ SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1, NULL)) c ORDER BY c), '') $func$ LANGUAGE sql IMMUTABLE; -- 8: The same with COLLATE "C" CREATE OR REPLACE FUNCTION sort8(text) RETURNS text AS $func$ SELECT array_to_string(ARRAY(SELECT unnest(string_to_array($1 COLLATE "C", NULL)) c ORDER BY c), '') $func$ LANGUAGE sql IMMUTABLE;
SELECT str, sort1(str), sort2(str), sort3(str), sort4(str), sort5(str), sort6(str), sort7(str), sort8(str) FROM tbl LIMIT 1; -- result sample
str | sort1 | sort2 | sort3 | sort4 | sort5 | sort6 | sort7 | sort8 :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- tUkmori4D1rHhI1 | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DhHiIkmorrtU | 114DHIUhikmorrt
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort1(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.053 ms | | Execution time: 2742.904 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort2(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.105 ms | | Execution time: 2579.397 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort3(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.079 ms | | Execution time: 2191.228 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort4(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.075 ms | | Execution time: 2194.780 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort5(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.083 ms | | Execution time: 1902.829 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort6(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.075 ms | | Execution time: 1866.407 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort7(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.067 ms | | Execution time: 1863.713 ms |
EXPLAIN (ANALYZE, TIMING OFF) SELECT sort8(str) FROM tbl;
| QUERY PLAN | | :--------------------------------------------------------------------------------------- | | Seq Scan on tbl (cost=0.00..26541.00 rows=100000 width=32) (actual rows=100000 loops=1) | | Planning time: 0.074 ms | | Execution time: 1569.376 ms |
db<>fiddle here
The last one sorts without COLLATION
rules, that's strictly by byte values of characters, which is substantially cheaper. But you may or may not need the sort order for a different locale.