I am trying to sort a column title from the table door with the following PostgreSQL query:
SELECT title
FROM door
ORDER BY
CASE
WHEN title ~ '^\d+' THEN CAST(regexp_replace(title, '[^0-9]', '', 'g') AS INTEGER)
ELSE NULL
END ASC,
CASE
WHEN title ~ '^\d+' THEN regexp_replace(title, '\d+', '', 'g')
ELSE title
END ASC;
The query currently produces the following result:
"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-1001"
"DR-101"
"DR-102"
"DR-104"
"Entrance-1"
"Entrance-2"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-20-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"
However, I was expecting the result to be:
"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-101"
"DR-102"
"DR-104"
"DR-1001"
"Entrance-1"
"Entrance-2"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-20-Entrance-1"
The query is not correctly ordering the rows where the title column contains mixed prefixes and numeric segments, particularly the entries with the MY-Prefix pattern. I suspect this is due to the logic in the ORDER BY clause not properly handling the sorting of multi-segment alphanumeric strings.
I want to:
Sort rows where the numeric segment is extracted from title
and ordered numerically.
Handle alphanumeric prefixes correctly so that entries like MY-Prefix-1-Entrance-1
appear before MY-Prefix-10-Entrance-1
.
That's one of the purposes of custom ICU collations. You can define a sorting method that treats numbers numerically, the way you wanted: demo at db<>fiddle
CREATE COLLATION numerical( provider=icu
,deterministic=false
,locale='und-u-kn' );
Then you just tell your order by
to use it:
SELECT title
FROM door
ORDER BY title COLLATE numerical;
The demo shows it's also faster than trying to handle that using regex and conditionals.