Our application has a CustomerNumber
field. We have hundreds of different people using the system (each has their own login and their own list of CustomerNumber
s). An individual user might have at most 100,000 customers. Many have less than 100.
Some people only put actual numbers into their customer number fields, while others use a mixture of things. The system allows 20 characters which can be A-Z, 0-9 or a dash, and stores these in a VARCHAR2(20). Anything lowercase is made uppercase before being stored.
Now, let's say we have a simple report that lists all the customers for a particular user, sorted by Customer Number. e.g.
SELECT CustomerNumber,CustomerName
FROM Customer
WHERE User = ?
ORDER BY CustomerNumber;
This is a naive solution as the people that only ever use numbers do not want to see a plain alphabetic sort (where "10" comes before "9").
I do not wish to ask the user any unnecessary questions about their data.
I'm using Oracle, but I think it would be interesting to see some solutions for other databases. Please include which database your answer works on.
What do you think the best way to implement this is?
In Oracle 10g:
SELECT cust_name
FROM t_customer c
ORDER BY
REGEXP_REPLACE(cust_name, '[0-9]', ''), TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+'))
This will sort by the first occurence of number, not regarding it's position, i. e.:
customer1 < customer2 < customer10
cust1omer ? customer1
cust8omer1 ? cust8omer2
, where a ?
means that the order is undefined.
That suffices for most cases.
To force sort order on case 2
, you may add a REGEXP_INSTR(cust_name, '[0-9]', n)
to ORDER BY
list n
times, forcing order on the first appearance of n
-th (2nd
, 3rd
etc.) group of digits.
To force sort order on case 3
, you may add a TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+', n))
to ORDER BY
list n
times, forcing order of n
-th. group of digits.
In practice, the query I wrote is enough.
You may create a function based index on these expressions, but you'll need to force it with a hint, and a one-pass SORT ORDER BY
will be performed anyway, as the CBO
doesn't trust function-base indexes enough to allow an ORDER BY
on them.