I need to sort a list of UK postcodes in to order.
Is there a simple way to do it?
UK postcodes are made up of letters and numbers:
see for full info of the format: http://en.wikipedia.org/wiki/UK_postcodes
But my problem is this a simple alpha sort doesn't work because each code starts with 1 or two letters letters and then is immediately followed by a number , up to two digits, then a space another number then a letter. e.g. LS1 1AA or ls28 1AA, there is also another case where once the numbers in the first section exceed 99 then it continues 9A etc.
Alpha sort cause the 10s to immediately follow the 1:
...
LS1 9ZZ
LS10 1AA
...
LS2
I'm looking at creating a SQL function to convert the printable Postcode into a sortable postcode e.g. 'LS1 9ZZ' would become 'LS01 9ZZ', then use this function in the order by clause.
Has anybody done this or anything similar already?
You need to think of this as a tokenization issue so SW1A 1AA should tokenize to:
(although you could break the inward part down into 1 and AA if you wanted to)
and G12 8QT should tokenize to:
Once you have broken the postcode down into those component parts then sorting should be easy enough. There is an exception with the GIR 0AA postcode but you can just hardcode a test for that one
edit: some more thoughts on tokenization
For the sample postcode SW1A 1AA, SW is the postcode area, 1A is the postcode district (which we'll break into two parts for sorting purposes), 1 is the postcode sector and AA is the unit postcode.
These are the valid postcode formats (source: Royal Mail PAF user guide page 8 - link at bottom of this page):
AN NAA
AAN NAA
ANN NAA
ANA NAA
AAA NAA (only for GIR 0AA code)
AANN NAA
AANA NAA
So a rough algorithm would be (assuming we want to separate the sector and unit postcode):
If the purpose is to display a list of postcodes for the user to choose from then I would adopt Neil Butterworth's suggestion of storing a 'sortable' version of the postcode in the database. The easiest way to create a sortable version is to pad all entries to nine characters:
and GIR 0AA is a slight exception again. If you pad with spaces then the sort order should be correct. Examples using # to represent a space:
You need to right-pad the area if it's too short: left-padding produces the wrong sort order. All of the single letter areas - B, E, G, L, M, N, S, W - would sort before all of the two-letter areas - AB, AL, ..., ZE - if you left-padded
The district number needs to be left padded to ensure that the natural W1, W2, ..., W9, W10 order remains intact