ruby regex gsub postal-code

Format string (postcode) in Ruby


I need to re-format a list of UK postcodes and have started with the following to strip whitespace and capitalize:

postcode.upcase.gsub(/\s/,'')

I now need to change the postcode so the new postcode will be in a format that will match the following regexp:

^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$

I would be grateful for any assistance.


Solution

  • If this standards doc is to be believed (and Wikipedia concurs), formatting a valid post code for output is straightforward: the last three characters are the second part, everything before is the first part!

    So, assuming you have a valid postcode in upper case with no embedded space, you just need

    def format_post_code(pc)
      pc.strip.sub(/([A-Z0-9]+)([A-Z0-9]{3})/, '\1 \2')
    end
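
As a rough sketch of how this fits with the clean-up step from the question, the two can be chained. The helper name `normalize_and_format` is made up for illustration, and the `\A`/`\z` anchors are an addition so that only a whole, space-free string is split:

```ruby
# Hypothetical helper: normalise first (as in the question), then split.
def normalize_and_format(pc)
  compact = pc.upcase.gsub(/\s/, '') # e.g. " sw1a1aa " becomes "SW1A1AA"
  # The greedy first group backtracks until exactly three characters
  # are left for the second group, so the last three form part two.
  compact.sub(/\A([A-Z0-9]+)([A-Z0-9]{3})\z/, '\1 \2')
end

puts normalize_and_format(' sw1a1aa ') # SW1A 1AA
puts normalize_and_format('m1 1ae')    # M1 1AE
```

This does no validation at all: any run of four or more letters and digits gets a space inserted before its last three characters.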
    

    If you want to validate an input post code first, then the regex you gave looks like a good starting point. Perhaps something like this?

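    NORMAL_POSTCODE_RE = /\A([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[A-HJKS-UW0-9]?)\s*([0-9][ABD-HJLN-UW-Z]{2})\z/i
    GIROBANK_POSTCODE_RE = /\AGIR\s*0AA\z/i
    # Returns the formatted postcode, or nil if the input does not validate.
    # \A and \z anchor to the whole string (^ and $ only anchor to a line).
    def format_post_code(pc)
      pc = pc.strip.upcase  # normalise before matching so padded input still validates
      return pc.sub(NORMAL_POSTCODE_RE, '\1 \2') if pc =~ NORMAL_POSTCODE_RE
      return 'GIR 0AA' if pc =~ GIROBANK_POSTCODE_RE
    end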
    

    Note that I removed the '0-9' from the first character class, which appears unnecessary according to the sources I cited. I also changed the alpha sets to match the first-cited document. It's still not perfect: a code of the format 'AAA NAA' validates, for example, so I think a more complex RE is probably required.
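
To make that gap concrete, here is a small check using the same character classes as the pattern above. The constant name `LOOSE_RE` is only for this illustration:

```ruby
# Same character classes as the single-regex version in the answer.
LOOSE_RE = /\A([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[A-HJKS-UW0-9]?)\s*([0-9][ABD-HJLN-UW-Z]{2})\z/i

# A real-looking postcode is accepted, as expected.
puts 'SW1A 1AA'.match?(LOOSE_RE) # true

# A three-letter first part is accepted too, because the second and
# third positions each allow either a letter or a digit independently.
puts 'AAA 1AA'.match?(LOOSE_RE)  # true
```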

    I think this might cover it (constructed in stages for easier fixing!)

    A1  = "[A-PR-UWYZ]"
    A2  = "[A-HK-Y]"
    A34 = "[A-HJKS-UW]"        # assume rule for alpha in fourth char is same as for third
    A5  = "[ABD-HJLN-UW-Z]"
    N   = "[0-9]"
    AANN = A1 + A2 + N + N     # the six possible first-part combos
    AANA = A1 + A2 + N + A34
    ANA  = A1 + N + A34
    ANN  = A1 + N + N
    AAN  = A1 + A2 + N
    AN   = A1 + N
    PART_ONE = [AANN, AANA, ANA, ANN, AAN, AN].join('|') 
    PART_TWO = N + A5 + A5
    
    # \A and \z anchor to the whole string; ^ and $ would only anchor to a line.
    NORMAL_POSTCODE_RE = Regexp.new("\\A(#{PART_ONE})[ ]*(#{PART_TWO})\\z", Regexp::IGNORECASE)
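
Putting the pieces together, a formatter built on the staged pattern might look like the sketch below. The module name `PostcodeSketch` and the constant names `NORMAL_RE` and `GIROBANK_RE` are invented here to keep the example self-contained. The character classes are the ones above, including the assumption that the fourth character follows the third-character rule:

```ruby
module PostcodeSketch
  A1  = "[A-PR-UWYZ]"
  A2  = "[A-HK-Y]"
  A34 = "[A-HJKS-UW]"     # assumption carried over: same rule for 3rd and 4th alpha
  A5  = "[ABD-HJLN-UW-Z]"
  N   = "[0-9]"

  # The six first-part shapes: AANN, AANA, ANA, ANN, AAN, AN.
  PART_ONE = [A1 + A2 + N + N, A1 + A2 + N + A34, A1 + N + A34,
              A1 + N + N, A1 + A2 + N, A1 + N].join('|')
  PART_TWO = N + A5 + A5

  # \A and \z anchor to the whole string rather than to a single line.
  NORMAL_RE   = Regexp.new("\\A(#{PART_ONE})[ ]*(#{PART_TWO})\\z", Regexp::IGNORECASE)
  GIROBANK_RE = /\AGIR[ ]*0AA\z/i

  # Returns the formatted postcode, or nil when the input does not validate.
  def self.format_post_code(pc)
    candidate = pc.strip.upcase
    return 'GIR 0AA' if candidate.match?(GIROBANK_RE)
    m = NORMAL_RE.match(candidate)
    m && "#{m[1]} #{m[2]}"
  end
end

p PostcodeSketch.format_post_code(' sw1a1aa ') # "SW1A 1AA"
p PostcodeSketch.format_post_code('M1  1AE')   # "M1 1AE"
p PostcodeSketch.format_post_code('AAA 1AA')   # nil
```

Unlike the single-regex version, the three-letter first part is now rejected, because each alternative fixes which positions may hold a letter and which a digit.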