
Split address and numbers


I'm trying to split the street name, house number, and box number out of a string.
Let's say the string is "SomeStreet 59A".
For this case I already have a regex solution:

address.split(/([0-9]+)/) //output ["SomeStreet ","59","A"] (the street keeps its trailing space)

The problem I'm having now is that some addresses have unusual formats, so the above method does not work for strings like:

"Somestreet 59-65" // output ["Somestreet ", "59", "-", "65", ""] Not good

For this case, how can I keep the numbers grouped to get this output:

["Somestreet", "59-65"]

Another unusual example is:

"6' SomeStreet 59" // here "6' SomeStreet" is the exact street name.

Expected output: ["6' SomeStreet", "59"]

"6' Somestreet 324/326 A/1" // Example with box number   

Expected output: ["6' Somestreet", "324/326", "A/1"]

Bear in mind that this has to be a single function that I can run in a loop over all of my addresses.


Solution

  • To support all string formats listed in the question, you can use either of these (the first matches the street name lazily, the second greedily):

    .match(/^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/)
    .match(/^(.*)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/)
    

    Details:

    • ^ - start of string
    • (.*?) - Group 1 (the street): zero or more chars other than line break chars, as few as possible, so Group 2 becomes the first number that follows whitespace (if Group 2 must be the last such number instead, use the greedy variant, .*)
    • \s+ - one or more whitespaces
    • (\d+(?:[-.\/]\d+)?) - Group 2 (the number): one or more digits, optionally followed by -, . or / and one or more digits
    • (?:\s*(\S.*))? - an optional non-capturing group: zero or more whitespaces, then Group 3 (the box): a non-whitespace char followed by the rest of the string
    • $ - end of string.
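
The lazy `(.*?)` and the greedy `(.*)` only give different results when the street name itself contains a number preceded by whitespace. A minimal sketch of that difference, using a made-up address ("Route 66 12") that does not appear in the question:

```javascript
// Hypothetical input: the street name itself contains a number.
const sample = "Route 66 12";

const lazyRx = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;
const greedyRx = /^(.*)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;

// slice(1) drops the full match and keeps Groups 1 to 3.
// Lazy: Group 2 is the first number that follows whitespace.
const lazyParts = sample.match(lazyRx).slice(1);
// Greedy: Group 2 is the last number that follows whitespace.
const greedyParts = sample.match(greedyRx).slice(1);

console.log(lazyParts);   // ["Route", "66", "12"]
console.log(greedyParts); // ["Route 66", "12", undefined]
```

Which variant is appropriate depends on the data: the lazy one treats "12" as a box, the greedy one treats "66" as part of the street name.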

    See a JavaScript demo:

    const texts = ['SomeStreet 59A','Somestreet 59-65',"6' SomeStreet 59", 'Somestreet 1.1', 'Somestreet 65 A/1', "6' Somestreet 324/326 A/1"];
    const rx = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;
    for (const text of texts) {
      // match() returns null when nothing matches; "|| []" keeps the destructuring from throwing
      const [, street, number, box] = text.match(rx) || [];
      console.log(text, '=>', {"Street":street, "Number":number, "Box":box});
    }
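
The question asks for a single function that returns an array, so the regex can be wrapped along these lines. This is only a sketch: the name `splitAddress` and the choice to return `null` for input that does not match are assumptions, not part of the answer above.

```javascript
// Hypothetical helper; its name and the null-on-failure behaviour are assumptions.
const ADDRESS_RX = /^(.*?)\s+(\d+(?:[-.\/]\d+)?)(?:\s*(\S.*))?$/;

function splitAddress(address) {
  const m = address.match(ADDRESS_RX);
  if (m === null) return null; // no "street + number" shape found
  const [, street, number, box] = m;
  // Group 3 is undefined when there is no box part, so leave it out.
  return box === undefined ? [street, number] : [street, number, box];
}

console.log(splitAddress("SomeStreet 59A"));            // ["SomeStreet", "59", "A"]
console.log(splitAddress("Somestreet 59-65"));          // ["Somestreet", "59-65"]
console.log(splitAddress("6' SomeStreet 59"));          // ["6' SomeStreet", "59"]
console.log(splitAddress("6' Somestreet 324/326 A/1")); // ["6' Somestreet", "324/326", "A/1"]
```

Returning `null` lets a loop over many addresses skip or log the entries that do not fit the pattern instead of throwing.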