I have been fighting with this for a while, so hopefully someone can help me out. I'm open to any and all suggestions.
When I query QGeoAddress::street(), I may receive the street number along with the street name. I would like to get just the street name.
Example:
King St W -> King St W
99 King St W -> King St W
99a King St W -> King St W ...
1st St -> 1st St
99 1st St -> 1st St
99a 1st St -> 1st St ...
315 W. 42nd -> W. 42nd
42 St. Paul Drive -> St. Paul Drive
I need to do this so that the location of two separate devices can be compared via the most recent street name. If a device is at "99 King St W", it is on the same street as "113 King St W", or "113a King St W".
As it stands, I don't believe regex is a good, reliable solution as there are too many rules to impose and the variability of street names is working against me. Theoretically, there may be a street called "1 St", which would fail the regex normalizing "1 1st St".
Writing my own fuzzy matcher may provide better results, but may fail for shorter street names.
I have also considered querying a REST web service, however many of the free services have limitations on requests per day, or a minimum time between requests that would render that method too expensive.
Like I say, I'd love to hear what you guys can come up with.
Much appreciated :)
As I said in the comments, the problem here is that the wrong
question is being asked. But if you have to, and you can
exclude PO boxes (the string ends in a number?), and you limit
yourself to addresses in the USA (because you wouldn't believe
some of the things you see in the UK), then you might start by
detecting a leading number, then appending everything that isn't
separated from it by a space. It's hardly perfect, because
there'll always be people who write "99 A King St.", rather
than "99a King St.". (But then, in the first, is the name of
the street "King St." or "A King St."? Unless you know the
street yourself, you can't be sure.) The regular expression for
this would be "\\d+\\w*". Beyond that, you can try certain
heuristics with the results: if what remains after the number is
a single word, exactly matching "St", "Street", "Ave", etc.
(there are probably about 20 different words you should check,
with or without trailing "." in the case of abbreviations), then
the original string was probably just the street, and the
number is part of its name.
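
To make that concrete, here is a rough sketch of the heuristic using
only the standard library. The helper names (stripHouseNumber,
isBareStreetType) are made up for illustration and are not part of
Qt, and the list of street types is a small sample, not the full
twenty or so. It assumes the number, if present, leads the string
and is followed by whitespace.

```cpp
#include <cctype>
#include <regex>
#include <set>
#include <string>

// Hypothetical helper (not a Qt function): true if `rest` is a single word
// that is itself a street type, e.g. "St", "St." or "Street".
bool isBareStreetType(const std::string& rest)
{
    // Illustrative subset only; a real list would be longer.
    static const std::set<std::string> types = {
        "st", "street", "ave", "avenue", "rd", "road",
        "dr", "drive", "blvd", "boulevard", "ln", "lane"
    };
    std::string word;
    for (unsigned char c : rest) {
        if (std::isspace(c))
            return false;               // more than one word
        word += static_cast<char>(std::tolower(c));
    }
    if (!word.empty() && word.back() == '.')
        word.pop_back();                // accept "St." as well as "St"
    return types.count(word) > 0;
}

// Hypothetical helper: drops a leading house number such as "99" or "99a",
// unless doing so would leave nothing but a street type.
std::string stripHouseNumber(const std::string& street)
{
    // Leading digits, anything glued to them, then the separating space.
    static const std::regex leadingNumber(R"(^\s*\d+\w*\s+)");
    std::smatch m;
    if (!std::regex_search(street, m, leadingNumber))
        return street;                  // no leading number: "King St W"
    const std::string rest = m.suffix().str();
    if (rest.empty() || isBareStreetType(rest))
        return street;                  // "1st St": the number is the name
    return rest;
}
```

With this, stripHouseNumber("99a King St W") gives "King St W", and
"1st St" comes back unchanged. It is still only a heuristic: an
input like "42nd Street West" would lose its "42nd", because what
follows the number is more than a single street-type word.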
But before even starting, I would insist that you query the
assignment. It's well known, for example, that when inputting
addresses, about all you can do is "First line:", "Second
line:", etc. Even asking for a post code can be tricky.