regex string perl latitude-longitude geodesic-sphere

Splitting a string containing a longitude or latitude expression in Perl


I retrieve data from the net containing real geodesic expressions, by which I mean degrees, minutes and seconds marked with the Unicode symbols U+00B0 (Degree Sign), U+2032 (Prime) and U+2033 (Double Prime). Example:

my $Lat = "48° 25′ 43″ N";

My objective is to convert such an expression first to degrees and then to radians, for use in a Perl module I am writing that implements the Vincenty inverse formula to calculate ellipsoidal great-circle distances. All my code objectives have been met with pseudo geodesics such as "48:25:43 N", but that is hand-entered test data, not real-world data. I am struggling to craft a regular expression that splits the real data the way I now split the pseudo data:

my ($deg, $min, $sec, $dir) = split(/[\s:]+/, $_[0], 4); # this works

I have tried many regular expressions including

/[°′″\s]+/ and
/[\x{0B00}\x{2032}\x{2033}\s]/+

all with dismal results, such as $deg = "48?", $min = "?", $sec = "25′43″ N" and $dir = undef. I have also wrapped the code in a block {} and added use utf8; and use feature 'unicode_strings'; inside that scope, to no avail.

Input data example:

my $Lat = "48° 25′ 43″ N"; 

Expected output:

$deg = 48, $min = 25, $sec = 43 and $dir = "N"

Solution

  • You may try this regex to split the string:

    [^\dNSEW.]+
    

    Sample source:

    my $str = '48° 25′ 43″ N';
    # Split on any run of characters that is not a digit, a direction letter or a decimal point.
    my $regex = qr/[^\dNSEW.]+/;
    my ($deg, $min, $sec, $dir) = split $regex, $str;
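
For the conversion the question is ultimately after, here is a minimal sketch, assuming the input has already been decoded to Perl characters (use utf8; covers literals in the source file; bytes fetched from the net would likely need decoding first, for example with Encode's decode). It splits on the explicit code points rather than on the negated class. Note that the degree sign is U+00B0, so the escape is \x{00B0} rather than \x{0B00}, and the + quantifier has to sit inside the regex delimiters. The helper names dms_to_degrees and degrees_to_radians are hypothetical, and the sign convention (south and west negative) is an assumption.

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 string literals

# Hypothetical helper: convert a string such as "48° 25′ 43″ N"
# to signed decimal degrees.
sub dms_to_degrees {
    my ($text) = @_;

    # Split on runs of degree sign (U+00B0), prime (U+2032),
    # double prime (U+2033) or whitespace.
    my ($deg, $min, $sec, $dir) =
        split /[\x{00B0}\x{2032}\x{2033}\s]+/, $text, 4;

    my $degrees = $deg + $min / 60 + $sec / 3600;

    # Assumed convention: south and west are negative.
    return $dir =~ /^[SW]$/i ? -$degrees : $degrees;
}

# Hypothetical helper: convert decimal degrees to radians.
sub degrees_to_radians {
    my ($degrees) = @_;
    my $pi = 4 * atan2(1, 1);
    return $degrees * $pi / 180;
}

my $lat = "48° 25′ 43″ N";
my $lat_deg = dms_to_degrees($lat);
my $lat_rad = degrees_to_radians($lat_deg);
printf "%.6f degrees, %.6f radians\n", $lat_deg, $lat_rad;
```

The negated class from the answer and the explicit code points both yield the same four fields for this input. The negated class also works on undecoded bytes, while the explicit version states exactly which separators are expected.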