How to return multiple regex values as a tuple


I am working on a Python program that searches through received emails and returns coordinates. I am trying to create a regular expression to select the latitude/longitude values from a string. (I am new to regex.)

Here is a small example of one of the strings I have been using for testing:

     content = """

WorkLocationBoundingBox
Latitude:30.556555Longitude:-97.659824
SecondLatitude:30.569138SecondLongitude:-97.650855

     """

I came up with Latitude:(\d+).(\d+)Longitude:(.*), which I believe is close to what I need, but it separates 30 and 556555 into separate groups. However, -97.659824 is correctly placed into a single group.

My ideal expected result would look something like this:

[(30.556555, -97.659824, 30.569138, -97.650855)]

Solution

  • You can use 3 capture groups, where the first group captures the optional prefix (Second) before Latitude so that a backreference can require the same prefix before Longitude.

    ((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)
    
    • ((?:Second)?) Capture group 1, optionally match Second
    • Latitude: Match literally
    • (-?\d+(?:\.\d+)?) Capture group 2, match an optional - then 1+ digits with an optional decimal part
    • \1Longitude: A backreference to what was matched in group 1, followed by a literal match of Longitude:
    • (-?\d+(?:\.\d+)?) Capture group 3, match an optional - then 1+ digits with an optional decimal part
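
The effect of the backreference can be checked with a short sketch. The pattern is the one above; the three test strings are made up for illustration, and the last one (mismatched prefixes) is an assumption, not data from the question.

```python
import re

pattern = re.compile(
    r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
)

# Matching prefixes: group 1 is "Second", so \1 requires "Second" again.
paired = pattern.search("SecondLatitude:30.569138SecondLongitude:-97.650855")
print(paired.groups())  # ('Second', '30.569138', '-97.650855')

# No prefix: group 1 is the empty string, so \1 matches the empty string too.
plain = pattern.search("Latitude:30.556555Longitude:-97.659824")
print(plain.groups())  # ('', '30.556555', '-97.659824')

# Mismatched prefixes (hypothetical input): \1 is empty, but "Second"
# sits in front of Longitude, so there is no match.
mismatched = pattern.search("Latitude:30.556555SecondLongitude:-97.659824")
print(mismatched)  # None
```

Note that the pattern is not anchored, so a string such as SecondLatitude:1Longitude:2 would still match starting at Latitude, with an empty group 1.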

    import re

    # Group 1 captures the optional "Second" prefix; \1 requires the same prefix before Longitude.
    regex = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
    s = ("WorkLocationBoundingBox\n"
         "Latitude:30.556555Longitude:-97.659824\n"
         "SecondLatitude:30.569138SecondLongitude:-97.650855")

    lst = []
    for match in re.finditer(regex, s):
        # Groups 2 and 3 hold the latitude and longitude as strings.
        lst.append(match.group(2))
        lst.append(match.group(3))

    print(lst)
    

    Output

    ['30.556555', '-97.659824', '30.569138', '-97.650855']
    

    A slightly less strict pattern matches optional word characters before both Latitude and Longitude, without requiring the two prefixes to be the same:

    \w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)
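
To illustrate what "less strict" means here, the sketch below runs both patterns on a line whose prefixes do not agree. The input line is hypothetical, chosen only to show the difference; it does not come from the question.

```python
import re

strict = r"((?:Second)?)Latitude:(-?\d+(?:\.\d+)?)\1Longitude:(-?\d+(?:\.\d+)?)"
loose = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"

# Hypothetical line: no prefix before Latitude, "Second" before Longitude.
line = "Latitude:30.556555SecondLongitude:-97.659824"

strict_hits = re.findall(strict, line)
loose_hits = re.findall(loose, line)

print(strict_hits)  # []
print(loose_hits)   # [('30.556555', '-97.659824')]
```

Whether accepting such lines is desirable depends on how uniform the emails are; the backreference version is the safer choice if the prefixes must pair up.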
    

    With this pattern there are only two capture groups, so re.findall returns the latitude/longitude values directly as a list of tuples:

    import re
    
    pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"
    
    s = ("WorkLocationBoundingBox\n"
                "Latitude:30.556555Longitude:-97.659824\n"
                "SecondLatitude:30.569138SecondLongitude:-97.650855")
    print(re.findall(pattern, s))
    

    Output

    [('30.556555', '-97.659824'), ('30.569138', '-97.650855')]
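
Both outputs above contain strings, while the question asks for a single tuple of numbers. One possible way to get there, sketched under the assumption that every captured value is a valid float literal, is to flatten the re.findall result and convert each value with float. The names pairs, coords and result are illustrative only.

```python
import re

pattern = r"\w*Latitude:(-?\d+(?:\.\d+)?)\w*Longitude:(-?\d+(?:\.\d+)?)"

s = ("WorkLocationBoundingBox\n"
     "Latitude:30.556555Longitude:-97.659824\n"
     "SecondLatitude:30.569138SecondLongitude:-97.650855")

# One (latitude, longitude) tuple of strings per matched line.
pairs = re.findall(pattern, s)

# Flatten the pairs in order and convert each string to a float.
coords = tuple(float(value) for pair in pairs for value in pair)

# Wrap the tuple in a list to mirror the shape shown in the question.
result = [coords]
print(result)  # [(30.556555, -97.659824, 30.569138, -97.650855)]
```

If the coordinates should stay grouped per line instead, applying float inside each pair (for example with a nested tuple comprehension) keeps the list-of-tuples shape.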