I am new to programming and Python, so I apologize if this is an obvious question. I tried looking at similar questions on this website, but the solutions seem to be outside of my reach.
Problem: Consider the following text:
12/19 Paul 1/20
1/20 Jacob 10/2
Using the module re, extract the names from the above. In other words, your output should be:
['Paul', 'Jacob']
First, I tried using positive lookarounds:
import re
name_regex=re.compile(r'''(
(?<=\d{1,2}/\d{1,2}\s) #looks for one or two digits followed by a forward slash followed by one or two digits, followed by a space
.*? #looks for anything besides the newline in a non-greedy manner (is the non-greedy part necessary? I am not sure...)
(?=\s\d{1,2}/\d{1,2}) #looks for a space followed by one or two digits followed by a forward slash followed by one or two digits
)''', re.VERBOSE)
text=str("12/19 Paul 1/20\n1/20 Jacob 10/2")
print(name_regex.findall(text))
However, the above yields the error:
re.error: look-behind requires fixed-width pattern
From reading similar questions, I believe that this means that look arounds cannot have variable length (i.e., they cannot look for "1 or 2 digits").
However, how can I fix this?
Any help would be greatly appreciated, especially help suited to a near-complete beginner like me!
PS. Ultimately, the list of names surrounded by dates can be very long. The dates can have one or two digits that are separated by a slash. I just wanted to give a minimal working example.
Thank you!
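Python's built-in re module only supports fixed-width look-behind, which is why your pattern raises that error. The third-party regex module (installable from PyPI) does accept a variable-width look-behind. With it, if you want to match at least a single non-whitespace character between the digit patterns, you might use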
(?<=\d{1,2}/\d{1,2}\s)\S.*?(?=\s\d{1,2}/\d{1,2})
The part \S.*? matches a non-whitespace character followed by any characters except a newline, non-greedily, so it stops at the first position where the lookahead (?=\s\d{1,2}/\d{1,2}) succeeds.
Note that if you used .*? instead, the result could also contain an empty entry, such as ['Paul', '', 'Jacob'].
You could also use a capturing group instead of lookarounds:
\d{1,2}/\d{1,2}\s(\S.*?)\s\d{1,2}/\d{1,2}
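As a minimal sketch of how the capturing-group pattern might be used with the built-in re module on the sample text from the question: the names NAME_PATTERN and extract_names are just illustrative choices, not anything from the original post. When a pattern has exactly one group, findall returns the contents of that group rather than the whole match.

```python
import re

# Capturing-group pattern from the answer:
# date, one whitespace char, (name), one whitespace char, date.
NAME_PATTERN = re.compile(r"\d{1,2}/\d{1,2}\s(\S.*?)\s\d{1,2}/\d{1,2}")


def extract_names(text):
    """Return the text captured between each pair of dates (illustrative helper)."""
    # With a single capturing group, findall returns only the captured part.
    return NAME_PATTERN.findall(text)


text = "12/19 Paul 1/20\n1/20 Jacob 10/2"
print(extract_names(text))  # ['Paul', 'Jacob']
```

Because this pattern consumes both the leading and the trailing date, it assumes each name has its own pair of dates, as in the sample text.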