Can someone explain how to use re.findall to extract only the dates from the following strings? The date can be in either of these formats: 1.1.2001 or 11.11.2001. The number of digits representing the day and the month varies:
import re
str = "This is my date: 1.1.2001 fooo bla bla bla"
str2 = "This is my date: 11.11.2001 bla bla foo bla"
I know I should use re.findall(pattern, string), but to be honest I am completely confused about these patterns. I don't know how to assemble a pattern that fits my case.
I have found something like this, but I don't understand why there is the letter r before the pattern. Does \ mean start of string? Does d mean digit? And does the number in {} mean how many?
match = re.search(r'\d{2}.\d{2}.\d{4}', text)
Thanks a lot!
The r prefix on the string tells the Python interpreter that it is a raw string, which essentially means backslashes (\) are no longer treated as escape characters and are kept as literal backslashes. For the re module this is useful because backslashes are used a lot, so to avoid a lot of \\ (escaping the backslash) most people use a raw string instead.
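As a small illustration of the raw-string point, the sketch below compares a regular string literal with a raw one. The variable names are made up for this example.

```python
import re

# In a regular string literal, the backslash has to be doubled
# to end up with a single backslash followed by "d".
escaped = "\\d{4}"

# In a raw string literal, the backslash is kept exactly as written.
raw = r"\d{4}"

print(escaped == raw)         # True: both hold the same characters
print(len(r"\n"), len("\n"))  # 2 1: raw keeps backslash + n, regular makes a newline
print(re.search(raw, "year 2001").group())  # 2001
```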
What you're looking for is this:
match = re.search(r'\d{1,2}\.\d{1,2}\.\d{4}', text)
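Since the question asks about re.findall, here is a minimal sketch of the same pattern used with it on the two sample strings. The names text1, text2 and date_pattern are only illustrative (the question's str is renamed so it does not shadow the built-in).

```python
import re

text1 = "This is my date: 1.1.2001 fooo bla bla bla"
text2 = "This is my date: 11.11.2001 bla bla foo bla"

# 1 or 2 digits, a literal dot, 1 or 2 digits, a literal dot, 4 digits
date_pattern = re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}")

# findall returns a list of every non-overlapping match in the string
dates1 = date_pattern.findall(text1)
dates2 = date_pattern.findall(text2)

print(dates1)  # ['1.1.2001']
print(dates2)  # ['11.11.2001']
```

Unlike re.search, which returns a match object (or None), re.findall returns a plain list of strings here, so an empty list means no date was found.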
The {} tells regex how many occurrences of the preceding element you want. \d matches a single digit, so \d{1,2} means a minimum of 1 and a maximum of 2 digits, and \d{4} means exactly 4 digits.
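To see the quantifiers on their own, the sketch below checks a few inputs with fullmatch, which only succeeds when the whole string fits the pattern. The pattern names are hypothetical, chosen for this example.

```python
import re

day_or_month = re.compile(r"\d{1,2}")  # one or two digits
year = re.compile(r"\d{4}")            # exactly four digits

print(bool(day_or_month.fullmatch("1")))    # True
print(bool(day_or_month.fullmatch("11")))   # True
print(bool(day_or_month.fullmatch("111")))  # False: three digits is too many
print(bool(year.fullmatch("2001")))         # True
print(bool(year.fullmatch("201")))          # False: only three digits
```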
Note that the . is also escaped as \. because in regex an unescaped . means any character, but in this case you are looking for the literal . character, so you escape it to tell regex to look for exactly that.
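A quick way to see why the escape matters is to run both versions of the pattern against a string that contains a date-like sequence without dots. The sample string below is invented for this illustration.

```python
import re

sample = "1.1.2001 and 1x1y2001"

# Unescaped dots match any character, so "x" and "y" are accepted too.
unescaped = re.findall(r"\d{1,2}.\d{1,2}.\d{4}", sample)

# Escaped dots match only a literal "." character.
escaped = re.findall(r"\d{1,2}\.\d{1,2}\.\d{4}", sample)

print(unescaped)  # ['1.1.2001', '1x1y2001']
print(escaped)    # ['1.1.2001']
```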
See this for more explanation: https://regex101.com/r/v2QScR/1