I want to parse the 2 digits in the middle from a date in dd/mm/yy
format but also allowing single digits for day and month.
This is what I came up with:
(?<=^[\d]{1,2}\/)[\d]{1,2}
I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}\/ before it.
This doesn't work on many combinations; I have tested 10/10/10, 11/12/13, etc.
But to my surprise, (?<=^\d\d\/)[\d]{1,2} worked.
But the [\d]{1,2} should also match if \d\d did, or am I wrong?
Regex flavors vary in their support for lookbehind: some impose certain restrictions, and some don't support it at all.
In Python, where only fixed-length lookbehind is supported, your original pattern raises an error because \d{1,2} obviously does not have a fixed length. You can "fix" this by alternating on two different fixed-length lookbehinds, e.g. something like this:
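A minimal sketch of that alternation, assuming Python's standard re module. Each branch carries its own fixed-width lookbehind, one for a one-digit day and one for a two-digit day; the helper name middle_field is made up for illustration.

```python
import re

# Two fixed-width lookbehinds, one per possible day width, joined by alternation.
pattern = re.compile(r'(?<=^\d/)\d{1,2}|(?<=^\d\d/)\d{1,2}')

def middle_field(date_text):
    """Return the digits after the first slash, or None if nothing matches.

    Hypothetical helper, not part of the original answer.
    """
    match = pattern.search(date_text)
    return match.group() if match else None

print(middle_field("12/34/56"))  # 34
print(middle_field("1/23/45"))   # 23
print(middle_field("7/8/09"))    # 8
```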
Or perhaps you can put both lookbehinds as alternates of a non-capturing group:
(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}
(note that you can just use \d without the brackets).
That said, it's probably much simpler to use a capturing group instead:
^\d{1,2}\/(\d{1,2})
Note that findall returns what group 1 captures if you only have one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (such as in this case).
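To make the findall remark concrete, here is a small sketch of how its return value changes with the number of groups in the pattern. The variable names are only illustrative.

```python
import re

date = "12/34/56"

# No group: findall returns each whole match.
whole = re.findall(r'^\d{1,2}/\d{1,2}', date)

# Exactly one group: findall returns only what that group captured.
month_only = re.findall(r'^\d{1,2}/(\d{1,2})', date)

# Two groups: findall returns one tuple per match.
day_and_month = re.findall(r'^(\d{1,2})/(\d{1,2})', date)

print(whole)          # ['12/34']
print(month_only)     # ['34']
print(day_and_month)  # [('12', '34')]
```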
This snippet illustrates all of the above points:
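import re
# Alternation of two fixed-width lookbehinds
p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')
print(p.findall("12/34/56"))  # ['34']
print(p.findall("1/23/45"))   # ['23']
# Capturing group: findall returns what group 1 captured
p = re.compile(r'^\d{1,2}\/(\d{1,2})')
print(p.findall("12/34/56"))  # ['34']
print(p.findall("1/23/45"))   # ['23']
# Variable-width lookbehind is rejected when the pattern is compiled
try:
    p = re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
except re.error as e:
    print("error:", e)  # look-behind requires fixed-width pattern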
Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern. This is demonstrated by the following snippet:
// requires java.util.regex.Pattern and java.util.regex.Matcher
String text =
    "12/34/56 date\n" +
    "1/23/45 another date\n";
Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
} // "34", "23"
Note that (?m) is the embedded Pattern.MULTILINE flag, so that ^ matches the start of every line. Note also that since \ is an escape character for string literals, you must write "\\" to get one backslash in Java.
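For comparison, the same (?m) inline flag exists in Python's re module. Since Python rejects the variable-width lookbehind, this sketch uses the capturing-group form instead, on the same sample text as the Java snippet, and shows what changes when the flag is left out.

```python
import re

text = (
    "12/34/56 date\n"
    "1/23/45 another date\n"
)

# (?m) is the inline form of re.MULTILINE, so ^ matches at the start of each line.
multiline = re.compile(r'(?m)^\d{1,2}/(\d{1,2})')
print(multiline.findall(text))  # ['34', '23']

# Without the flag, ^ only matches at the very start of the string.
single = re.compile(r'^\d{1,2}/(\d{1,2})')
print(single.findall(text))  # ['34']
```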
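C# supports full regex on lookbehind. The following snippet shows how you can use + repetition on a lookbehind:
var text = @"1/23/45
12/34/56
123/45/67
1234/56/78"; // illustrative dates, one per line (assumed sample input)
Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
    Console.WriteLine(m);
} // "23", "34", "45", "56"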
Note that unlike Java, in C# you can use a @-quoted string so that you don't have to escape the backslash.
For completeness, here's how you'd use the capturing group option in C#:
Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
    Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}
Given the previous text, this prints:
Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56