.net, regex, greedy, regex-greedy

Ignoring an optional suffix with a greedy regex


I'm performing regex matching in .NET against strings that look like this:

1;#Lists/General Discussion/Waffles Win
2;#Lists/General Discussion/Waffles Win/2_.000
3;#Lists/General Discussion/Waffles Win/3_.000

I need to match the URL portion without the numbers at the end, so that I get this:

Lists/General Discussion/Waffles Win

This is the regex I'm trying:

(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*

The problem is that the greedy middle group swallows the text that the last group is meant to match. I've also tried without the * at the end, but then only the first string above matches and not the rest.

I have the multi-line option enabled. Any ideas?


Solution

  • A few different alternatives:

    @"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"
    

    This matches as few path segments as possible, followed by an optional last part, and the end of the line.

    @"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"
    

    This matches as many path segments as possible, stopping before a segment that is the numeric suffix at the end of the line.

    @"^\d+;#(.*?)(?:/\d+_\.\d+)?$"
    

    This matches as few characters as possible, followed by an optional last part, and the end of the line.
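
To compare the three alternatives against the sample strings from the question, here is a minimal sketch. It is written in Python's `re` module rather than .NET, which is an assumption made only so the snippet is easy to run; the constructs involved (lazy quantifiers, negative lookahead, `^` and `$`) should behave the same way on this input. The named group `(?<url>...)` from the question is replaced by a plain capturing group because Python spells named groups differently, and the helper `extract` and the dictionary labels are hypothetical names for illustration.

```python
import re

lines = [
    "1;#Lists/General Discussion/Waffles Win",
    "2;#Lists/General Discussion/Waffles Win/2_.000",
    "3;#Lists/General Discussion/Waffles Win/3_.000",
]

# Pattern from the question (named group turned into a plain group).
# The greedy .+ runs to the end of the line, and the trailing group is
# then satisfied by matching zero times, so the suffix stays in group 1.
original = re.compile(r"(?:\d+;#)(.+)(?:/\d+_.\d+)*")

# The three alternatives from the answer; only the string syntax differs.
alternatives = {
    "lazy segments": re.compile(r"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"),
    "lookahead": re.compile(r"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"),
    "lazy dot": re.compile(r"^\d+;#(.*?)(?:/\d+_\.\d+)?$"),
}


def extract(pattern, line):
    """Return group 1 of a match anchored at the start of the line, or None."""
    m = pattern.match(line)
    return m.group(1) if m else None


# Each line is matched on its own, so no multi-line flag is needed here.
greedy_results = [extract(original, line) for line in lines]
results = {
    name: [extract(pattern, line) for line in lines]
    for name, pattern in alternatives.items()
}

for name, values in results.items():
    print(name, values)
```

Each line is matched separately in this sketch. If all entries sit in one string that is searched with the multi-line option, note that `[^/]` also matches a line break, so the lookahead variant can run on into the next line; writing the class as `[^/\r\n]` is one way to avoid that.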