Tags: regex, regex-lookarounds, subdirectory

Can I improve simplicity using negative lookahead to find the last folder in a file path?


I’m trying to find a simpler way to match the last folder in each path of a file list, skipping any path whose file is of a given type (here, .doc). The solution must use lookarounds. Can anyone suggest improvements to the regex below?

Search text:

c:\this\folder\goes\findme.txt
c:\this\folder\cant\findme.doc
c:\this\folder\surecanfind.txt
c:\\anothertest.rtf
c:\t.txt

RegEx:

(?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)

Expected result:

‘goes’
‘folder’

Can the RegEx lookahead be improved and simplified? Thanks for the help.


Solution

  • In your original regex:
    (?<=\\)[^\\\n\r]+?(?=\\[^\\]*\.)(?!.*\.doc)

    there isn't really much to improve in terms of the use of lookarounds.

    The positive lookbehind is necessary to tell the regex where it is allowed to begin a match.
    The positive lookahead is necessary to terminate the expansion of the +? quantifier.
    And the negative lookahead is needed to reject invalid matches.

    You might be able to condense both lookaheads into one, but keeping them separate is more efficient: if the first one fails, the engine can skip evaluating the second.
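
To make the role of each lookaround concrete, here is a minimal sketch that runs the original pattern over the sample paths from the question. Python's `re` module is an assumption (the question does not name a regex engine), and the variable names are illustrative only.

```python
import re

# Sample paths from the question, one per line.
text = "\n".join([
    r"c:\this\folder\goes\findme.txt",
    r"c:\this\folder\cant\findme.doc",
    r"c:\this\folder\surecanfind.txt",
    r"c:\\anothertest.rtf",
    r"c:\t.txt",
])

lookaround_pattern = re.compile(
    r"(?<=\\)"         # lookbehind: a match may only start right after a backslash
    r"[^\\\n\r]+?"     # the folder name itself, expanded lazily
    r"(?=\\[^\\]*\.)"  # lookahead: next comes "\", a name without "\", then a dot
    r"(?!.*\.doc)"     # negative lookahead: no ".doc" in the rest of the line
)

# The whole match is the folder name, so group(0) is the result.
last_folders = [m.group(0) for m in lookaround_pattern.finditer(text)]
print(last_folders)  # ['goes', 'folder']
```

Every backslash in the text is a possible starting point, so the engine retries the pattern at each one.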


    However, if you're looking for a more efficient/"normal" regex, I would typically use something like:
    ^.*\\(.+?)\\[^\\]+\.(?!doc).+$

    Instead of using lookarounds to exclude everything except my desired output from a match, I'd include my desired output in a capture group.
    This lets me tell the regex to check for a match only once per line (with the multiline flag on, so that ^ and $ anchor at each line), instead of after every \ character.

    Then, to get my desired output, all I have to do is grab the content of capture group 1 from each match.
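
As a rough sketch of the capture-group approach, the snippet below applies that pattern to the question's sample paths and reads group 1 from each match. Python's `re` module and the `re.MULTILINE` flag are assumptions here, since the answer does not say which engine or flags were used.

```python
import re

# Sample paths from the question, one per line.
text = "\n".join([
    r"c:\this\folder\goes\findme.txt",
    r"c:\this\folder\cant\findme.doc",
    r"c:\this\folder\surecanfind.txt",
    r"c:\\anothertest.rtf",
    r"c:\t.txt",
])

# MULTILINE makes ^ and $ anchor at every line, so each line is tried once.
capture_pattern = re.compile(r"^.*\\(.+?)\\[^\\]+\.(?!doc).+$", re.MULTILINE)

# Group 1 holds the last folder; the rest of the match is just context.
captured_folders = [m.group(1) for m in capture_pattern.finditer(text)]
print(captured_folders)  # ['goes', 'folder']
```

One caveat: `(?!doc)` only checks the characters right after the dot the engine settles on, while the original `(?!.*\.doc)` rejects `.doc` anywhere later in the line. The two patterns can therefore disagree on file names with several dots, such as a hypothetical `my.file.doc`, which the sample data does not exercise.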

    Working example

    Original (98,150 steps)
    Capture groups (66,586 steps)

    Hopefully that helps you out.