I need to build a regular expression that captures one or more Windows paths inside a block of text. It's for a syntax highlighter.
Imagine this text:
Hey, Bob!
I left you the report for tomorrow in D:\Files\Shares\report.pdf along
with the other reports.
There's also this pptx here D:\Files\Internal\source.pptx where you have
the original if you need to change anything.
Cheers!
Alice.
This one is easy to capture with /[a-zA-Z]:\\[^\s]*/mg. See it on regex101 here: https://regex101.com/r/VcBV7M/1
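To try the space-free pattern outside regex101, a few lines of JavaScript are enough, since the /.../mg literal above is already JavaScript syntax. This is a minimal sketch; the variable names are only for illustration.

```javascript
// The first sample text. String.raw keeps the backslashes literal,
// so "\r" in "\report.pdf" stays a backslash followed by "r".
const text = String.raw`Hey, Bob!
I left you the report for tomorrow in D:\Files\Shares\report.pdf along
with the other reports.
There's also this pptx here D:\Files\Internal\source.pptx where you have
the original if you need to change anything.
Cheers!
Alice.`;

// Drive letter, colon, backslash, then everything up to the next whitespace.
const simplePath = /[a-zA-Z]:\\[^\s]*/mg;

// With the g flag, match() returns every full match as an array of strings.
const paths = text.match(simplePath);
paths.forEach((p) => console.log(p));
// prints:
// D:\Files\Shares\report.pdf
// D:\Files\Internal\source.pptx
```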
But when the path has spaces, like here:
I left you the report for tomorrow in D:\Shared files\october report.pdf along
with the other reports.
then we run into a problem. What is the path? D:\Shared
or D:\Shared files\october
or D:\Shared files\october report.pdf
or D:\Shared files\october report.pdf along
...
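The failure is easy to reproduce: the same pattern simply stops at the first space. A small sketch, reusing the spaced line from above:

```javascript
// A path that contains spaces; String.raw keeps the backslashes literal.
const line = String.raw`I left you the report for tomorrow in D:\Shared files\october report.pdf along`;
const simplePath = /[a-zA-Z]:\\[^\s]*/g;

// [^\s]* cannot cross a space, so only the shortest candidate is found.
const found = line.match(simplePath);
console.log(found.length); // 1
console.log(found[0]);     // D:\Shared
```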
For a human it's easy to infer; for a computer it's essentially impossible. So I was thinking of forcing users to wrap the filename or path in quotes or square brackets to mark where it begins and ends.
How can I write a regex that, given this:
Hey, Bob!
I left you the report for tomorrow in "D:\Shared files\october report.pdf" along
with the other reports [Don't forget to add your punctuation]. See
also D:\Multifiles\charlie.docx for more info.
There's also this pptx here [D:\Internal files\source for report.pptx] where you have
the original if you need to change "anything like the boss wants".
Cheers!
Alice.
captures this?
D:\Shared files\october report.pdf
D:\Multifiles\charlie.docx
D:\Internal files\source for report.pptx
but not
Don't forget to add your punctuation
anything like the boss wants
Non-working sample: https://regex101.com/r/RGVPz6/2
I found that lookbehind and lookahead assertions solve it.
The expression is: (?<=\[)[a-zA-Z]:\\.*(?=])|(?<=")[a-zA-Z]:\\.*(?=")|([a-zA-Z]:\\[^\s]*)
Here's the solution in a live, editable regex101 session: https://regex101.com/r/RGVPz6/3
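Here is a sketch of the same expression run from JavaScript against the sample text. It assumes an engine with lookbehind support (ES2018 or later, e.g. current Node.js); engines without it, such as older Safari versions, reject the pattern with a SyntaxError.

```javascript
// Sample text from the question; String.raw keeps the backslashes literal.
const text = String.raw`Hey, Bob!
I left you the report for tomorrow in "D:\Shared files\october report.pdf" along
with the other reports [Don't forget to add your punctuation]. See
also D:\Multifiles\charlie.docx for more info.
There's also this pptx here [D:\Internal files\source for report.pptx] where you have
the original if you need to change "anything like the boss wants".
Cheers!
Alice.`;

// Three alternatives, tried left to right at each position:
//   1. a path immediately preceded by "[" and followed by "]"
//   2. a path immediately preceded and followed by a double quote
//   3. an unquoted path with no spaces
const pathRegex = /(?<=\[)[a-zA-Z]:\\.*(?=])|(?<=")[a-zA-Z]:\\.*(?=")|([a-zA-Z]:\\[^\s]*)/mg;

const matches = text.match(pathRegex);
matches.forEach((p) => console.log(p));
// prints:
// D:\Shared files\october report.pdf
// D:\Multifiles\charlie.docx
// D:\Internal files\source for report.pptx
```

The bracketed note and the quoted "anything like the boss wants" are skipped because every alternative requires a drive letter, a colon and a backslash right after the opening delimiter.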
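The key parts are:
(?<=\[) and (?=]) only allow a match that is immediately preceded by [ and immediately followed by ].
(?<=") and (?=") do the same with ".
Lookarounds are zero-width, so the brackets and quotes are checked but not included in the match.
The last alternative, ([a-zA-Z]:\\[^\s]*), still catches unquoted paths without spaces, such as D:\Multifiles\charlie.docx.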
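A sketch of both points: the lookarounds keep the delimiters out of the match, and one caveat worth knowing is that .* is greedy, so two bracketed paths on the same line merge into one match. The "tightened" pattern below is my own suggested variation, not part of the answer above: it stops at the first closing delimiter and drops the unused capture group; the sample strings are made up for illustration.

```javascript
// Lookarounds are zero-width: the brackets are checked but not returned.
const bracketed = /(?<=\[)[a-zA-Z]:\\.*(?=])/;
console.log("see [D:\\My docs\\a.txt] now".match(bracketed)[0]); // D:\My docs\a.txt

// Hypothetical line with two bracketed paths.
const twoPaths = String.raw`Compare [C:\Old docs\a.txt] and [D:\New docs\b.txt] please.`;

// Expression from the answer: ".*" runs to the LAST "]" on the line.
const original = /(?<=\[)[a-zA-Z]:\\.*(?=])|(?<=")[a-zA-Z]:\\.*(?=")|([a-zA-Z]:\\[^\s]*)/g;
console.log(twoPaths.match(original).length); // 1 (C:\Old docs\a.txt] and [D:\New docs\b.txt)

// Assumed tweak: exclude the closing delimiter (and line breaks, like ".")
// so each match stops at the FIRST closing "]" or '"'.
const tightened = /(?<=\[)[a-zA-Z]:\\[^\]\r\n]*(?=])|(?<=")[a-zA-Z]:\\[^"\r\n]*(?=")|[a-zA-Z]:\\\S*/g;
twoPaths.match(tightened).forEach((p) => console.log(p));
// prints:
// C:\Old docs\a.txt
// D:\New docs\b.txt
```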
PS: Inspired by https://es.javascript.info/regexp-lookahead-lookbehind