regex search character brackets proximity

Using brackets for proximity search "Tolkien.{0,300}Martin" doesn't work with line breaks. Workaround?


I often look for authors who are discussed or cited near each other. However, most of the time I am searching PDFs, which contain lots of line breaks.

For short phrases I use \s+ to handle the line breaks, but I don't know how to get the same behavior in a proximity search based on a bracketed quantifier, such as:

\bTolkien.{0,300}Martin\b

If both names are on the same line this works, but a span of 200 or so characters will often contain line breaks. So, is there a way to add the \s+ logic so that .{0,300} also matches across line breaks?

That way, I could match

\bTolkien.{0,300}Martin\b|\bTolkien.{0,300}Martin\b

despite line breaks occurring within the .{0,300} span.

Many thanks, Cadu


Solution

  • I do not know which regex flavor you are using, but in principle you are looking for something like this:

    (?:.|\n){0,300}
    

    or just

    (.|\n){0,300}
    

    That is an alternation between . (any character except a newline) and \n (a newline).

    Depending on your regex flavor, you may need to escape the grouping and alternation characters: \(.\|\n\). (You may even need to double the backslashes.)

    Oh, I see you are using | without a backslash, so I guess you do not need any backslashes.

    (BTW, you forgot to swap the order of the two authors in the second alternative of your regexp.)
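
The suggestion above can be tried out with a minimal sketch. The question does not say which regex engine is in use, so Python's re module is an assumption here, and the sample strings text and reversed_text are made up for illustration. The sketch shows that the original pattern fails across a line break while the (?:.|\n) variant matches, with the author order swapped in the second alternative.

```python
import re

# Hypothetical sample text: the two names are separated by a line break.
text = "Tolkien built a whole mythology,\nwhile Martin is known for political intrigue."
reversed_text = "Martin is often compared\nto Tolkien in reviews."

# Original pattern: '.' does not match '\n' by default, so this finds nothing.
single_line = re.compile(r"\bTolkien.{0,300}Martin\b")

# Workaround from the answer: each of the up to 300 characters may be
# either a non-newline character or a newline.
with_newlines = re.compile(r"\bTolkien(?:.|\n){0,300}Martin\b")

# Both author orders, with the names swapped in the second alternative.
either_order = re.compile(
    r"\bTolkien(?:.|\n){0,300}Martin\b|\bMartin(?:.|\n){0,300}Tolkien\b"
)

# Python-specific alternative (an assumption about the engine):
# re.DOTALL makes '.' match newlines as well.
dotall = re.compile(r"\bTolkien.{0,300}Martin\b", re.DOTALL)

print(single_line.search(text) is None)
print(with_newlines.search(text) is not None)
print(either_order.search(reversed_text) is not None)
print(dotall.search(text) is not None)
```

If the engine offers a "dot matches newline" mode, as Python does with re.DOTALL, that is usually the simpler route; the explicit (?:.|\n) form is the portable fallback when no such flag is available.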