Regular Expression to Remove Line Breaks from Specific Lines


I have a text file in which each paragraph is on its own line. Some of the paragraphs were split across two lines, with the break falling just before a word. For example:

Books are an effective way to 
communicate across time, both from the past and into the future.

I could use a regular expression in the search-and-replace dialog of Notepad++ or Geany to find a lowercase letter at the start of a line and replace the preceding \r\n (carriage return + line feed) with a space.
The problem is that chapters have a subtitle that comes after the word "or", and that "or" sits on a line by itself. For example:

Chapter 3 
The Importance of Reading 
or
Literature is the most agreeable way of ignoring life

Using that method would join each "or" line onto the end of the chapter title instead of leaving it on its own line.
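To make the problem concrete, here is a minimal Python sketch of that naive approach. It assumes Python's `re` module as a stand-in for the editor's regex engine; because `re` has no `\h` or `\R`, the pattern uses `[ \t]*` and `\r?\n` instead. The helper name `naive_join` is hypothetical.

```python
import re

# Naive rule: a line break (plus any spaces/tabs just before it) is replaced
# with one space whenever the next line starts with a lowercase letter.
# [ \t]* and \r?\n stand in for \h* and \R, which Python's re lacks.
NAIVE_PATTERN = re.compile(r"[ \t]*\r?\n(?=[a-z])")


def naive_join(text: str) -> str:
    """Hypothetical helper: join every line that starts lowercase."""
    return NAIVE_PATTERN.sub(" ", text)


sample = (
    "Books are an effective way to \r\n"
    "communicate across time, both from the past and into the future.\r\n"
    "Chapter 3 \r\n"
    "The Importance of Reading \r\n"
    "or\r\n"
    "Literature is the most agreeable way of ignoring life\r\n"
)

for line in naive_join(sample).splitlines():
    print(line)
```

The split paragraph is repaired, but the "or" line is glued onto the title, producing `The Importance of Reading or`.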

What I want is a regex that, when a line starts with a lowercase letter, matches the preceding \r\n and replaces it with a space, except when the line is just "or\r\n".


Solution

  • It looks like you could use lookarounds. Search for:

    \h*\R(?=[a-z])(?!or$)
    

    Replace with a single space. See this demo at regex101 (explanation on the right side).

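    • \h* matches any horizontal whitespace (spaces, tabs) before the line break, so trailing spaces are removed too
    • \R matches any newline sequence (such as \r\n, \n or \r)
    • (?=[a-z]) is a lookahead requiring the next line to start with a lowercase letter
    • (?!or$) is a negative lookahead rejecting the match when the next line is exactly "or"
    • $ matches end of line (Notepad++'s default)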

    In Notepad++'s Replace dialog, set Search Mode to (•) Regular expression, and check [•] Match case (otherwise [a-z] also matches uppercase letters) and [•] Wrap around.
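If you want to apply the same fix outside an editor, here is a rough Python equivalent of the pattern above. It is a sketch under a few assumptions: Python's built-in `re` does not support `\h` or `\R`, so `[ \t]*` and `\r?\n` are used instead (CRLF and LF endings are handled, a lone CR is not); `re.MULTILINE` plays the role of Notepad++'s default so `$` matches at line ends; and no `re.IGNORECASE` flag is passed, which mirrors checking Match case. The helper name `join_split_lines` is hypothetical.

```python
import re

# Rough Python translation of \h*\R(?=[a-z])(?!or$):
#   [ \t]*      trailing spaces/tabs before the break (stands in for \h*)
#   \r?\n       a CRLF or LF line break (stands in for \R; lone CR not handled)
#   (?=[a-z])   the next line must start with a lowercase letter
#   (?!or\r?$)  ...but must not consist solely of "or"
# re.MULTILINE lets $ match just before each \n, like Notepad++'s default.
# No re.IGNORECASE, so [a-z] stays lowercase-only (the "Match case" setting).
JOIN_PATTERN = re.compile(r"[ \t]*\r?\n(?=[a-z])(?!or\r?$)", re.MULTILINE)


def join_split_lines(text: str) -> str:
    """Hypothetical helper: rejoin split paragraphs, leaving "or" lines alone."""
    return JOIN_PATTERN.sub(" ", text)


sample = (
    "Books are an effective way to \r\n"
    "communicate across time, both from the past and into the future.\r\n"
    "Chapter 3 \r\n"
    "The Importance of Reading \r\n"
    "or\r\n"
    "Literature is the most agreeable way of ignoring life\r\n"
)

for line in join_split_lines(sample).splitlines():
    print(line)
```

Here the "or" stays on its own line, while a line that merely begins with "or" (for example "orange") is still joined, because `or\r?$` only rejects a line containing nothing but "or".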