Tags: ruby, regex, word-wrap

Wrap the long lines in Ruby


Wrap the long lines in the given text to the given length. Example: wrapping 'To be or not to be-that is the question' to length 5 gives

    To be
    or
    not
    to be
    -that
    is
    the
    quest
    ion
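A plain-Ruby sketch of the task that does not rely on a single big regex. It assumes (the question does not say so) that a line may break at a space, which is dropped, or just before a hyphen, which moves to the next line, and that a word longer than the width is split after `width` characters. The helper name `wrap` is hypothetical.

```ruby
# Greedy word wrap (hypothetical helper `wrap`): breaks at the last space that
# fits, or just before a hyphen, and hard-splits words longer than `width`.
def wrap(text, width)
  raise ArgumentError, "width must be positive" unless width.positive?

  text.each_line(chomp: true).flat_map do |source_line|
    lines = []
    rest = source_line.strip
    until rest.empty?
      if rest.length <= width
        lines << rest
        break
      end
      head = rest[0, width + 1]   # one extra char, so a space right after the limit counts
      cut = head.rindex(/[ -]/)   # last space or hyphen within reach
      if cut.nil? || cut.zero?
        lines << rest[0, width]   # no usable break point: split the word
        rest = rest[width..].lstrip
      elsif head[cut] == " "
        lines << rest[0, cut].rstrip         # break at the space and drop it
        rest = rest[(cut + 1)..].lstrip
      else
        lines << rest[0, cut].rstrip         # break before the hyphen
        rest = rest[cut..]                   # the hyphen starts the next line
      end
    end
    lines.empty? ? [""] : lines   # keep blank source lines
  end.join("\n")
end

puts wrap("To be or not to be-that is the question", 5)
```

Each source line is wrapped separately, so existing linebreaks (including blank lines) are kept; leading indentation is stripped in this sketch.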


Solution

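  • This might help. It is a regex that simulates the word wrap of Windows Notepad for a fixed width of N = 16 characters; for another width, change the 16 in both {1,16} quantifiers.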

     # http://stackoverflow.com/questions/20431801/word-wrapping-with-regular-expressions/20434776#20434776
    
     # MS-Windows  "Notepad.exe Word Wrap" simulation
     # ( N = 16 )
     # ------------------
     # Trims optional non-linebreak whitespace
     # external to the viewport
     # ============================
     # Find:     @"(?:(?:(?>(.{1,16})(?:(?<=[^\S\r\n])[^\S\r\n]?|(?<=[,.;:!/?])|(?=\r?\n|[-#%&*@_])|[^\S\r\n]))|(.{1,16}))(?:\r?\n)?|(?:\r?\n))"
     # Replace:  @"$1$2\r\n"
     # Flags:    Global     
    

    https://regex101.com/r/E7FxHg/1

     # Note - Through trial and error, it appears Notepad accepts an extra whitespace
     # (possibly in the N+1 position) to help alignment. This does not matter because its viewport hides it.
     # Notepad itself does not trim any whitespace, so the wrapped buffer could be reconstituted by
     # inserting/detecting a wrap-point code that is different from a linebreak.
     # This regex works on unwrapped source, but could probably be adjusted to produce/work on wrapped buffer text.
     # To reconstitute the source, all that is needed is to remove the wrap code, which is probably just an extra "\r".
    
     (?:
          # -- Words/Characters 
          (?:
               (?>                     # Atomic Group - Match words with valid breaks
                    ( .{1,16} )             # (1), 1-N characters
                                            #  Followed by one of 4 prioritized
                    (?:                     #  break types:
                         (?<= [^\S\r\n] )        # 1. - Behind a non-linebreak whitespace
                         [^\S\r\n]?              #      ( optionally accept an extra non-linebreak whitespace )
                      |  (?<= [,.;:!/?] )        # 2. - Behind special punctuation
                      |  (?=                     # 3. - Ahead of a linebreak or special punctuation
                              \r? \n 
                           |  [-#%&*@_] 
                         )
                      |  [^\S\r\n]               # 4. - Accept an extra non-linebreak whitespace
                    )
               )                       # End atomic group
            |  
               ( .{1,16} )             # (2), No valid word breaks, just break on the N'th character
          )
          (?: \r? \n )?           # Optional linebreak after Words/Characters
       |  
          # -- Or, Linebreak
          (?: \r? \n )            # Stand alone linebreak
     )
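A Ruby sketch of the one-line Find/Replace from this answer, with the hard-coded 16 turned into a width parameter (the helper name `notepad_wrap` is hypothetical). The pattern is copied unchanged except that the `/` in the punctuation class is escaped, since it would otherwise end the Ruby regex literal. The Global flag corresponds to `gsub`, and the replacement `$1$2\r\n` becomes `"\\1\\2\n"`; Ruby substitutes an empty string for a group that did not take part in the match. As in .NET, `.` does not match `"\n"` unless the `m` flag is given.

```ruby
# Ruby port of the "Notepad word wrap" Find/Replace; `width` replaces the fixed 16.
def notepad_wrap(text, width)
  n = Integer(width)
  pattern = /(?:(?:(?>(.{1,#{n}})(?:(?<=[^\S\r\n])[^\S\r\n]?|(?<=[,.;:!\/?])|(?=\r?\n|[-#%&*@_])|[^\S\r\n]))|(.{1,#{n}}))(?:\r?\n)?|(?:\r?\n))/
  # Global => gsub; "$1$2\r\n" => "\\1\\2\n" (use "\r\n" if Windows line ends are wanted)
  text.gsub(pattern, "\\1\\2\n")
end

puts notepad_wrap("To be or not to be-that is the question", 5)
```

With width 5 the sentence from the question wraps to `To be`, `or`, `not`, `to be`, `-that`, `is`, `the`, `quest`, `ion`. Lines that end at a space break keep that space inside the N columns (for example `"or "`), so call `rstrip` on each line if that matters.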
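The note at the top of the annotated pattern describes a lossless variant: keep every character and mark wrap points with a code that differs from a real linebreak, so removing the marks restores the source. Below is a minimal sketch of that idea only. It uses a simpler scan-based split instead of the regex above, and it assumes a hypothetical `SOFT_BREAK` marker of `"\r\r\n"` (an extra `"\r"`, as the note guesses) that never occurs in the source text.

```ruby
# Hypothetical wrap code: assumed never to occur in the source text.
SOFT_BREAK = "\r\r\n"

# Lossless soft wrap sketch: break after spaces/tabs, before a hyphen, before a
# hard linebreak, or (failing those) after `width` characters. Nothing is
# trimmed; trailing spaces stay on their line, where a viewport would hide them.
def soft_wrap(text, width)
  chunk = /.{1,#{width}}(?:[ \t]+|(?=-|\r?\n)|\z)(?:\r?\n)?|.{1,#{width}}(?:\r?\n)?|\r?\n/
  pieces = text.scan(chunk) # some alternative matches at every position, so the pieces tile the text
  pieces.each_with_index.map do |piece, i|
    # A piece that already ends in a hard linebreak, or is the last one, gets no mark.
    (piece.end_with?("\n") || i == pieces.size - 1) ? piece : piece + SOFT_BREAK
  end.join
end

# Reconstitute the source by deleting the wrap code.
def unwrap(wrapped)
  wrapped.gsub(SOFT_BREAK, "")
end

source = "To be or not to be-that is the question\nsecond line"
puts unwrap(soft_wrap(source, 5)) == source
```

Because the scanned pieces cover every character of the input, `unwrap(soft_wrap(text, n))` returns the original text for any input that does not itself contain the marker.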