
Regex to trim trailing space and new line


I have an extractor that pulls out a string which is sometimes spread over 2 lines.

Regex : (?s)<h1 itemprop="name">(.+[\w\n\t])</h1>

Examples:

1) on 2 lines →

<h1 itemprop="name">Hello-, World1234
</h1>

Result :

Hello-, World1234
Blank Line   -- I want to remove/trim this line

2) on 1 line →

<h1 itemprop="name">Hello-, World1234</h1>

Result :

Hello-, World1234   -- This result is correct
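
The blank line in case 1 comes from the pattern itself: with (?s) the greedy .+ runs up to the last character before </h1>, and the class [\w\n\t] explicitly allows that last character to be a newline. A minimal sketch that reproduces both cases, assuming Python's re module stands in for the extractor's regex engine (the variable names are illustrative only):

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'(?s)<h1 itemprop="name">(.+[\w\n\t])</h1>')

two_lines = '<h1 itemprop="name">Hello-, World1234\n</h1>'
one_line = '<h1 itemprop="name">Hello-, World1234</h1>'

captured_two = pattern.search(two_lines).group(1)
captured_one = pattern.search(one_line).group(1)

print(repr(captured_two))  # 'Hello-, World1234\n'  (trailing newline kept)
print(repr(captured_one))  # 'Hello-, World1234'
```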

Solution

  • You can use the following regex:

    <h1 itemprop="name">\s*(([^<>\s\h]+\s*[^<>\h\s]+\h*)+)\s*</h1>
    

    The trimmed text is in the first capturing group, available as the back reference \1.

    I have tested it on the following examples and it works fine:

    <h1 itemprop="name">
    Hello-, 
    World1234
    
    </h1>
    
    <h1 itemprop="name">Hello-, World1234
    
    </h1>
    
    <h1 itemprop="name">
    Hello-, 
    
    
    World1234
    
    </h1>
    

    It produces the following output:

    1)

    Hello-, 
    World1234
    

    2)

    Hello-, World1234
    

    3)

    Hello-, 
    
    
    World1234
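
The \h shorthand (horizontal whitespace) is available in engines such as PCRE and Java, but not in Python's re module. For readers on Python, here is a minimal sketch of a different technique that trims the same way: a lazy capture with \s* on both sides, so surrounding whitespace falls outside group 1. The pattern and the extract_name helper are assumptions for illustration, not the answer's regex.

```python
import re

# Assumed alternative, not the answer's pattern: Python's `re` has no \h,
# so a lazy capture with \s* on both sides does the trimming instead.
H1_NAME = re.compile(r'<h1 itemprop="name">\s*(.+?)\s*</h1>', re.DOTALL)

def extract_name(html):
    """Return the heading text without surrounding whitespace, or None."""
    match = H1_NAME.search(html)
    return match.group(1) if match else None

samples = [
    '<h1 itemprop="name">\nHello-, \nWorld1234\n\n</h1>',
    '<h1 itemprop="name">Hello-, World1234\n\n</h1>',
    '<h1 itemprop="name">Hello-, World1234</h1>',
]
for sample in samples:
    print(repr(extract_name(sample)))
```

As in the output above, whitespace inside the text (such as the line break between "Hello-," and "World1234") is kept; only the leading and trailing whitespace is dropped.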