I have an extractor that pulls out a string which is sometimes spread over 2 lines.
Regex : (?s)<h1 itemprop="name">(.+[\w\n\t])</h1>
Examples:
1) on 2 lines →
<h1 itemprop="name">Hello-, World1234
</h1>
Result :
Hello-, World1234
Blank Line -- I want to remove/trim this line
2) on 1 line →
<h1 itemprop="name">Hello-, World1234</h1>
Result :
Hello-, World1234 -- This result is correct
You can use the following regex:
<h1 itemprop="name">\s*(([^<>\s\h]+\s*[^<>\h\s]+\h*)+)\s*</h1>
with a back reference to your first capturing group: \1
I have tested it on the following examples and it works fine:
<h1 itemprop="name">
Hello-,
World1234
</h1>
<h1 itemprop="name">Hello-, World1234
</h1>
<h1 itemprop="name">
Hello-,
World1234
</h1>
It provides the following output:
1)
Hello-,
World1234
2)
Hello-, World1234
3)
Hello-,
World1234
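If you want to try the trimming idea outside your extractor, here is a minimal sketch in Python. Note that this is a swapped-in variant, not the regex above: Python's built-in re module does not support \h, so the sketch uses a lazy capture with \s* on both sides instead. The names H1_NAME and extract_name are just made up for illustration.

```python
import re

# Assumption: Python's built-in re module, which has no \h escape.
# A lazy capture (.*?) surrounded by \s* leaves leading and trailing
# whitespace (including the trailing newline) outside the group.
H1_NAME = re.compile(r'<h1 itemprop="name">\s*(.*?)\s*</h1>', re.DOTALL)


def extract_name(html):
    """Return the trimmed heading text, or None if no heading matches."""
    match = H1_NAME.search(html)
    return match.group(1) if match else None


samples = [
    '<h1 itemprop="name">Hello-, World1234\n</h1>',   # heading on 2 lines
    '<h1 itemprop="name">Hello-, World1234</h1>',     # heading on 1 line
    '<h1 itemprop="name">\nHello-,\nWorld1234\n</h1>',  # text split inside
]

for sample in samples:
    print(repr(extract_name(sample)))
```

As with the regex above, a line break inside the text itself is kept; only the whitespace right after the opening tag and right before the closing tag is dropped.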