I'm trying to extract the text from a string of HTML. I would like to capture only the text between tags and skip over any empty tags.
My current attempt can be found here:
https://regex101.com/r/3Ujmw6/2
I have tried:
/>(\X+?)</g
//Fails on nested tags: it captures the first nested tag
<p><strong>blablab</strong></p>
And this:
/>(\X*?)</g
//Finds all the strings, but also includes lots of empty strings
//for adjacent tags ><
Is there any way to exclude < from \X? Or is there a better way to write this so it returns only the text parts?
Try a regex like
>(\s*[^\s<][^<]*)
This simply matches all text between > and < that isn't all whitespace. See https://regex101.com/r/3Ujmw6/4.
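As a rough illustration, here is how the pattern could be applied in Python. This is only a sketch: the helper name `extract_text` and the sample string are made up for the example, and stripping the surrounding whitespace from each match is an extra step that the regex itself does not do.

```python
import re
from typing import List

# Pattern from the answer: a '>', optional whitespace, one character that is
# neither whitespace nor '<', then everything up to (not including) the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r">(\s*[^\s<][^<]*)")


def extract_text(html: str) -> List[str]:
    """Hypothetical helper: return the non-blank text runs that follow a '>'."""
    # findall returns the contents of the single capture group for each match.
    return [chunk.strip() for chunk in TEXT_BETWEEN_TAGS.findall(html)]


# Made-up sample with nested tags, an empty tag and whitespace between tags.
sample = "<div><p><strong>blablab</strong></p>\n  <p></p><p>more text</p></div>"
print(extract_text(sample))  # ['blablab', 'more text']
```

Because the pattern requires a leading >, any text that appears before the first tag in the string will not be matched.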