I am trying to extract the text from the HTML snippet below. I need help with a regex pattern that removes all the HTML tags and leaves only the content.
I tried to remove the <span ...> tags using the expression below, but that didn't do the trick.
String x = '<span style="font-size:11pt;"><span style="line-height:107%;"><span style="font-family:Calibri, sans-serif;"><strong><font color="#000000">Some normal text here...</font></strong></span></span></span>';
String y = x.replaceAll('[<span*\b>]','');
system.debug(y);
This prints out:
tyle="fot-ize:11t;" tyle="lie-height:107%;" tyle="fot-fmily:Clibri, -erif;"trogfot color="#000000"Some normal text here.../fot/trog///
So it basically replaced each character individually rather than the whole <span ... > tag.
Need Help
The second line of code should be:
String y = x.replaceAll('<span[^>]*>','');
The meaning of this statement is: for every occurrence of '<span' followed by zero or more occurrences (*) of anything but '>' ([^>]), followed by a single '>', replace it with nothing.
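As a quick check of the difference between the two patterns, here is a minimal sketch that can be run as anonymous Apex. The shortened input string is my own, not the one from the question.

```apex
// Shortened, hypothetical input; the question's string behaves the same way.
String x = '<span style="font-size:11pt;"><strong>Some normal text here...</strong></span>';

// Square brackets define a character class, which matches any ONE of the
// listed characters. That is why the original attempt deleted single letters.
// <span[^>]*> instead matches the literal text "<span", then any run of
// characters other than '>', then the closing '>'.
String y = x.replaceAll('<span[^>]*>', '');

System.debug(y); // <strong>Some normal text here...</strong></span>
```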
By the way, this pattern will not match the closing tag </span>. I mention this just for your information, because you didn't ask about it.
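If the closing tags should go as well, one option is to make the slash optional. A broader pattern can drop every tag, which is what the question ultimately asks for. This is only a sketch, assuming simple markup (no '>' inside attribute values, no comments or script blocks). The input string is shortened from the question. Apex's built-in String.stripHtmlTags() method may also be worth a look for this job.

```apex
// Shortened from the question's input.
String x = '<span style="line-height:107%;"><strong><font color="#000000">Some normal text here...</font></strong></span>';

// '/?' makes the slash optional, so both <span ...> and </span> match.
String noSpans = x.replaceAll('</?span[^>]*>', '');
System.debug(noSpans); // <strong><font color="#000000">Some normal text here...</font></strong>

// Any tag at all: '<', one or more characters other than '>', then '>'.
String textOnly = x.replaceAll('<[^>]+>', '');
System.debug(textOnly); // Some normal text here...
```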