I'm trying to extract a specific word (which may change) that comes right after a fixed expression. I want to extract the name Taldor
from this HTML:
<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Taldor</span>
</h4>
So far I am able to find <h4 class="t-16 t-black t-normal">
using this regex:
(?<=<h4 class="t-16 t-black t-normal">).*
Any advice would be appreciated.
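I'd suggest using an HTML parsing library, such as Jsoup in Java or BeautifulSoup in Python, instead of regex. Regular expressions cannot reliably handle HTML: a small change such as an extra class, a different attribute order, or nested tags breaks the pattern.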
The following code does the job:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String s = "<h4 class=\"t-16 t-black t-normal\">\r\n" +
        " <span class=\"visually-hidden\">Company Name</span>\r\n" +
        " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n" +
        " </h4>";
Document doc = Jsoup.parse(s);
// Print the text of the first element with the target class, then stop
for (Element element : doc.getElementsByClass("pv-entity__secondary-title")) {
    System.out.println(element.text());
    break;
}
This prints:
Taldor
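If adding Jsoup as a dependency is not an option, a sketch using only the JDK is possible here, under one important assumption: this particular fragment happens to be well-formed XML, so the built-in XML parser and XPath can read it. Real-world HTML usually is not well-formed XML, so treat this as a fallback for this snippet, not a general HTML parser. The class name XPathExtract and the method extract are illustrative names, not from the original answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathExtract {
    // Selects a span whose class list contains the token
    // "pv-entity__secondary-title", even if other classes are present.
    private static final String EXPR =
            "//span[contains(concat(' ', normalize-space(@class), ' '),"
            + " ' pv-entity__secondary-title ')]";

    // Hypothetical helper: parses the fragment as XML (it must be
    // well-formed) and returns the trimmed text of the first matching
    // span, or "" if nothing matches.
    static String extract(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        String s = "<h4 class=\"t-16 t-black t-normal\">\r\n"
                + " <span class=\"visually-hidden\">Company Name</span>\r\n"
                + " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n"
                + " </h4>";
        System.out.println(extract(s)); // prints Taldor
    }
}
```

Because the lookup is by class token rather than by exact text, it still finds the span if the markup becomes, say, class="foo pv-entity__secondary-title".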
In the worst case, if you only need a quick and dirty temporary solution, you can use a regex instead, although it is definitely not recommended:
<span class="pv-entity__secondary-title">(.*?)<\/span>
Use this regex and read your data from capture group 1.
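A minimal sketch of reading group 1 with java.util.regex, assuming the HTML is already in a String. The `\/` in the pattern above is not needed in Java, so it is written as a plain `/` here. A second, hypothetical helper shows a lookbehind variant closer to the asker's original approach: `[^<]*` grabs everything after the fixed opening tag up to the next `<`. Class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Same pattern as above; the lazy (.*?) stops at the first </span>.
    // It only matches this exact attribute spelling and order.
    private static final Pattern GROUP_PATTERN = Pattern.compile(
            "<span class=\"pv-entity__secondary-title\">(.*?)</span>");

    // Lookbehind variant: the whole match is the text after the fixed tag.
    private static final Pattern LOOKBEHIND_PATTERN = Pattern.compile(
            "(?<=<span class=\"pv-entity__secondary-title\">)[^<]*");

    // Returns capture group 1, or null if the span is not found.
    static String extract(String html) {
        Matcher m = GROUP_PATTERN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the whole lookbehind match, or null if not found.
    static String extractWithLookbehind(String html) {
        Matcher m = LOOKBEHIND_PATTERN.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "<h4 class=\"t-16 t-black t-normal\">\r\n"
                + " <span class=\"visually-hidden\">Company Name</span>\r\n"
                + " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n"
                + " </h4>";
        System.out.println(extract(s));               // prints Taldor
        System.out.println(extractWithLookbehind(s)); // prints Taldor
    }
}
```

Both patterns return null for markup like class="pv-entity__secondary-title extra", which is exactly the fragility that makes a parser the safer choice.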