I am using BeautifulSoup to scrape an HTML page, and I want to select a string by its key in an embedded key/value block rather than by an element tag.
In this case I want to use "fmt_headline" as the key to grab "Founder and CEO at SolarThermoChemical LLC".
<div id="srp_main_" class="">
<code id="voltron_srp_main-content" style="display:none;">
"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",
"isConnectedEnabled":true,
"sharedConnectionToken":"240506fce660"
</div>
Any thoughts on how to do this?
Once you've parsed your HTML with BeautifulSoup, it can give you all the text:
2>>> import bs4
2>>> x
'<div id="srp_main_" class="">\n<code id="voltron_srp_main-content" style="display:none;">\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n"isConnectedEnabled":true,\n"sharedConnectionToken":"240506fce660"\n\n</div>'
2>>> soup=bs4.BeautifulSoup(x, 'html.parser')
2>>> y=soup.get_text()
2>>> y
u'\n\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n"isConnectedEnabled":true,\n"sharedConnectionToken":"240506fce660"\n\n'
Now, further analysis of that text is left to other tools, such as regular expressions:
2>>> import re
2>>> mo = re.search(r'"fmt_headline":"([^"]*)"', y)
2>>> print(mo.group(1))
Founder and CEO at SolarThermoChemical LLC
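If you need this for more than one key, the regular-expression step can be wrapped in a small helper. The following is only a sketch, not the one right way to do it: `value_for_key` is a name made up for illustration, it assumes the value is a double-quoted string (so it will not match `"isConnectedEnabled":true`), and it takes the text that `soup.get_text()` returned in the session above. Unlike the simpler pattern above, it tolerates whitespace around the colon and backslash-escaped quotes inside the value.

```python
import json
import re


def value_for_key(text, key):
    """Return the string stored under `key` in `text`, or None if absent.

    Hypothetical helper: only handles values written as "quoted strings".
    """
    # "key" : "value", where the value may contain backslash escapes.
    pattern = r'"%s"\s*:\s*("(?:[^"\\]|\\.)*")' % re.escape(key)
    mo = re.search(pattern, text)
    if mo is None:
        return None
    # The captured group is a JSON string literal, so json.loads
    # strips the quotes and decodes any escapes.
    return json.loads(mo.group(1))


# The text that soup.get_text() returned in the session above.
y = ('\n\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n'
     '"isConnectedEnabled":true,\n"sharedConnectionToken":"240506fce660"\n\n')

print(value_for_key(y, "fmt_headline"))
print(value_for_key(y, "no_such_key"))
```

The first call prints the headline and the second prints `None`, so a missing key can be checked for instead of raising `AttributeError` on `mo.group(1)`.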