Tags: python, html, web-scraping, beautifulsoup

BeautifulSoup - Select String Based on Dictionary Key


I am using BeautifulSoup to scrape an HTML page, and I want to select a string by its key in an embedded block of key/value pairs rather than by an element tag.

In this case I am looking to use "fmt_headline" as the key to grab "Founder and CEO at SolarThermoChemical LLC".

<div id="srp_main_" class="">
<code id="voltron_srp_main-content" style="display:none;">

"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",
"isConnectedEnabled":true,
"sharedConnectionToken":"240506fce660"

</div>

Any thoughts on how to do this?


Solution

  • Once you've parsed your HTML with BeautifulSoup, it can give you all the text:

    >>> import bs4
    >>> x
    '<div id="srp_main_" class="">\n<code id="voltron_srp_main-content" style="display:none;">\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n"isConnectedEnabled":true,\n"sharedConnectionToken":"240506fce660"\n\n</div>'
    >>> # Name the parser explicitly so results do not depend on which parsers are installed.
    >>> soup = bs4.BeautifulSoup(x, 'html.parser')
    >>> y = soup.get_text()
    >>> y
    '\n\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n"isConnectedEnabled":true,\n"sharedConnectionToken":"240506fce660"\n\n'
    

    Now, further analysis of that text is left to other tools, such as regular expressions:

    >>> import re
    >>> mo = re.search(r'"fmt_headline":"([^"]*)"', y)
    >>> print(mo.group(1))
    Founder and CEO at SolarThermoChemical LLC
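
The regular-expression step can be wrapped so that a missing key does not raise an error. The sketch below is not part of the original answer. It starts from the text that get_text() returned for the sample in the question, and the helper names extract_value and parse_fragment are made up for illustration. The second helper assumes the fragment is the body of a JSON object, which holds for this sample but may not hold for the full page.

```python
import json
import re

# Text as returned by soup.get_text() for the sample in the question.
text = (
    '\n\n\n"fmt_headline":"Founder and CEO at SolarThermoChemical LLC",\n'
    '"isConnectedEnabled":true,\n'
    '"sharedConnectionToken":"240506fce660"\n\n'
)


def extract_value(text, key):
    """Return the string value stored under key, or None if the key is absent.

    Hypothetical helper: only handles string values written as "key":"value".
    """
    # re.escape guards against regex metacharacters in the key;
    # the value pattern also tolerates backslash-escaped quotes.
    pattern = r'"%s"\s*:\s*"((?:[^"\\]|\\.)*)"' % re.escape(key)
    match = re.search(pattern, text)
    return match.group(1) if match else None


def parse_fragment(text):
    """Parse the fragment as JSON by wrapping it in braces.

    Assumption: the text is a comma-separated list of valid JSON members.
    """
    return json.loads("{" + text.strip().rstrip(",") + "}")


headline = extract_value(text, "fmt_headline")
data = parse_fragment(text)
print(headline)
print(data["fmt_headline"])
```

If the fragment really is valid JSON, the dictionary route also gives you the non-string values (for example the boolean under "isConnectedEnabled") without writing a separate pattern for each key. The regex route is more forgiving when the surrounding text is messy.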