Tags: web-scraping, beautifulsoup, cdata

How to find the productId in a website's CDATA script using Beautiful Soup


I want to extract the productId value (186852001461) from the script below, or from wherever the ID appears on the page, using Beautiful Soup.

<script type="text/javascript">
 /* <![CDATA[ */
var bv_single_product = {"prodname":"Honey Graham Gelato","productId":"186852001461"};
/* ]]> */
</script>

My code:

import re
import requests
from bs4 import BeautifulSoup
final = "https://www.talentigelato.com/products/honey-graham-gelato"
response = requests.get(final, timeout=35)
soup = BeautifulSoup(response.content, "html.parser") 
s = soup.findAll('script',attrs={'type': 'text/javascript'} )[17]
print(type(s))
html_content = str(s)
html_content = s.prettify()
print(html_content)

Solution

  • Read the script tag's raw text with .string, pull the object literal out with a regex, and pass that to json.loads().

    Here's how:

    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    
    final = "https://www.talentigelato.com/products/honey-graham-gelato"
    response = requests.get(final, timeout=35)
    soup = BeautifulSoup(response.content, "html.parser")
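    # The index 17 depends on this page's markup at the time of writing.
    s = soup.find_all('script', attrs={'type': 'text/javascript'})[17]
    # Capture the object literal assigned to bv_single_product and parse it as JSON.
    data = json.loads(re.search(r"single_product = ({.*})", s.string).group(1))
    print(data["productId"])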
    

    Output:

    186852001461
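
    A variation on the same idea, offered as a sketch rather than a definitive fix: instead of relying on the script's position in the page, it searches every script tag for the variable name and lets json.JSONDecoder.raw_decode read exactly one JSON value. The helper name extract_js_object and the SAMPLE_HTML constant are made up for illustration. The markup is copied from the question so the example runs offline; for the live site you would pass response.content instead.

```python
import json
import re

from bs4 import BeautifulSoup

# Sample markup copied from the question (assumption: the live page embeds
# the same variable in one of its <script> tags).
SAMPLE_HTML = """
<script type="text/javascript">
 /* <![CDATA[ */
var bv_single_product = {"prodname":"Honey Graham Gelato","productId":"186852001461"};
/* ]]> */
</script>
"""


def extract_js_object(html, var_name):
    """Return the JSON object assigned to var_name in any <script>, or None.

    Hypothetical helper: it only handles object literals that are valid JSON.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match "<var_name> = " so the match ends right before the opening brace.
    pattern = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    for script in soup.find_all("script"):
        text = script.string
        if not text:
            continue  # external or empty script, nothing to search
        match = pattern.search(text)
        if match:
            # raw_decode parses one JSON value starting at the given index
            # and ignores the trailing ";" and comment.
            obj, _end = json.JSONDecoder().raw_decode(text, match.end())
            return obj
    return None


product = extract_js_object(SAMPLE_HTML, "bv_single_product")
print(product["productId"])
```

    Searching by variable name means the lookup does not depend on how many scripts come before the target, and raw_decode avoids a greedy regex overshooting if the script holds more than one object.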