python web-scraping beautifulsoup session-cookies

Extract data from script tag [HTML] using BeautifulSoup in Python


I want to extract data from a variable that is defined inside a script tag:

<script>
var Itemlist = 'null';
var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
</script>

I want to get the item_id and Amount values into Python variables.

I tried using a regex. It worked for a while, but it stopped working when the session cookie was updated.

Is there any other way to get those values?

I am using this method to get the script from the HTML, but the position of the script changes when the session cookie updates:

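from bs4 import BeautifulSoup as bs

soup = bs(response.content, 'html.parser')
script = soup.find_all('script')[8]  # find() returns a single tag, so indexing by position needs find_all()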

So I have to keep changing the index. For now it is [8], but whenever the session cookie updates I have to try other numbers until I find the script I am looking for.


Solution

  • To get the data from the <script> tag, you can use this example:

    import re
    import json
    from bs4 import BeautifulSoup
    
    html_data = """
    <script>
    var Itemlist = 'null';
    var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
    </script>
    """
    
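    soup = BeautifulSoup(html_data, "html.parser")
    # Text of the first <script> tag in the document
    data = soup.select_one("script").text
    # Capture the string literal assigned to ItemData
    data = re.search(r"ItemData = '(.*)';", data).group(1)
    # The captured literal is JSON, so parse it into Python objects
    data = json.loads(data)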
    
    print("Item_id =", data[0]["item_id"], "Amount =", data[0]["Amount"])
    

    Prints:

    Item_id = 107 Amount = 99999.00
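The example above reads the first <script> tag, while the question describes a page where the position of the script keeps changing. A minimal sketch of one way to avoid the index altogether is to scan every script tag and pick the one whose text defines ItemData. The helper name extract_item_data and the two extra script tags in the sample HTML are assumptions made for illustration. The sample is a raw string so that the backslashes before the quotes survive, as they presumably would in a real response body; the simple replace() used to undo them is also an assumption and would not be enough if the JSON itself contained escaped backslashes.

```python
import json
import re

from bs4 import BeautifulSoup

# Raw string: the \" sequences stay in the text, as in the page source.
# The first two <script> tags are made up to simulate a shifting position.
html_data = r"""
<script src="/static/app.js"></script>
<script>var unrelated = 1;</script>
<script>
var Itemlist = 'null';
var ItemData = '[{\"item_id\":\"107\",\"id\":\"79\",\"line_item_no\":\"1\",\"Amount\":\"99999.00\"}]';
</script>
"""


def extract_item_data(html):
    """Return the parsed ItemData list, or None if no script defines it."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"var\s+ItemData\s*=\s*'(.*?)';")
    for script in soup.find_all("script"):
        # .string is None for empty tags such as <script src=...></script>
        match = pattern.search(script.string or "")
        if match:
            # Undo the JavaScript escaping of double quotes, then parse as JSON.
            return json.loads(match.group(1).replace('\\"', '"'))
    return None


items = extract_item_data(html_data)
print("Item_id =", items[0]["item_id"], "Amount =", items[0]["Amount"])
```

Because the script is selected by its content rather than by its position, the lookup should keep working when other script tags are added or removed around it, as long as the variable is still named ItemData.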