python, beautifulsoup, python-re

Search for a variable in script tags with bs4 and Python


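import requests
from bs4 import BeautifulSoup

url = "https://www.xxxx.com"  # requests needs the scheme (http:// or https://)
# cookies: dict of session cookies, defined elsewhere
rlink = requests.get(url, cookies=cookies).content
html = BeautifulSoup(rlink, 'html.parser')
scripttags = html.find_all("script")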

The page's DOM contains about 7x script tags, and I need to find one unique variable that is defined somewhere among them.

The variable is:

var playbackUrl = 'https://www.yyyy.com'

My current code:

import re

for i in range(len(scripttags)):
    if "playbackUrl" in str(scripttags[i]):
        for j in str(scripttags[i]).split("\n"):
            if "playbackUrl" in j:
                url_=re.search("'(.*)'", j).group(1)
                print(url_)

My script does the job, but I wonder whether there is a smarter way to do it.


Solution

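  • The code is more readable if you iterate over the tags directly instead of using range(len()).

    You also don't have to split the text into lines: by default the dot in a regex does not match a newline, so the pattern cannot run past the end of the line it starts on.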

    html = '''<script>
    var other = 'test';
    var playbackUrl = 'https://www.example1.com';
    var next = 'test';
    </script>
    
    <script>
    var other = 'test';
    var playbackUrl = 'https://www.example2.com';
    var next = 'test';
    </script>
    '''
    
    from bs4 import BeautifulSoup
    import re
    
    soup = BeautifulSoup(html, 'html.parser')
    scripttags = soup.find_all("script")
    
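    for script in scripttags:
        # re.search returns the first match object, or None if nothing matches
        results = re.search(r"var playbackUrl = '(.*)'", script.text)
        if results:
            print('search:', results[1])
    
        # OR: re.findall returns a list with the captured group of every match
        results = re.findall(r"var playbackUrl = '(.*)'", script.text)
        if results:
            print('findall:', results[0])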
    

    Result:

    search: https://www.example1.com
    findall: https://www.example1.com
    
    search: https://www.example2.com
    findall: https://www.example2.com
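
Since the question says the variable is unique on the page, one further simplification may be to skip the per-tag loop and run the regex over the raw page source. The following is only a sketch using the standard library: the helper name find_js_string_var is hypothetical (it is not part of bs4 or re), the sample markup is modeled on the answer's example, and it assumes the value is a plain quoted string on a single line.

```python
import re


def find_js_string_var(page_source, name):
    """Return the string literal assigned to `var <name>`, or None if absent.

    Hypothetical helper for illustration only.
    """
    # re.escape guards against regex metacharacters in the variable name.
    # (['\"]) captures the opening quote and \1 requires the same closing quote.
    # (.*?) is non-greedy, so it stops at the first matching closing quote.
    pattern = re.compile(r"var\s+" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1")
    match = pattern.search(page_source)
    return match.group(2) if match else None


# Sample markup (made-up data in the style of the answer's example).
page_source = """<script>
var other = 'test';
var playbackUrl = 'https://www.example1.com';
var next = 'test';
</script>"""

print(find_js_string_var(page_source, "playbackUrl"))
print(find_js_string_var(page_source, "missing"))
```

With real pages, page_source would be the decoded response body (for example the text attribute of the requests response). Parsing with BeautifulSoup first, as in the answer, remains the safer choice when the same text could also appear outside script tags.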