javascript, python, beautifulsoup, aiohttp

Scraping a script written in JS with BS4


I am working on a script that signs up for an account on a site using BS4 and aiohttp. The POST request needs a field called tagInfo, and part of tagInfo is a value called "tmxSessionId". Normally I would just scrape the value with BS4 on every request. The problem is that tmxSessionId sits inside an inline JavaScript block on the page rather than in regular HTML. Here is part of that script:

<script type="text/javascript">

(function() {
var Context = raptor.require('ebay.context.Context');
    var langCode = "en-US";
    var emailAutoCompleteEnabled = true;

    var dfpContext = '{"tmxSessionId":"081708da1660ab61a9e69761fffcb25e"}';

}

I edited the script and removed most of the extra bits, keeping enough to give some context along with the part I'm curious about. As you can see, it is inside a script tag. As a test I tried:

from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, 'lxml')
idd = soup.find('script', type='text/javascript')  # find() returns only the first match

It came back with <script type="text/javascript">var layer = false;</script> rather than the script shown above. So how can I parse out tmxSessionId?


Solution

  • I ran into the same situation and found a shortcut that has worked for me in every similar case:

    scripts = soup.find_all('script')
    your_script = [script for script in scripts if 'tmxSessionId' in str(script)][0]
    print(your_script)
    

    The list comprehension keeps only the script elements whose text contains 'tmxSessionId'. The first match is usually the script you want, which is why I added [0] at the end. Note that [0] raises an IndexError if no script matches.

    Hope this helps! Cheers!
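
The snippet above returns the whole script element, not the value itself. One possible way to go the rest of the way is sketched below: pull the dfpContext string literal out with a regular expression and decode it as JSON. This is only a sketch under a few assumptions: the html string is a hypothetical stand-in for r.text based on the excerpt in the question, extract_tmx_session_id is an illustrative helper name, dfpContext is assumed to stay a single-quoted JSON literal with no embedded single quotes, and html.parser is used in place of lxml only to avoid the extra dependency.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text, trimmed to the parts shown in the question.
html = """
<script type="text/javascript">var layer = false;</script>
<script type="text/javascript">
(function() {
    var langCode = "en-US";
    var dfpContext = '{"tmxSessionId":"081708da1660ab61a9e69761fffcb25e"}';
})();
</script>
"""


def extract_tmx_session_id(page_html):
    """Return the tmxSessionId from an inline script, or None if not found."""
    soup = BeautifulSoup(page_html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for external or empty scripts, so fall back to "".
        text = script.string or ""
        # Assumption: dfpContext is a single-quoted JSON string literal
        # that contains no single quotes of its own.
        match = re.search(r"var dfpContext\s*=\s*'([^']*)'", text)
        if match:
            return json.loads(match.group(1)).get("tmxSessionId")
    return None


session_id = extract_tmx_session_id(html)
print(session_id)
```

Returning None instead of indexing with [0] lets the caller decide what to do when the page layout changes and the value is missing.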