Tags: javascript, python, web-scraping, beautifulsoup, html-parsing

How to fetch update binary URL from Microsoft Update Catalog web-page?


I'm trying to fetch the download URL of update binaries from the Microsoft Update Catalog web page. The download button opens a new window, and the target binary URL is in that window.

How can I fetch the binary URL by parsing the catalog web page?

I tried the following:

import urllib.request

def main():
    url = 'https://catalog.update.microsoft.com/v7/site/Search.aspx?q=KB3205400'
    offlinePage = 'catalog.html'
    print(url)
    sourceWebPage(url, offlinePage)

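def sourceWebPage(url, offlinePage):
    # Browser-like User-Agent header (assumed value; any valid header dict works here).
    headers = {'User-Agent': 'Mozilla/5.0'}
    request = urllib.request.Request(url, None, headers)
    response = urllib.request.urlopen(request)
    data = response.read()
    # Save the raw response bytes for offline inspection.
    with open(offlinePage, 'wb') as f:
        f.write(data)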

if __name__ == '__main__':
    main()

But the saved HTML source file does not contain any link to the target binary URL.


Solution

  • <a id="431bdad0-e68b-4275-8f14-e9c90fa2a9b0_link" href="javascript:void(0);" onclick="goToDetails(&quot;431bdad0-e68b-4275-8f14-e9c90fa2a9b0&quot;);">
    

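    The download pop-up window is generated by JavaScript. Libraries such as requests or urllib only fetch the static HTML and cannot execute JavaScript, so the binary URL never appears in the saved page. I recommend using Selenium, which drives a real browser.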
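As a rough illustration of that suggestion, here is a minimal sketch in two parts. `extract_update_ids` pulls the update IDs out of the `goToDetails(...)` handlers in the static HTML, which urllib can already fetch. `click_update_link` is a hypothetical helper that uses Selenium to click the `<id>_link` anchor shown above and report the URLs of whatever windows are open afterwards. Both function names are made up for this example, and the helper assumes Chrome with a matching driver is installed. The locator for the actual download button is not shown here, so it is left out, and a real script would likely need an explicit wait before reading the window list.

```python
import html
import re

# Matches the GUID passed to goToDetails("...") inside an onclick attribute.
_GO_TO_DETAILS = re.compile(r'goToDetails\("([0-9a-fA-F-]{36})"\)')


def extract_update_ids(page_source):
    """Return the update IDs found in goToDetails(...) handlers, in page order."""
    decoded = html.unescape(page_source)  # turns &quot; back into "
    seen = []
    for update_id in _GO_TO_DETAILS.findall(decoded):
        if update_id not in seen:  # skip repeated IDs
            seen.append(update_id)
    return seen


def click_update_link(url, update_id):
    """Hypothetical helper: click the '<update_id>_link' anchor in a real browser.

    Assumes Selenium and a Chrome driver are installed. Returns the URL of
    every window that is open after the click.
    """
    # Imported lazily so extract_update_ids works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # The "<GUID>_link" id pattern is taken from the anchor shown above.
        driver.find_element(By.ID, f"{update_id}_link").click()
        urls = []
        # A pop-up may take a moment to open; no wait is performed here.
        for handle in driver.window_handles:
            driver.switch_to.window(handle)
            urls.append(driver.current_url)
        return urls
    finally:
        driver.quit()
```

For example, passing the contents of the saved catalog.html to `extract_update_ids` should yield IDs such as 431bdad0-e68b-4275-8f14-e9c90fa2a9b0, each of which can then be handed to `click_update_link` together with the search URL.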