Search code examples
pythonpython-3.xazureazure-devopswget

How can I extract Azure IP ranges json file programatically through python?


I want to download the ipranges.json (which is updated weekly) from https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 I have this python code which keeps running forever.

import wget
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
response = wget.download(URL, "ips.json")
print(response)

How can I download the JSON file in Python?


Solution

  • Because https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 is the link which automatically trigger javascript to download, therefore you just download the page, not the file

    If you check downloaded file, the source will look like this

    enter image description here

    We realize the file will change after a while, so we have to scrape it in generic way

    For convenience, I will not use wget, 2 libraries here are requests to request page and download file, beaufitulsoup to parse html

    # pip install requests
    # pip install bs4
    import requests
    from bs4 import BeautifulSoup
    
    # request page
    URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
    page = requests.get(URL)
    
    # parse HTML to get the real link
    soup = BeautifulSoup(page.content, "html.parser")
    link = soup.find('a', {'data-bi-containername':'download retry'})['href']
    
    # download
    file_download = requests.get(link)
    
    # save in azure_ips.json
    open("azure_ips.json", "wb").write(file_download.content)