Tags: python, azure-pipelines, python-webbrowser, azure-pipelines-yaml, azure-pipelines-tasks

Download file from a URL in Azure Pipelines Microsoft hosted agent


I'm running a Python script task in an Azure YAML pipeline. A JSON file is downloaded when the following URL is opened in a browser: https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519

What I've done so far:

- task: PythonScript@0
  name: pythonTask
  inputs:
    scriptSource: 'inline'
    script: |

      url = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"

      import webbrowser
      webbrowser.open(url)
      print("The web browser opened and the file is downloaded")

Once the browser opens the URL, a file should be downloaded locally, automatically. However, when I run the above pipeline, I can't find the file anywhere on the agent machine, and I don't get any errors either.

I'm using a Windows-2019 Microsoft Hosted Agent.

How can I find the downloaded file-path inside the agent machine?

Or is there another way to download the file from the URL without actually having to open the browser?


Solution

  • How can I find the downloaded file-path inside the agent machine?

    Please try the following Python script:

    steps:
    - task: PythonScript@0
      displayName: 'Run a Python script'
      inputs:
        scriptSource: inline
        script: |
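         import urllib.request

         # Placeholder URL; replace it with the file's direct download link.
         url = 'https://www.some_url.com/downloads'

         # The $(...) macro is replaced with the agent's staging directory before the script runs.
         path = r"$(Build.ArtifactStagingDirectory)/filename.xx"
         urllib.request.urlretrieve(url, path)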
    

    Or

    steps:
    - script: 'pip install wget'
      displayName: 'Command Line Script'
    
    - task: PythonScript@0
      displayName: 'Run a Python script'
      inputs:
        scriptSource: inline
        script: |
         import wget
         
         print('Beginning file download with wget module')
         
         url = 'https://www.some_url.com/downloads'
         path = r"$(Build.ArtifactStagingDirectory)"
         wget.download(url, path)
    

    The file is then downloaded to the target path specified in the Python script.
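
Both scripts above take the target directory from the `$(Build.ArtifactStagingDirectory)` macro, which Azure Pipelines substitutes into the inline script text before Python runs. A script kept in a separate file does not get that substitution, but the same value is exposed to it as the environment variable `BUILD_ARTIFACTSTAGINGDIRECTORY`. Here is a minimal sketch of reading it that way; `staging_dir` and `target_path` are hypothetical helper names, and the fallback to the current directory is an assumption made for local runs.

```python
import os
from pathlib import Path


def staging_dir(default="."):
    """Return the artifact staging directory as an absolute Path.

    Azure Pipelines exposes Build.ArtifactStagingDirectory to scripts as the
    environment variable BUILD_ARTIFACTSTAGINGDIRECTORY. Outside a pipeline
    the variable is normally unset, so fall back to `default` (an assumption
    made here for local runs).
    """
    return Path(os.environ.get("BUILD_ARTIFACTSTAGINGDIRECTORY", default)).resolve()


def target_path(filename, default="."):
    """Build the full path a downloaded file should be written to."""
    return staging_dir(default) / filename


if __name__ == "__main__":
    # Printing the path in the pipeline log shows where the file will land.
    print(target_path("agent.json"))
```

Printing the resolved path in the log is a simple way to see where on the agent the file ends up.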

    Here is a blog about using Python to download files from a URL.
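
As a variation on the `urlretrieve` call, the download can be wrapped in a small function that returns the absolute path it wrote to, which answers the "where is the file" question directly in the log. This is only a sketch using the standard library; `download` is a hypothetical helper name, and the command-line usage at the bottom is an illustration rather than part of the original answer.

```python
import shutil
import sys
import urllib.request
from pathlib import Path


def download(url, dest_dir, filename):
    """Download `url` into `dest_dir` and return the absolute file path."""
    dest = Path(dest_dir).resolve() / filename
    # Create the destination directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk in chunks instead of reading it all at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Hypothetical usage: python download.py <url> <dest_dir> <filename>
    if len(sys.argv) == 4:
        saved = download(sys.argv[1], sys.argv[2], sys.argv[3])
        print(f"Saved {saved} ({saved.stat().st_size} bytes)")
```

In a pipeline, the returned path can be printed or passed to a later step instead of searching the agent for the file.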

    Update:

    The URL microsoft.com/en-us/download/confirmation.aspx?id=56519 is a confirmation page: the file is downloaded automatically only after the page is opened in a browser.

    So when you use wget or urllib.request against it, you will get a 403 error.

    You could instead use the manual download link on that page to download the JSON file.


    For example, using the URL https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json:

    import urllib.request
    
    url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json'
    
    path = r"$(Build.ArtifactStagingDirectory)\agent.json"
    urllib.request.urlretrieve(url, path)
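
The 403 mentioned above surfaces in Python as `urllib.error.HTTPError`. A rough sketch of catching it, and of sending an explicit `User-Agent` header with the request, is shown below. Whether a browser-like header changes the server's answer for a given URL is an assumption that has to be checked against the real site; `build_request` and `fetch` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return the response body, or None when the server answers 403."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Report the refusal instead of letting the step fail silently.
            print(f"403 Forbidden for {url}")
            return None
        # Any other HTTP error is re-raised unchanged.
        raise
```

Returning `None` on a 403 lets the calling step decide whether to fall back to another URL or fail the build.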
    

    Update2:

    You could use a Python script to get the download link from the web page.

    Sample:

    steps:
    - script: |
       pip install bs4
       
       pip install lxml
      workingDirectory: '$(build.sourcesdirectory)'
      displayName: 'Command Line Script'
    
    - task: PythonScript@0
      displayName: 'Run a Python script'
      inputs:
        scriptSource: inline
        script: |
         from urllib.request import Request, urlopen, urlretrieve

         from bs4 import BeautifulSoup

         # Request the confirmation page with a browser-like User-Agent header.
         req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519", headers={'User-Agent': 'Mozilla/5.0'})
         html_page = urlopen(req).read()

         soup = BeautifulSoup(html_page, "lxml")

         # The anchor id is copied from the page's HTML; update it if the page changes.
         a = ""
         for link in soup.find_all('a', id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c"):
             a = link.get('href')
             print(a)

         # Stop with a clear message if no matching link was found.
         if not a:
             raise SystemExit("Download link not found in the page")

         path = r"$(Build.sourcesdirectory)\agent.json"
         urlretrieve(a, path)
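
The script above finds the link through a hard-coded anchor id. As an alternative sketch, the same idea can be written with the standard library's `html.parser`, collecting every `<a>` whose `href` ends in `.json`. This swaps BeautifulSoup for a plain `HTMLParser` subclass; `JsonLinkCollector`, `find_json_links` and the `SAMPLE` markup are hypothetical, and the real page layout may differ.

```python
from html.parser import HTMLParser


class JsonLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point at .json files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are of interest; attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".json"):
            self.links.append(href)


def find_json_links(html):
    """Return all .json link targets found in the given HTML text."""
    parser = JsonLinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical markup for illustration only; not copied from the real page.
SAMPLE = """
<html><body>
<a href="/en-us/download/details.aspx?id=56519">Details</a>
<a id="some-id" href="https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json">click here</a>
</body></html>
"""

if __name__ == "__main__":
    print(find_json_links(SAMPLE))
```

Matching on the file extension rather than on an element id avoids depending on one specific id, at the cost of possibly returning more than one link.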
    


    Update3:

    Another method to get the download URL:

    steps:
    - script: 'pip install requests'
      displayName: 'Command Line Script'
    
    - task: PythonScript@0
      displayName: 'Run a Python script'
      inputs:
        scriptSource: inline
        script: |
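         import re
         import urllib.request

         import requests

         rq = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")

         # Raw string with escaped dots, so "." only matches a literal dot.
         t = re.search(r"https://download\.microsoft\.com/download/.*?\.json", rq.text)

         # Stop with a clear message if the page contains no matching link.
         if t is None:
             raise SystemExit("No .json download link found in the page")

         a = t.group()
         print(a)

         path = r"$(Build.sourcesdirectory)\agent.json"
         urllib.request.urlretrieve(a, path)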
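
The regular-expression step can be tried without any network access by running it against a piece of text. The sketch below does that and also derives a file name from the matched URL, so the saved file keeps its dated name rather than a fixed `agent.json`. `find_json_url`, `filename_from_url` and the `SAMPLE` text are hypothetical; the pattern assumes the link appears in the page as a plain `https://download.microsoft.com/download/...json` string.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Escaped dots match literal dots; the character class stops at quotes,
# whitespace and angle brackets so the match cannot run past the URL.
JSON_URL = re.compile(r'https://download\.microsoft\.com/download/[^\s"<>]+?\.json')


def find_json_url(text):
    """Return the first matching download URL in `text`, or None."""
    match = JSON_URL.search(text)
    return match.group(0) if match else None


def filename_from_url(url):
    """Return the last path segment of the URL, e.g. the JSON file name."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical page fragment for illustration only.
SAMPLE = (
    '<script>var dl = {"url":"https://download.microsoft.com/download/7/1/D/'
    '71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json"};</script>'
)

if __name__ == "__main__":
    url = find_json_url(SAMPLE)
    print(url)
    print(filename_from_url(url))
```

Returning `None` instead of calling `.group()` on a failed match gives the pipeline step a chance to fail with a clear message.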