Tags: python, azure, virtual-machine, throttling, download-speed

Slow download speed on Azure VM when installing Python library


Disclaimer: I have read other similar SO questions, but none of them applies to this case [1, 2].

I am trying to install a Python library in my Azure Ubuntu VM. The library is PaddleOCR. Its installation instructions require you to run:

python3 -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple

This downloads and installs a roughly 800 MB .whl file.

This completes very quickly on my local machine, which suggests that the source is neither unavailable nor throttling downloads:

[Screenshot: local machine download]

On the Azure VM, the initial download speed varies between 150 and 450 kB/s, then quickly (within 1-2 minutes) drops to about 25 kB/s. The download eventually fails with a timeout, so I can never get the file. Re-running the command gives the same pattern: the speed starts somewhere between 150 and 400 kB/s and invariably decays to about 25 kB/s after some time.
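To put these rates in perspective, here is a back-of-the-envelope estimate. It assumes the figures above are kilobytes per second in decimal units (1 MB = 1000 kB) and that the rate stays constant, neither of which is guaranteed. `download_time_hours` is just an illustrative helper name.

```python
def download_time_hours(size_mb, rate_kb_per_s):
    """Hours needed to transfer size_mb at a constant rate_kb_per_s (decimal units)."""
    seconds = (size_mb * 1000) / rate_kb_per_s
    return seconds / 3600


# Rates taken from the observations above; 800 MB is the approximate wheel size.
for rate in (450, 150, 25):
    print(f"{rate:>4} kB/s: about {download_time_hours(800, rate):.1f} h")
```

At a sustained 25 kB/s, the ~800 MB wheel would need close to 9 hours, which fits with the download never completing before it times out.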

Here are some screenshots of re-runs on my Azure VM, which I cancelled manually after a short while, just to test:

[Screenshot: Azure VM download]

Azure VM specifications

  • Non-spot NC4as-T4-v3 instance with a Premium SSD
  • It's idle: it's not used for anything other than this download.
  • It sits in the UK South region, physically the same location (London, UK) as my local machine, from which the download succeeds.
  • ... let me know if I should specify more info here.

What I tried

  1. Shut down and restarted the VM
  2. Redeployed the VM
  3. Disabled IPv6, which sometimes seems to be linked to generally slow pip downloads
  4. Ran sudo apt-get update && sudo apt-get upgrade, as suggested here
  5. Updated conda and pip

Possible explanation

The fact that the speed is initially higher (150-400 kB/s) and then drops (to about 25 kB/s) looks to me like throttling by Azure. However, I cannot find any reference to this online. Is there a setting I can change to avoid it?
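One way to test the throttling hypothesis is to time a partial download of a large file from different hosts on the VM: if only one host is slow, the VM's network as a whole is probably not the bottleneck. The sketch below uses only the standard library. The helper names `measure_download` and `rate_kb_per_s` and their defaults are illustrative, not part of pip or any Azure tooling.

```python
import time
import urllib.request


def measure_download(url, max_bytes=5_000_000, chunk_size=64 * 1024, timeout=15):
    """Read up to max_bytes from url; return (bytes_received, elapsed_seconds)."""
    received = 0
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while received < max_bytes:
            chunk = response.read(min(chunk_size, max_bytes - received))
            if not chunk:  # end of the response body
                break
            received += len(chunk)
    return received, time.monotonic() - start


def rate_kb_per_s(received, elapsed):
    """Average transfer rate in kB/s (decimal units)."""
    if elapsed <= 0:
        return float("inf")
    return received / 1000 / elapsed
```

Calling `measure_download` with a wheel URL from each package index (the URLs are not shown here) and comparing the resulting rates would show whether the slowdown follows the host. Because the slowdown described above only sets in after a minute or two, `max_bytes` may need to be large enough to cover that period.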


1: How to improve download speed on Azure VMs?

2: Slow download speed from Azure Storage


Solution

  • The issue was with the pip command, not with Azure throttling the download.

    tl;dr

    Without the -i argument, pip downloads and installs the library quickly and correctly, i.e.:

    python3 -m pip install paddlepaddle-gpu
    

    How I got there

    • I read here that the -i argument (pip's --index-url option) can be used to restrict pip to installing releases that are directly hosted. Apparently, this sometimes helps with slow downloads, because pip can end up crawling a lot of pages looking for package sdists (whatever that means).
    • However, PaddleOCR's official installation instructions already tell you explicitly to use the -i argument.
    • I ended up trying the opposite, i.e. removing the -i argument. For some reason, that worked. Without that argument, my Azure VM was able to download and install PaddleOCR quickly, and I got it working without any issue.

    What really threw me off was the fact that on my local machine, I could get a successful quick download/installation even with the -i argument. So it seemed like a problem with Azure, but it wasn't.

    I still don't understand the root cause of this, but at least I got it working. If anyone has an idea, please leave a comment.
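For what it's worth, `-i` is pip's short form of `--index-url`, which sets the base URL of the package index that pip queries. Without it, pip uses the default index at https://pypi.org/simple. So the two commands above fetch the wheel from different package indexes. Below is a minimal sketch of which project page pip would look up in each case, assuming the PEP 503 "simple" repository layout and name normalization. `project_page` is a hypothetical helper, not a pip API.

```python
import re

DEFAULT_INDEX = "https://pypi.org/simple"  # pip's default index
TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"  # value passed to -i


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page(project, index_url=DEFAULT_INDEX):
    """Return the index page that lists the downloadable files for a project."""
    return f"{index_url.rstrip('/')}/{normalize(project)}/"


print(project_page("paddlepaddle-gpu", TSINGHUA_MIRROR))  # with -i
print(project_page("paddlepaddle-gpu"))                   # without -i
```

The files linked from each page are served by different hosts, which could explain a difference in download speed between networks, though that is not confirmed here.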