How do I wget a file without getting the HTML page instead?

I'm trying to download a file using:

wget https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt

I expected to get the .txt file, but I get the page's HTML instead.

I tried wget --max-redirect=2 --trust-server-names <url>, based on the suggestions here, and wget -m <url>, which downloads the entire website, as well as a few other variations. None of them worked.

Solution

wget https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt

This points wget to an HTML page, even though the URL has a .txt suffix. After visiting it, I found a link to the text file itself under "raw", which you should be able to use with wget in the following way:

wget https://huggingface.co/distilbert-base-uncased/raw/main/vocab.txt
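If you have several such links, the blob-to-raw substitution can be scripted. This is only a minimal sketch: the function name blob_to_raw is made up for illustration, and it assumes the URL contains a single /blob/ path segment, as in the example from the question.

```shell
# Hypothetical helper (not part of wget): rewrite a "blob" page URL
# into the corresponding "raw" file URL.
blob_to_raw() {
    # Replace the first "/blob/" path segment with "/raw/".
    printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

# Example use (needs network access, so it is left commented out):
# wget "$(blob_to_raw https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt)"
```

A URL without a /blob/ segment passes through unchanged, so the helper should be harmless on links that already point at the raw file.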

If you need to reveal the true type of a file without downloading it, you can use the --spider option. In this case,

wget --spider https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt

gives output containing

Length: 7889527 (7,5M) [text/html]

and

wget --spider https://huggingface.co/distilbert-base-uncased/raw/main/vocab.txt

gives output containing

Length: 231508 (226K) [text/plain]
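The type check can also be done in a script before downloading. The sketch below is one possible way, not the only one: the function names mime_from_length_line and remote_mime are hypothetical, and it assumes wget reports a Length: line in the format shown above. LC_ALL=C is set so the message text is not localized, and 2>&1 is needed because wget writes its report to stderr.

```shell
# Hypothetical helper: extract the MIME type from a wget "Length:" line,
# e.g. "Length: 231508 (226K) [text/plain]" yields "text/plain".
mime_from_length_line() {
    # Print only the text between the last pair of square brackets.
    printf '%s\n' "$1" | sed -n 's/^Length:.*\[\(.*\)\]$/\1/p'
}

# Hypothetical helper: report the MIME type of a URL without downloading it.
# If redirects produce several "Length:" lines, the last one is used.
remote_mime() {
    mime_from_length_line "$(LC_ALL=C wget --spider "$1" 2>&1 | grep '^Length:' | tail -n 1)"
}

# Example use (needs network access, so it is left commented out):
# [ "$(remote_mime "$url")" = "text/plain" ] && wget "$url"
```

Splitting the parsing into its own function keeps it checkable against the two sample lines above without touching the network.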