I'm trying to download a file using:
wget https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt
I expect to get the .txt file, but I get the page's HTML instead.
I tried wget --max-redirect=2 --trust-server-names <url>
based on the suggestions here, and wget -m <url>,
which downloads the entire website, plus a few other variations that also don't work.
wget https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt
This points wget to an HTML page, even though the URL has a .txt suffix. After visiting it, I found a link to the text file itself under "raw", which you should be able to use with wget in the following way:
wget https://huggingface.co/distilbert-base-uncased/raw/main/vocab.txt
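If you have several such links, the same substitution can be scripted. The following is only a minimal sketch: blob_to_raw is a hypothetical helper name, and it assumes that other files follow the same /blob/ to /raw/ pattern seen for this one URL, which is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: turn a "blob" page URL into the matching "raw" file URL.
# Assumption: the /blob/ -> /raw/ pattern seen above also applies to other files.
blob_to_raw() {
    # Replace only the first "/blob/" segment; URLs without it pass through unchanged.
    printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

url='https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt'
blob_to_raw "$url"
# To download the file instead of printing the URL:
# wget "$(blob_to_raw "$url")"
```

The helper only rewrites the string and never touches the network, so you can check the resulting URL before handing it to wget.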
If you need to reveal the true type of a file without downloading it, you can use the --spider
option. In this case,
wget --spider https://huggingface.co/distilbert-base-uncased/blob/main/vocab.txt
gives output containing
Length: 7889527 (7,5M) [text/html]
and
wget --spider https://huggingface.co/distilbert-base-uncased/raw/main/vocab.txt
gives output containing
Length: 231508 (226K) [text/plain]
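If you want to check the type in a script rather than by eye, you could pull the bracketed value out of that Length line. This is a rough sketch, not a robust parser: content_type_from_wget is a hypothetical helper name, and it assumes the output contains a line shaped like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: read wget output on stdin and print the MIME type found
# in the last "Length: ... [type]" line (the last one matters after redirects).
content_type_from_wget() {
    sed -n 's/^Length:.*\[\(.*\)\].*$/\1/p' | tail -n 1
}

# Sample line copied from the output above, so no network access is needed.
printf '%s\n' 'Length: 231508 (226K) [text/plain]' | content_type_from_wget
# With a live request (wget writes its log to stderr, hence 2>&1):
# wget --spider "$url" 2>&1 | content_type_from_wget
```

A script could then compare the result against text/html and refuse to download when the URL turns out to be a web page.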