I need to read a .md file with Python and store its content somewhere else. When I use the code below, I get HTML, but I only need the Markdown source.
import urllib.request

doc_url = 'https://gitlab.com/quanzhang/cloud-deploy-component-prod/-/blob/main/README.md'
with urllib.request.urlopen(doc_url) as url:
    text = url.read()
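The /-/blob/ URL points to the HTML page GitLab uses to display the file. To get the Markdown itself, download the raw file from the /-/raw/ URL instead: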
import urllib.request
from pathlib import Path

REPO_URL = "https://gitlab.com/quanzhang/cloud-deploy-component-prod"
DOC_URL = f"{REPO_URL}/-/raw/main/README.md"  # raw file, not the HTML view
DOWNLOAD_FOLDER = r"c:\temp"  # this folder must already exist

with urllib.request.urlopen(DOC_URL) as response:
    # Save the downloaded bytes unchanged as c:\temp\README.md
    with (Path(DOWNLOAD_FOLDER) / Path(DOC_URL).name).open("wb") as file:
        file.write(response.read())
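If you already have a blob URL and want the Markdown as a string rather than a file, a small helper can rewrite the URL and decode the response. This is a minimal sketch that assumes the usual GitLab layout <repo>/-/blob/<ref>/<path> and UTF-8 encoded files; the function names blob_to_raw_url and fetch_markdown are hypothetical, not part of any library.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw_url(blob_url: str) -> str:
    """Turn a GitLab '/-/blob/' page URL into the matching '/-/raw/' URL.

    Hypothetical helper; assumes the layout <repo>/-/blob/<ref>/<path>.
    """
    parts = urlsplit(blob_url)
    marker = "/-/blob/"
    if marker not in parts.path:
        raise ValueError(f"not a GitLab blob URL: {blob_url}")
    # Replace only the first occurrence, leaving the file path itself untouched.
    raw_path = parts.path.replace(marker, "/-/raw/", 1)
    return urlunsplit((parts.scheme, parts.netloc, raw_path, parts.query, parts.fragment))


def fetch_markdown(blob_url: str, encoding: str = "utf-8") -> str:
    """Download the raw file behind a GitLab blob URL and return it as text.

    Hypothetical helper; the default encoding of UTF-8 is an assumption.
    """
    with urllib.request.urlopen(blob_to_raw_url(blob_url)) as response:
        # read() returns bytes; decode them so the Markdown can be stored as a str.
        return response.read().decode(encoding)
```

For example, fetch_markdown('https://gitlab.com/quanzhang/cloud-deploy-component-prod/-/blob/main/README.md') downloads from the /-/raw/main/README.md URL and returns the Markdown text, which you can then write to a file, a database, or wherever you need it.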
Side note: depending on your needs, consider using requests instead of urllib.