Load remote markdown file from python


I need to read a remote .md file with Python and store its content somewhere else. With the code below I get the HTML page, but I only need the raw Markdown.

import urllib.request

doc_url = 'https://gitlab.com/quanzhang/cloud-deploy-component-prod/-/blob/main/README.md'
with urllib.request.urlopen(doc_url) as url:
  text = url.read()

Solution

  • The /-/blob/ URL returns GitLab's HTML page for the file, not the file itself. You need to download the raw file via the /-/raw/ URL:

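    import urllib.request
    from pathlib import Path

    REPO_URL = "https://gitlab.com/quanzhang/cloud-deploy-component-prod"
    # Use /-/raw/ instead of /-/blob/ to get the file itself rather than the HTML page.
    DOC_URL = f"{REPO_URL}/-/raw/main/README.md"
    DOWNLOAD_FOLDER = r"c:\temp"  # must already exist

    # Path(DOC_URL).name extracts the file name ("README.md") from the URL.
    target = Path(DOWNLOAD_FOLDER) / Path(DOC_URL).name
    with urllib.request.urlopen(DOC_URL) as response, target.open("wb") as file:
        # Write the downloaded bytes unchanged.
        file.write(response.read())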

    Side note: Depending on your needs, consider using requests instead of urllib.
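
If you would rather keep the original /-/blob/ link and get the Markdown as a string instead of a file, the same idea can be sketched as below. The helper names `blob_to_raw_url` and `fetch_markdown` are hypothetical, and the sketch assumes the GitLab URL layout from the question and a UTF-8 encoded file.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw_url(url: str) -> str:
    """Turn a GitLab '/-/blob/' page URL into the matching '/-/raw/' URL."""
    parts = urlsplit(url)
    # Replace only the first occurrence; URLs without '/-/blob/' pass through unchanged.
    raw_path = parts.path.replace("/-/blob/", "/-/raw/", 1)
    return urlunsplit(parts._replace(path=raw_path))


def fetch_markdown(url: str, encoding: str = "utf-8") -> str:
    """Download the raw file and return it as text (assumes UTF-8 by default)."""
    with urllib.request.urlopen(blob_to_raw_url(url)) as response:
        return response.read().decode(encoding)


if __name__ == "__main__":
    doc_url = "https://gitlab.com/quanzhang/cloud-deploy-component-prod/-/blob/main/README.md"
    # Only show the rewritten URL here; call fetch_markdown(doc_url) to download.
    print(blob_to_raw_url(doc_url))
```

Calling `fetch_markdown(doc_url)` then returns the Markdown source as a `str`, which you can store wherever you need it.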