python, web-scraping, pdf-generation

How to make a PDF from an online ebook that is displayed page by page?


I would like to save books like this one as PDF: https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/index.html. The site displays the book page by page.

How can I do it?

The only thing I have managed so far is to print each page to a PDF and then combine the separate PDF files.

Is there a way to do it automatically in Python or another scripting language?


Solution

  • You can download the page images directly with requests and save them into a single PDF with Pillow (PIL). For example:

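    from io import BytesIO

    import requests
    from PIL import Image  # pip install Pillow

    pdf_path = "doc.pdf"
    # Page images are numbered with 4-digit zero padding: page-113088-0001.jpg, ...
    url = 'https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/assets/page-images/page-113088-{:04d}.jpg'

    images = []
    for p in range(1, 4):  # <-- raise the upper bound to fetch more pages (this saves the first 3)
        # verify=False skips TLS certificate checks; drop it if the site's certificate validates
        response = requests.get(url.format(p), verify=False)
        response.raise_for_status()  # fail early instead of handing an error page to Pillow
        images.append(Image.open(BytesIO(response.content)))

    # borrowing from this answer: https://stackoverflow.com/a/47283224/10035985
    images[0].save(
        pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
    )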
    

    The resulting doc.pdf opens in Firefox as expected (screenshot omitted).
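
If you do not know the page count in advance, one option is to keep requesting pages until the server stops returning them. The following is only a sketch under assumptions that are not confirmed by the site: that a page past the end of the book answers with HTTP 404, and that the image URL pattern from the example above holds for every page. The names `collect_pages`, `fetch_page`, `save_pdf` and the `max_pages` safety cap are hypothetical helpers introduced for illustration.

```python
from typing import Callable, List, Optional

# Assumed URL pattern, copied from the example above (4-digit zero-padded page number).
PAGE_URL = (
    "https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/"
    "assets/page-images/page-113088-{:04d}.jpg"
)


def collect_pages(
    fetch: Callable[[int], Optional[bytes]], max_pages: int = 2000
) -> List[bytes]:
    """Call fetch(1), fetch(2), ... and stop at the first page that returns None."""
    pages = []
    for number in range(1, max_pages + 1):  # max_pages guards against endless loops
        data = fetch(number)
        if data is None:  # convention used in this sketch: None means "no such page"
            break
        pages.append(data)
    return pages


def fetch_page(number: int) -> Optional[bytes]:
    """Download one page image; return None on HTTP 404 (assumed end of the book)."""
    import requests  # third-party; imported here so collect_pages stays dependency-free

    # The example above passed verify=False; add it only if certificate checks fail.
    response = requests.get(PAGE_URL.format(number), timeout=30)
    if response.status_code == 404:
        return None
    response.raise_for_status()  # any other error status is treated as a real failure
    return response.content


def save_pdf(pages: List[bytes], pdf_path: str) -> None:
    """Write the downloaded image bytes into one multi-page PDF."""
    from io import BytesIO

    from PIL import Image  # pip install Pillow

    if not pages:
        raise ValueError("no pages were downloaded")
    # Convert to RGB so that every page is in a mode the PDF writer accepts.
    images = [Image.open(BytesIO(data)).convert("RGB") for data in pages]
    images[0].save(
        pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
    )
```

With these helpers, `save_pdf(collect_pages(fetch_page), "doc.pdf")` would download every available page and write the PDF in one go. Passing the download function in as `fetch` keeps the stopping logic separate from the network code, so it can be tried out with a fake function before hitting the real site.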