I would like to save books like this one as a PDF: https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/index.html. The site shows the book page by page.
How can I do this?
The only thing I have managed so far is to print each page to a PDF and then combine the separate PDF pages.
Is there a way to do this automatically with Python or another scripting language?
You can download the document images directly with requests and save them to a PDF with PIL. For example:
import requests
from PIL import Image  # pip install Pillow
from io import BytesIO

pdf_path = "doc.pdf"
url = 'https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/assets/page-images/page-113088-{}.jpg'

# Download each page image; page numbers in the URL are zero-padded to 4 digits.
# Note: verify=False disables TLS certificate verification.
images = [
    Image.open(BytesIO(requests.get(url.format(f'{p:04d}'), verify=False).content))
    for p in range(1, 4)  # <-- increase the page count here (this saves the first 3 pages)
]

# borrowing from this answer: https://stackoverflow.com/a/47283224/10035985
images[0].save(
    pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
)
(Screenshot: the resulting doc.pdf opened in Firefox.)
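If you do not know the number of pages in advance, one option is to keep requesting pages until the server stops returning them. The following is only a sketch under some assumptions: that a missing page produces an HTTP 404 (I have not checked how this site responds past the last page), and that a cap of 500 pages is a reasonable safety limit. The helper names `fetch_page`, `collect_pages` and `save_pdf` are made up for this example.

```python
from io import BytesIO

# Same URL pattern as in the answer above; {:04d} zero-pads the page number.
URL = ("https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/"
       "assets/page-images/page-113088-{:04d}.jpg")


def fetch_page(page):
    """Return the JPEG bytes for one page, or None if the server answers 404."""
    import requests  # imported here so collect_pages() has no network dependency

    # verify=False mirrors the answer above (it skips TLS certificate checks).
    resp = requests.get(URL.format(page), verify=False, timeout=30)
    if resp.status_code == 404:  # assumption: a missing page yields HTTP 404
        return None
    resp.raise_for_status()  # any other HTTP error is raised, not ignored
    return resp.content


def collect_pages(fetch, max_pages=500):
    """Call fetch(1), fetch(2), ... and stop at the first None or at max_pages."""
    pages = []
    for page in range(1, max_pages + 1):
        data = fetch(page)
        if data is None:
            break
        pages.append(data)
    return pages


def save_pdf(raw_pages, pdf_path):
    """Write the downloaded JPEG bytes into a single multi-page PDF."""
    from PIL import Image  # pip install Pillow

    if not raw_pages:
        raise ValueError("no pages were downloaded")
    # Convert to RGB so every page is in a mode the PDF writer accepts.
    images = [Image.open(BytesIO(raw)).convert("RGB") for raw in raw_pages]
    images[0].save(pdf_path, "PDF", resolution=100.0,
                   save_all=True, append_images=images[1:])


# Usage (performs network requests):
# save_pdf(collect_pages(fetch_page), "doc.pdf")
```

Passing the fetch function as an argument keeps the loop separate from the download, so you can try it with a stub before running it against the real site.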