Combine PDFs converted from TIFF files as they're read in through a loop

I've got a Python web scraper that crawls through a series of TIFF pages online and converts each one to PDF, but I can't figure out how to combine all the converted PDFs into a single file and write it to my computer.

import img2pdf, requests
outPDF = []

for pgNum in range(1,20):
    tiff = requests.get("http://url-to-tiff-file.com/page="+str(pgNum)).content
    pdf = img2pdf.convert(tiff)
    outPDF.append(pdf)

with open("file","wb") as f:
    f.write(''.join(outPDF))

I get the following error when I run it:

f.write(''.join(outPDF))
TypeError: sequence item 0: expected str instance, bytes found

Update

If you go to http://oris.co.palm-beach.fl.us/or_web1/details_img.asp?doc_id=23543456&pg_num=1 and open the web dev console in your browser, you can see a form tag with a number of ".tif" URLs stored in hidden input tags.

Solution

img2pdf can be quirky when converting TIFF and PNG files. The code below avoids some of the potential issues in your code by using Pillow to re-save the image files before they are passed to img2pdf.

import img2pdf
from PIL import Image

image_list = []
test_images = ['image_01.tiff', 'image_02.tiff', 'image_03.tiff']
for image in test_images:
   # Re-save each image as plain RGB so img2pdf gets a format it handles cleanly
   im = Image.open(image).convert('RGB')
   im.save(f'mod_{image}')
   image_list.append(f'mod_{image}')

with open('test.pdf', 'wb') as f:
   # US Letter page size (8.5 x 11 inches), expressed in PDF points
   letter = (img2pdf.in_to_pt(8.5), img2pdf.in_to_pt(11))
   layout = img2pdf.get_layout_fun(letter)
   # Passing the whole list produces one PDF with one page per image
   f.write(img2pdf.convert(image_list, layout_fun=layout))
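For reference, the letter tuple above is just the page size in PDF points, at 72 points per inch. The sketch below reproduces that arithmetic with a hypothetical stand-in helper (inches_to_points is my own name, not part of img2pdf), so it runs without the library installed:

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 of an inch


def inches_to_points(inches):
    """Hypothetical stand-in for img2pdf.in_to_pt, for illustration only."""
    return inches * POINTS_PER_INCH


# US Letter is 8.5 x 11 inches
letter = (inches_to_points(8.5), inches_to_points(11))
print(letter)
```

If your pages are a different size (A4, legal, etc.), only the two inch values need to change.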

I adapted your code to use the approach above, but I could not test it because I don't know which website you're querying. Please let me know if something fails and I will troubleshoot it.

import img2pdf
import requests
from PIL import Image
from io import BytesIO

outPDF = []

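for pgNum in range(1,20):
   tiff = requests.get("http://url-to-tiff-file.com/page="+str(pgNum)).content
   # Load the downloaded bytes with Pillow and normalise the image to RGB
   im = Image.open(BytesIO(tiff)).convert('RGB')
   # Re-save into an in-memory buffer; im.save() needs a file or filename, not raw bytes
   buf = BytesIO()
   im.save(buf, format='TIFF')
   outPDF.append(buf.getvalue())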

with open("file.pdf","wb") as f:
   letter = (img2pdf.in_to_pt(8.5), img2pdf.in_to_pt(11))
   layout = img2pdf.get_layout_fun(letter)
   f.write(img2pdf.convert(outPDF, layout_fun=layout))
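As for the TypeError in your question: img2pdf.convert returns bytes, and ''.join only accepts str items. The small sketch below reproduces that with made-up placeholder byte strings (they are not real PDF data) and shows that b''.join is the bytes equivalent:

```python
# Placeholder byte strings standing in for converted pages (not real PDFs)
chunks = [b"first-page-bytes", b"second-page-bytes"]

try:
    "".join(chunks)  # str.join rejects bytes items
except TypeError as exc:
    error = str(exc)
    print(error)

# bytes.join works on bytes items
joined = b"".join(chunks)
```

Note that b''.join only makes the error go away. Gluing several complete PDF files together byte by byte generally does not give you one valid multi-page PDF, which is why the code above hands the whole list of images to a single img2pdf.convert call instead.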

UPDATED ANSWER

After you provided the actual URL for the target website, I determined that a different route is the better way to accomplish your task. Based on your use case, what you want is the PDF file produced from all the hidden TIFF files. The source website will generate that PDF for you, so there is no need to download all those TIFF files.

Here is the code to get that generated PDF and download it to your system.

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

capabilities = DesiredCapabilities().CHROME

chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")

download_directory = os.path.abspath('chrome_pdf_downloads')

prefs = {"download.default_directory": download_directory,
     "download.prompt_for_download": False,
     "download.directory_upgrade": True,
     "plugins.always_open_pdf_externally": True}

chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome('/usr/local/bin/chromedriver', options=chrome_options)

url_main = 'http://oris.co.palm-beach.fl.us/or_web1/details_img.asp?doc_id=23543456&pg_num=1'

driver.get(url_main)
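# WebDriverWait does nothing on its own; call .until() to actually wait (up to 60 seconds) for the button
# Note: on Selenium 4, replace find_element_by_xpath(xpath) with find_element("xpath", xpath)
xpath = "//input[@name='button' and @onclick='javascript:ValidateAndSubmit(this.form)']"
button = WebDriverWait(driver, 60).until(lambda d: d.find_element_by_xpath(xpath))
button.submit()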

If you still want to get the TIFF files, please let me know and I will look into downloading and processing them to produce the PDF file that the code above is obtaining.
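If it comes to that, a rough starting point might look like the sketch below. It pulls the ".tif" URLs out of the hidden input tags using only the standard library. I am assuming the URLs sit in each tag's value attribute, which I have not verified against the real page; the sample HTML, the HiddenTifCollector class, and the extract_tif_urls function are all made up for illustration.

```python
from html.parser import HTMLParser


class HiddenTifCollector(HTMLParser):
    """Collects value attributes of hidden <input> tags that end in .tif."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        input_type = (attr.get("type") or "").lower()
        # Assumption: the TIFF URL is stored in the value attribute
        value = attr.get("value") or ""
        if input_type == "hidden" and value.lower().endswith(".tif"):
            self.urls.append(value)


def extract_tif_urls(html):
    """Return the .tif URLs found in hidden inputs, in document order."""
    parser = HiddenTifCollector()
    parser.feed(html)
    return parser.urls


# Made-up HTML that only mimics the structure described in the question
sample_html = """
<form name="frm">
  <input type="hidden" name="pg1" value="/images/doc_0001.tif">
  <input type="hidden" name="pg2" value="/images/doc_0002.tif">
  <input type="text" name="q" value="ignored.tif">
  <input type="hidden" name="token" value="abc123">
</form>
"""

tif_urls = extract_tif_urls(sample_html)
print(tif_urls)
```

Each URL could then be fetched with requests and the downloaded bytes passed as a list to a single img2pdf.convert call, as in the earlier code.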