html node.js pdf puppeteer headless-browser

Is it possible to use Puppeteer to convert PDF to HTML?


I know the conversion is possible in one direction (HTML to PDF), but can it be done the other way around?

I didn't find any documentation regarding this.


Solution

  • No, Puppeteer cannot be used to convert PDF to HTML. According to its website:

    Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol.

    That is, Puppeteer only drives a (by default headless) Chrome. Since Chrome itself cannot convert PDF to HTML (please correct me if I'm wrong), neither can Puppeteer.

    However, you can use other npm modules, such as pdf-parse, to extract the text content of the PDF and generate the HTML yourself. Or use an npm module such as pdf2html to convert the PDF directly.
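
The pdf-parse route could look roughly like the sketch below. It assumes the 1.x API of pdf-parse, where the module exports a function that takes a Buffer and resolves to an object with a `text` property; newer major versions may differ. The helper names `escapeHtml`, `textToHtml`, and `pdfToHtml` are made up for this illustration, and the output is only plain paragraphs: layout, fonts, and images from the PDF are not preserved.

```javascript
const fs = require("fs");

// Escape characters that have special meaning in HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal HTML document from plain text.
// Blank lines separate paragraphs; single newlines become <br>.
function textToHtml(text, title = "Converted PDF") {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => `<p>${escapeHtml(p).replace(/\n/g, "<br>")}</p>`);

  return [
    "<!DOCTYPE html>",
    "<html>",
    `<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>`,
    "<body>",
    ...paragraphs,
    "</body>",
    "</html>",
    "",
  ].join("\n");
}

// Hypothetical helper: read a PDF, extract its text, write an HTML file.
async function pdfToHtml(pdfPath, htmlPath) {
  // Third-party module (npm install pdf-parse); assumes the 1.x API.
  const pdfParse = require("pdf-parse");
  const data = await pdfParse(fs.readFileSync(pdfPath));
  fs.writeFileSync(htmlPath, textToHtml(data.text));
}
```

It could then be called as `pdfToHtml("input.pdf", "output.html").catch(console.error);`, where both file names are placeholders. If the visual layout matters, a dedicated converter such as pdf2html is probably the better fit than rebuilding HTML from extracted text.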