Search code examples
javascriptgoogle-chromegoogle-chrome-extensionpdf.js

Chrome extension: How to show custom UI for a PDF file?


I'm trying to write a Google Chrome extension for showing PDF files. As soon as I detect that browser is redirecting to a URL pointing to a PDF file, I want it to stop loading the default PDF viewer, but start showing my UI instead. The UI will use PDF.JS to render the PDF and jQuery-ui to show some other stuff.

Question: how do I make this? It's very important to block the original PDF viewer, because I don't want to double memory consumption by showing two instance of the document. Therefore, I should somehow navigate the tab to my own view.


Solution

  • As the main author of the PDF.js Chrome extension, I can share some insights about the logic behind building a PDF Viewer extension for Chrome.

    How to detect a PDF file?

    In a perfect world, every website would serve PDF files with the standard application/pdf MIME-type. Unfortunately, the real world is not perfect, and in practice there are many websites which use an incorrect MIME-type. You will catch the majority of the cases by selecting requests that satisfy any of the following conditions:

    • The resource is served with the Content-Type: application/pdf response header.
    • The resource is served with the Content-Type: application/octet-stream response header, and its URL contains ".pdf" (case-insensitive).

    Besides that, you also have to detect whether the user wants to view the PDF file or download the PDF file. If you don't care about the distinction, it's easy: Just intercept the request if it matches any of the previous conditions.
    Otherwise (and this is the approach I've taken), you need to check whether the Content-Disposition response header exists and its value starts with "attachment".

    If you want to support PDF downloads (e.g. via your UI), then you need to add the Content-Disposition: attachment response header. If the header already exists, then you have to replace the existing disposition type (e.g. inline) with "attachment". Don't bother with trying to parse the full header value, just strip the first part up to the first semicolon, then put "attachment" in front of it. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266).

    Which Chrome (extension) APIs should I use to intercept PDF files?

    The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept and redirect PDFs to your custom viewer that requests the PDF file from the given URL.

    chrome.webRequest.onHeadersReceived.addListener(function(details) {
        if (/* TODO: Detect if it is not a PDF file*/)
            return; // Nope, not a PDF file. Ignore this request.
    
        var viewerUrl = chrome.extension.getURL('viewer.html') +
          '?file=' + encodeURIComponent(details.url);
        return { redirectUrl: viewerUrl };
    }, {
        urls: ["<all_urls>"],
        types: ["main_frame", "sub_frame"]
    }, ["responseHeaders", "blocking"]);
    

    (see https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for the actual implementation of the PDF detection using the logic described at the top of this answer)

    With the above code, you can intercept any PDF file on http and https URLs. If you want to view PDF files from the local filesystem and/or ftp, then you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the file ends with ".pdf", then the resource is most likely a PDF file. Users who want to use the extension to view a local PDF file have to explicitly allow this at the extension settings page though.

    On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF Viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).

    The methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although they are rare, I still wanted to support them, so I came up with an unconventional method to detect and load the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This method relies on the fact that when an element is put in the document, it eventually have to be rendered. When it is rendered, CSS styles get applied. When I declare an animation for the embed/object elements in the CSS, animation events will be triggered. These events bubble up in the document. I can then add a listener for this event, and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
    There are several ways to replace an element or content, but I've used Shadow DOM to change the displayed content without affecting the DOM in the page.

    Limitations and notes

    The method described here has a few limitations:

    • The PDF file is requested at least two times from the server: First a usual request to get the headers, which gets aborted when the extension redirects to the PDF Viewer. Then another request to request the actual data.
      Consequently, if a file is valid only once, then the PDF cannot be displayed (the first request invalidates the URL and the second request fails).

    • This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).

    • The method to get PDFs to work for <object> and <embed> elements requires a script to run on every page. I've profiled the code and found that the impact on performance is negligible, but you still need to be careful if you want to change the logic.
      (I first tried to use Mutation Observers for detection, which slowed down the page load by 3-20% on huge documents, and caused an additional 1.5 GB peak in memory usage in a complex DOM benchmark).

    • The method to detect <object> / <embed> tags might still cause any NPAPI/PPAPI-based PDF plugins to load, because it only replaced the <embed>/<object> tag's content when it has already been inserted and rendered. When a tab is inactive, animations are not scheduled, and hence the dispatch of the animation event will significantly be delayed.

    Afterword

    PDF.js is open-source, you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you browse the source, you'll notice that the code is a bit more complex than I explained here. That's because extensions cannot redirect requests at the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).

    And there is also some logic to make the URL in the omnibox look a bit better.

    The PDF.js extension continues to evolve, so unless you want to significantly change the UI of the PDF Viewer, I suggest to ask users to install the PDF.js's official PDF Viewer in the Chrome Web Store, and/or open issues on PDF.js's issue tracker for reasonable feature requests.