
Best way to read PPTX with JavaScript


I have been doing some research and am trying to understand what the standard way is to read a PPTX file with JavaScript/TypeScript in the browser.

Many of the libraries I have found, such as textract, are mainly for Node.js. I found one library called JS-PPTX, but its last commit was made in 2016, so that's not very promising.

Most of the libraries are for creating a PowerPoint presentation, but what I really need is to read the file and identify the contents of the slides.

I am happy to read the raw file format and parse it myself if that is better; I just need a way to upload the file and read it with the FileReader API.

Alternatively, if there is a way to convert the PPTX to another format that is easier to read, I would be open to that. I found one library called PPTX2HTML, but its last commit is from 2017.

I found this Stack Overflow post, but it is from 2010, so I am hoping the thinking has evolved since then.


Solution

  • PPTX (see the spec here) is a zipped, XML-based file format that is part of the Microsoft Office Open XML (also known as OOXML or OpenXML) specification, introduced with Microsoft Office 2007.

    Browsers can parse XML, so you probably have to:

    1. read the file with FileReader,
    2. unzip it, for example with a JavaScript ZIP library,
    3. parse the XML parts with DOMParser,
    4. optionally transform them with XSLT.
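
The first three steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions, not a complete PPTX reader. The function names are made up for illustration. The `unzip` argument is a hypothetical caller-supplied function that takes an ArrayBuffer and resolves to a Map from archive entry name to XML string, which you could implement with whichever ZIP library you prefer. The sketch also assumes that slides are stored as ppt/slides/slideN.xml and that visible text sits in DrawingML `<a:t>` elements, which is the usual layout.

```javascript
// DrawingML namespace; text runs on a slide normally appear as <a:t> elements in it.
const DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

// Step 1: read the uploaded File (e.g. from an <input type="file">) into an ArrayBuffer.
function readFileAsArrayBuffer(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}

// Pick the slide parts out of the archive's entry names and sort them by the
// number in the file name. Assumption: file-name order approximates slide order;
// the authoritative order is defined in ppt/presentation.xml and its relationships.
function slidePathsInOrder(entryNames) {
  const pattern = /^ppt\/slides\/slide(\d+)\.xml$/;
  return entryNames
    .map((name) => ({ name, match: pattern.exec(name) }))
    .filter((entry) => entry.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map((entry) => entry.name);
}

// Step 3: parse one slide's XML and collect the contents of its text runs.
// The parser parameter defaults to the browser's DOMParser.
function extractSlideText(slideXml, parser = new DOMParser()) {
  const doc = parser.parseFromString(slideXml, "application/xml");
  const runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
  return Array.from(runs, (node) => node.textContent ?? "");
}

// Steps 1 to 3 combined. `unzip` is the hypothetical function described above:
// (ArrayBuffer) => Promise<Map<string, string>>.
async function readPptxText(file, unzip) {
  const buffer = await readFileAsArrayBuffer(file);
  const entries = await unzip(buffer);
  return slidePathsInOrder([...entries.keys()]).map((path) =>
    extractSlideText(entries.get(path))
  );
}
```

Calling `readPptxText(file, unzip)` in the browser should resolve to one array of text strings per slide. Keeping the ZIP step behind a parameter means the sketch does not commit to any particular library, and the XML handling stays in plain browser APIs.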