ruby xml xpath epub

Parse EPUB container with Ruby and LibXML


I have Ruby code that is intended to look inside an extracted EPUB file, find the location of the OPF metadata file, and return it. The path to the OPF file (relative to the root of the EPUB) is recorded in the XML file META-INF/container.xml, whose content is as follows:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>

I'm using LibXML and XPath to extract the root file path. The problem is that LibXML reports that my XPath expression is invalid. The same expression works when using Python and LXML. The relevant portion of my code is below.

require 'libxml'
include LibXML
container = File.join("META-INF", "container.xml")
tree = XML::Document.file(container)
rootfile = tree.find_first("//{urn:oasis:names:tc:opendocument:xmlns:container}rootfile")['full-path']

Any suggestions would be most welcome.


Solution

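  • LibXML handles namespaces differently from lxml: the {namespace-URI}name form is ElementTree/lxml notation rather than standard XPath 1.0, so LibXML rejects it as an invalid expression. Try defining an alias (i.e. a prefix) for the default namespace and using it in the expression.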

    require 'libxml'
    include LibXML
    container = File.join("META-INF", "container.xml")
    tree = XML::Document.file(container)
    tree.root.namespaces.default_prefix = 'opf'
    rootfile = tree.find_first("//opf:rootfile")['full-path']
    

    Alternatively, pass the namespace declaration (in prefix:uri form) as a second argument to find_first:

    require 'libxml'
    include LibXML
    container = File.join("META-INF", "container.xml")
    tree = XML::Document.file(container)
    rootfile = tree.find_first("//opf:rootfile", "opf:urn:oasis:names:tc:opendocument:xmlns:container")['full-path']
    

    With this approach, however, you need to know the namespace URI in advance and hard-code it. Find more info on working with namespaces here.
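
The hard-coding concern can be sidestepped by reading the namespace URI from the document's root element and binding a prefix to it at query time. The sketch below illustrates the idea with REXML (which ships with standard Ruby installations) rather than LibXML, so that it runs without extra gems; this is a swapped-in library, not the approach from the answer above. The helper name opf_path, the constant CONTAINER_XML, and the prefix c are illustrative assumptions, and the sample XML is the container file from the question.

```ruby
require 'rexml/document'

# Sample container.xml content, copied from the question.
CONTAINER_XML = <<~XML
  <?xml version="1.0"?>
  <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
     <rootfiles>
        <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
     </rootfiles>
  </container>
XML

# Hypothetical helper: returns the full-path attribute of the first
# rootfile element, or nil if no such element is found.
def opf_path(xml)
  doc = REXML::Document.new(xml)
  # Read the namespace URI from the root element instead of hard-coding it.
  ns = doc.root.namespace
  # Bind an arbitrary prefix ("c") to that URI for the XPath query.
  rootfile = REXML::XPath.first(doc, '//c:rootfile', { 'c' => ns })
  rootfile && rootfile.attributes['full-path']
end

puts opf_path(CONTAINER_XML)
```

This assumes the root element is in the same namespace as rootfile, which holds for the container.xml shown in the question.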