Tags: c#, visual-studio, epub

How to parse an NCX file using C#


I am trying to create a Windows Phone app for reading EPUBs. I extracted the content and now I want to read the NCX file, but when I try to use System.Xml.Serialization.XmlSerializer it reports an unknown field on the second line itself. Please help.


Solution

  • Here is the basic approach to reading an EPUB file:

    • Treat the EPUB file as a ZIP archive and read it using the ZIP archive reader built into Windows, ZipArchive (System.IO.Compression).
    • In the archive, find the file META-INF/container.xml and look in it
      for the full-path attribute of the rootfile element. That gives you the path to the OPF file (probably something like
      OPS/content.opf).
    • The 'manifest' element of the OPF file will tell
      you the names of all the files that make up the book. The 'spine'
      element will tell you the order in which they appear in the book. The spine element's 'toc' attribute
      holds the manifest id of the table-of-contents file, which will usually be in NCX
      format.
    • Normally, an EPUB book will consist of a series of XHTML files, each containing one 'chapter' of the book. The basic procedure to display a book for reading would be:
      • figure out which chapter the user wants to look at
      • load the XHTML for that chapter into a WebView (or some other solution for rendering XHTML on screen)
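The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a complete reader: it assumes an EPUB 2 package (where the spine's `toc` attribute names the NCX), that `System.IO.Compression.ZipArchive` and LINQ to XML are available on your target platform, and that manifest `href` values are plain relative paths (no URL escaping or `../` segments). The names `EpubLocator`, `FindOpfPath` and `ReadPackage` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: finds the OPF package, the spine (reading order) and the
// NCX table of contents inside an EPUB 2 archive. Error handling is minimal.
public static class EpubLocator
{
    static readonly XNamespace ContainerNs = "urn:oasis:names:tc:opendocument:xmlns:container";
    static readonly XNamespace OpfNs = "http://www.idpf.org/2007/opf";

    // Reads META-INF/container.xml and returns the full-path of the first rootfile,
    // e.g. "OPS/content.opf".
    public static string FindOpfPath(ZipArchive zip)
    {
        XDocument container = LoadXml(zip, "META-INF/container.xml");
        XElement rootFile = container.Descendants(ContainerNs + "rootfile").First();
        return (string)rootFile.Attribute("full-path");
    }

    // Returns the chapter paths in spine order; ncxPath receives the archive path
    // of the NCX file named by the spine's 'toc' attribute (or null if absent).
    public static List<string> ReadPackage(ZipArchive zip, string opfPath, out string ncxPath)
    {
        XDocument opf = LoadXml(zip, opfPath);

        // Manifest hrefs are relative to the folder that contains the OPF file.
        int slash = opfPath.LastIndexOf('/');
        string baseDir = slash >= 0 ? opfPath.Substring(0, slash + 1) : "";

        // Map manifest item id -> path inside the archive.
        Dictionary<string, string> manifest = opf.Descendants(OpfNs + "item")
            .ToDictionary(item => (string)item.Attribute("id"),
                          item => baseDir + (string)item.Attribute("href"));

        XElement spine = opf.Descendants(OpfNs + "spine").First();

        // The 'toc' attribute holds the manifest id of the NCX file, not its path.
        string tocId = (string)spine.Attribute("toc");
        ncxPath = tocId != null && manifest.ContainsKey(tocId) ? manifest[tocId] : null;

        // Each itemref's idref points at a manifest item; their order is the reading order.
        return spine.Elements(OpfNs + "itemref")
                    .Select(itemRef => manifest[(string)itemRef.Attribute("idref")])
                    .ToList();
    }

    // Opens one entry of the archive and parses it as XML.
    static XDocument LoadXml(ZipArchive zip, string path)
    {
        ZipArchiveEntry entry = zip.GetEntry(path);
        if (entry == null)
            throw new FileNotFoundException("Entry not found in EPUB: " + path);
        using (Stream stream = entry.Open())
        {
            return XDocument.Load(stream);
        }
    }
}
```

Typical use: open the book with `new ZipArchive(stream, ZipArchiveMode.Read)`, call `FindOpfPath`, then `ReadPackage`; the first spine entry is the chapter to load first, and `ncxPath` is the file to parse for the table of contents. If the built-in reader rejects a particular book (see the problems below), the same XML logic applies on top of whichever ZIP library you switch to.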

    Problems you are likely to encounter:

    • Many EPUB books are created with ZIP generators that, although compatible with the ZIP standard, are incompatible with the ZIP-reader APIs built into the OS. You will probably need a third-party library such as DotNetZip or SharpZipLib (but check the licence conditions of the latter).

    • You will need to do some work to display images in the WebView, especially if you try to cover all the image types that are part of the EPUB standard.

    • It will be fiddly to find and apply all the CSS styles that the EPUB book defines.

    • You will probably want to display a 'paged' view of the chapter rather than a long, vertically scrollable column. That will involve some fiddly JavaScript work.

    • You may find that an individual EPUB chapter is too large to display in a WebView. In the end, you may decide that the limitations of WebView mean you are better off writing your own XHTML parsing and rendering solution and displaying the result with TextBlocks, or something more exotic (you can use C++ interop code and the D2D font APIs).
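As for the XmlSerializer error in the question: one likely cause (an assumption, since the exact message isn't shown) is that every element in an NCX file is in the namespace `http://www.daisy.org/z3986/2005/ncx/`, so classes declared without that namespace don't match the `<ncx>` root element, which usually sits on the second line right after the XML declaration. Below is a hedged sketch of namespace-aware classes covering only `navMap`/`navPoint`; the class names are hypothetical. It reads through an `XmlReader` with DTD processing set to ignore, because NCX files often begin with a `<!DOCTYPE ncx ...>` declaration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Every NCX element is in this namespace; XmlSerializer only matches elements
// whose name AND namespace agree with the mapping.
public static class NcxNames
{
    public const string Ns = "http://www.daisy.org/z3986/2005/ncx/";
}

// Hypothetical minimal model: only <navMap> is mapped. Unmapped elements such as
// <head> and <docTitle> are skipped by XmlSerializer instead of causing an error.
[XmlRoot("ncx", Namespace = NcxNames.Ns)]
public class NcxDocument
{
    [XmlArray("navMap", Namespace = NcxNames.Ns)]
    [XmlArrayItem("navPoint", Namespace = NcxNames.Ns)]
    public List<NavPoint> NavMap { get; set; } = new List<NavPoint>();

    public static NcxDocument Load(Stream stream)
    {
        // Skip any <!DOCTYPE ncx ...> instead of trying to fetch the external DTD.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore, XmlResolver = null };
        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            return (NcxDocument)new XmlSerializer(typeof(NcxDocument)).Deserialize(reader);
        }
    }
}

public class NavPoint
{
    // id and playOrder are unqualified attributes (no namespace).
    [XmlAttribute("id")] public string Id { get; set; }
    [XmlAttribute("playOrder")] public int PlayOrder { get; set; }

    // <navLabel><text>Chapter 1</text></navLabel>
    [XmlElement("navLabel", Namespace = NcxNames.Ns)] public NavLabel Label { get; set; }

    // <content src="chapter1.xhtml"/>: the src is relative to the NCX file.
    [XmlElement("content", Namespace = NcxNames.Ns)] public NavContent Content { get; set; }

    // Sub-entries repeat directly inside their parent navPoint.
    [XmlElement("navPoint", Namespace = NcxNames.Ns)]
    public List<NavPoint> Children { get; set; } = new List<NavPoint>();
}

public class NavLabel
{
    [XmlElement("text", Namespace = NcxNames.Ns)] public string Text { get; set; }
}

public class NavContent
{
    [XmlAttribute("src")] public string Src { get; set; }
}
```

To build a table of contents, walk `NavMap` and recurse into each entry's `Children`. To find the chapter file for an entry, strip any `#fragment` from `Content.Src` and resolve it against the folder containing the NCX file.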