
Jsoup-like HTML parser for C++


I have been writing code in Java to extract data from some web pages, and Jsoup was one of the best libraries to work with. Unfortunately, I now have to port the whole code to C/C++, and I cannot find a decent HTML parser for C++. Is there a Jsoup-like library for C++, or how can similar results be achieved?

[Currently I am using Curl to fetch the page source, and I am searching the internet for an HTML parser.]


Solution

  • Unfortunately, I guess there's no parser like Jsoup for C++ ...

    Besides the libraries already mentioned here, there's a good overview of C++ (and some C) parsers here: Free C or C++ XML Parser Libraries

    I used TinyXML-2 for (HTML) DOM parsing; it's a very small library (only 2 files) that runs on most operating systems, even non-desktop ones. Since it is an XML parser, it expects well-formed markup.

    LibXml

    • push and pull parser (DOM, SAX)
    • Validation
    • XPath and XPointer support
    • Cross-platform / good documentation

    Apache Xerces

    • push and pull parser (DOM, SAX)
    • Validation
    • No XPath support (but there may be a separate package for this?)
    • Cross-platform / good documentation

    If you are on C++/CLI, check out NSoup, a Jsoup port for .NET.

    Some more:

    Maybe you can combine a DOM model/parser with a CSS selector implementation?
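
To make the "DOM plus CSS selector" idea a bit more concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical: `Node`, `Selector`, `parse_selector`, `select_all` and `sample_document` are illustration names, not part of any of the libraries above. It assumes some parser has already produced a tree, and it only understands simple compound selectors such as `a`, `.ext`, `#main` or `a.ext` (one class at most, no descendant or attribute selectors).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal DOM node; a real parser would supply its own type.
struct Node {
    std::string tag;
    std::map<std::string, std::string> attrs;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
};

// Parsed compound selector; an empty field means "no constraint".
struct Selector {
    std::string tag, id, cls;
};

// Parses "tag", "#id", ".class" and combinations such as "tag.class#id".
// Limitation of this sketch: at most one class and one id per selector.
Selector parse_selector(const std::string& s) {
    Selector sel;
    std::string* target = &sel.tag;
    for (char c : s) {
        if (c == '#') target = &sel.id;
        else if (c == '.') target = &sel.cls;
        else target->push_back(c);
    }
    return sel;
}

// True if the class attribute contains cls as a whitespace-separated token.
bool has_class(const Node& n, const std::string& cls) {
    auto it = n.attrs.find("class");
    if (it == n.attrs.end()) return false;
    std::istringstream tokens(it->second);
    std::string token;
    while (tokens >> token)
        if (token == cls) return true;
    return false;
}

bool matches(const Node& n, const Selector& sel) {
    if (!sel.tag.empty() && n.tag != sel.tag) return false;
    if (!sel.id.empty()) {
        auto it = n.attrs.find("id");
        if (it == n.attrs.end() || it->second != sel.id) return false;
    }
    return sel.cls.empty() || has_class(n, sel.cls);
}

// Depth-first walk that collects matches in document order.
void collect(const Node& n, const Selector& sel, std::vector<const Node*>& out) {
    if (matches(n, sel)) out.push_back(&n);
    for (const auto& child : n.children) collect(*child, sel, out);
}

std::vector<const Node*> select_all(const Node& root, const std::string& selector) {
    assert(!selector.empty());  // precondition of this sketch
    std::vector<const Node*> out;
    collect(root, parse_selector(selector), out);
    return out;
}

// Hand-built stand-in for parser output:
// <div id="main"><a class="ext link" href="https://example.com">Example</a>
//                <a href="/local">Local</a></div>
std::unique_ptr<Node> sample_document() {
    auto div = std::make_unique<Node>();
    div->tag = "div";
    div->attrs["id"] = "main";

    auto a1 = std::make_unique<Node>();
    a1->tag = "a";
    a1->attrs = {{"class", "ext link"}, {"href", "https://example.com"}};
    a1->text = "Example";

    auto a2 = std::make_unique<Node>();
    a2->tag = "a";
    a2->attrs = {{"href", "/local"}};
    a2->text = "Local";

    div->children.push_back(std::move(a1));
    div->children.push_back(std::move(a2));
    return div;
}
```

With this, `select_all(*sample_document(), "a.ext")` would be the rough counterpart of Jsoup's `doc.select("a.ext")`. In practice the tree would be filled from whichever parser you pick rather than built by hand, and a full selector engine needs combinators, attribute selectors and pseudo-classes, which this sketch leaves out.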