c libxml2

How to use libxml2 to parse dirty HTML in C


The HTML may be dirty, for example with a premature end of data inside a tag.

How can I do it? Thanks.


Solution

  • Use the libxml2 HTML parser. It recovers from malformed ("dirty") HTML and builds a normalized document tree. See htmlDocPtr htmlParseFile(const char *filename, const char *encoding):

    http://xmlsoft.org/html/libxml-HTMLparser.html