I am using libxml2 in my Vala application to parse HTML. However, the HTML I receive is invalid if you pass it through a validator (although browsers display it normally). Some tags in it are not closed, e.g. it uses <img> instead of <img /> and <meta> instead of <meta/>. I cannot do anything about that, such as asking the authors to write valid HTML. But I need to parse it, and libxml2 fails to do so (in short, doc->get_root_element() always returns null).
Can I do something to make libxml2 parse invalid HTML?
HTML is not XML. People tried to make it XML (it was called XHTML), and we mostly just learned that people can't be trusted to write valid XML. When you say that it is invalid, I assume you mean it is not valid XML but is, in fact, valid HTML.
libxml2 includes an HTML parser, and that is what you need to use instead of the XML parser. In the Vala bindings, everything related to it lives in the Html namespace.
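A minimal sketch of what that could look like, assuming the libxml-2.0 bindings shipped with Vala expose Html.Doc.read_doc and the Html.ParserOption flags (this is how I remember them; check the vapi for your version). The helper name root_element_name and the sample markup are made up for illustration:

```vala
// Build with: valac --pkg libxml-2.0 parse_html.vala

// Hypothetical helper: parses a string with libxml2's HTML parser and
// returns the name of the root element, or null if nothing could be parsed.
string? root_element_name (string html) {
    // RECOVER asks the parser to keep going after errors;
    // NOERROR and NOWARNING suppress diagnostics on stderr.
    Html.Doc* doc = Html.Doc.read_doc (
        html, "", null,
        Html.ParserOption.RECOVER
        | Html.ParserOption.NOERROR
        | Html.ParserOption.NOWARNING);
    if (doc == null) {
        return null;
    }

    string? name = null;
    Xml.Node* root = doc->get_root_element ();
    if (root != null) {
        name = root->name; // copied, so it stays valid after the doc is freed
    }

    // The document is a raw pointer, so it has to be freed by hand.
    delete doc;
    return name;
}

int main () {
    // Unclosed <meta>, <img> and <p>, as in the question.
    string html = "<html><head><meta charset=\"utf-8\"></head>" +
                  "<body><img src=\"a.png\"><p>Hello</body></html>";

    string? name = root_element_name (html);
    stdout.printf ("root element: %s\n", name ?? "(none)");
    return name != null ? 0 : 1;
}
```

The only real change from the XML version is the namespace: the document is created through Html.Doc rather than Xml.Parser, and the resulting tree is walked with the same Xml.Node API as before.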