xslt htmltidy

Remove duplicate XML header


For some reason, HTML Tidy gives me this output:

<?xml version="1.0" encoding="utf-16"?>
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 11 February 2007), see www.w3.org" />
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5" />

...rest of document

So there are two XML declarations, and both declare the wrong encoding (UTF-16 instead of UTF-8). Is there a way, using XSL, to remove the second declaration, change the encoding to UTF-8, and also remove the DOCTYPE?


Solution

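  • I think it would be better to fix the original problem than to patch the output with XSL. A document with two XML declarations is not well-formed, so an XSLT processor will normally refuse to parse it in the first place. Are you using the HTML Tidy library?

    Try setting output-encoding to utf8 and add-xml-decl to false. The DOCTYPE declaration can be suppressed by setting the doctype option to omit.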
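
    As a rough sketch, the three settings above could go in a Tidy configuration file like the one below. The file name tidy.conf is only a placeholder, and the output-xhtml line is an assumption based on the XHTML output shown in the question; adjust or drop it to match your setup.

    ```
    // tidy.conf (hypothetical file name)
    // Write the output as UTF-8 instead of UTF-16
    output-encoding: utf8
    // Do not let Tidy add its own <?xml ...?> declaration
    add-xml-decl: no
    // Leave the DOCTYPE declaration out of the output
    doctype: omit
    // Assumption: XHTML output is wanted, as in the question
    output-xhtml: yes
    ```

    With the command-line tool this would be used as `tidy -config tidy.conf -o output.html input.html` (input.html and output.html are placeholder names). If you call Tidy through a library binding, the same option names should be available through its option-setting API. If a duplicate declaration still appears after that, it is probably being written by whatever code wraps Tidy rather than by Tidy itself.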