I'm writing code that gets the content of a DOCX file as HTML using Open XML PowerTools, and now I want to convert that HTML back into another DOCX file. The step that gets the content as HTML works fine, but when I generate the DOCX file from that HTML, the file cannot be opened and Word shows this error:
This file was created in a pre-release version of Word 2007 and cannot be opened in this version.
The HTML generated from the test DOCX is:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta
charset="UTF-8" />
<title>My Page Title</title>
<meta
name="Generator"
content="PowerTools for Open XML" />
<style>span { white-space: pre-wrap; }
p.pt-Normal {
line-height: 107.9%;
margin-bottom: 8pt;
text-align: justify;
font-family: ;
font-size: 11pt;
margin-top: 0;
margin-left: 0;
margin-right: 0;
}
span.pt-DefaultParagraphFont {
font-family: ;
font-size: 11pt;
font-style: normal;
font-weight: normal;
margin: 0;
padding: 0;
}
span.pt-DefaultParagraphFont-000000 {
font-family: Calibri;
font-size: 11pt;
font-style: normal;
font-weight: normal;
margin: 0;
padding: 0;
}
</style>
</head>
<body>
<div>
<p
dir="rtl"
class="pt-Normal">‏<span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏با سلام خدمت ‏</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏&lt;&lt;‏</span><span
class="pt-DefaultParagraphFont-000000">‎PERSONS.lname‎</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏>>‏</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏ ‏</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏&lt;&lt;‏</span><span
class="pt-DefaultParagraphFont-000000">‎PERSONS.fname‎</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏>>‏</span></p>
<p
dir="rtl"
class="pt-Normal">‏<span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏مدیر محترم ‏</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏&lt;&lt;‏</span><span
class="pt-DefaultParagraphFont-000000">‎OFFICE.name‎</span><span
lang="fa-IR"
class="pt-DefaultParagraphFont">‏>>‏</span></p>
</div>
</body>
</html>
And my code to save the above HTML as a DOCX file is:
using (WordprocessingDocument wordDoc =
    WordprocessingDocument.Create(dest_doc_path, WordprocessingDocumentType.Document))
{
    MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
    string htmlcontent = htmlTXT.Text;
    using (Stream stream = mainPart.GetStream())
    {
        byte[] buf = (new UTF8Encoding()).GetBytes(htmlcontent);
        stream.Write(buf, 0, buf.Length);
    }
    MessageBox.Show("DONE", "done", MessageBoxButton.OK);
}
The answer is simple: you must not write HTML content into the MainDocumentPart, because that part is expected to contain a valid Open XML w:document element, such as the following simplified one:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, world!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
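For comparison, the same minimal document can be produced with the strongly typed classes of the Open XML SDK instead of raw bytes. This is only a sketch: the class name MinimalDocx and the parameter destDocPath are hypothetical names chosen for illustration, not part of the question's code.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, named for illustration only.
public static class MinimalDocx
{
    public static void Create(string destDocPath)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(destDocPath, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

            // Build w:document > w:body > w:p > w:r > w:t, mirroring the XML above.
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text("Hello, world!")))));

            // Serialize the element tree into the main document part.
            mainPart.Document.Save();
        }
    }
}
```

The point of the sketch is that the main document part ends up holding WordprocessingML, which is what Word expects to find there.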
The error message is probably a little misleading. The real problem is simply that HTML is not valid content for the main document part.
Depending on whether or not you changed the HTML after creating it (with the Open XML PowerTools) from the original Word document, you will have to either transform it back into valid Open XML markup (if you changed it) or simply use the Open XML markup from the original Word document.
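If the HTML was changed and a full conversion back to Open XML markup is more than you need, one alternative (not the approach described above, and swapped in here only as an illustration) is to embed the XHTML as an alternative format chunk (w:altChunk) and let Word convert it when the file is opened. The sketch below assumes the HTML is well-formed XHTML; the class name HtmlToDocx and the parameters destDocPath and html are hypothetical. How faithfully the styles and right-to-left text survive depends on Word's own HTML import.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, named for illustration only.
public static class HtmlToDocx
{
    public static void Create(string destDocPath, string html)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(destDocPath, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

            // The main part still needs a valid w:document with a w:body.
            mainPart.Document = new Document(new Body());

            // Store the XHTML in its own part rather than in the main part.
            const string altChunkId = "AltChunkId1";
            AlternativeFormatImportPart chunkPart =
                mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.Xhtml, altChunkId);
            using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunkPart.FeedData(htmlStream);
            }

            // Reference the chunk from the body; Word imports it on open.
            AltChunk altChunk = new AltChunk { Id = altChunkId };
            mainPart.Document.Body.Append(altChunk);
            mainPart.Document.Save();
        }
    }
}
```

Note that the resulting file contains the HTML as an embedded part until Word opens and re-saves it, so other consumers of the DOCX that do not support altChunk may show an empty document.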