I'm using Open XML PowerTools in my project to convert a document (docx) into HTML. Using the example code provided with the SDK, it produces an elegant duplicate in HTML form. (GitHub link: https://github.com/OfficeDev/Open-Xml-PowerTools/blob/vNext/OpenXmlPowerToolsExamples/HtmlConverter01/HtmlConverter01.cs )
However looking at the html markup, the html has embedded styling.
Is there any way of turning this off and using plain and simple <h1> and <p> tags?
I would like to know how to remove this embedded styling, as the formatting would be taken care of by Bootstrap.
The embedded styling is as follows :
<p dir="ltr" style="font-family: Calibri;font-size: 11pt;line-height: 115.0%;margin-bottom: 0;margin-left: 0;margin-right: 0;margin-top: 0;">
<span xml:space="preserve" style="font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"> </span>
</p>
This, as you can see, is fine if you want a direct copy, but not if you want to control the style yourself.
In the C# code I have already made the following adjustments:
Many thanks.
You can also use XmlReader and XmlWriter to obtain bare-bones HTML. This could, however, be a little overkill, as only the tags themselves and their text content will be kept; every attribute is dropped.
    using System.IO;
    using System.Text;
    using System.Xml;

    public static class HtmlHelper
    {
        /// <summary>
        /// Keep only the opening and closing tags and the text content of the HTML.
        /// </summary>
        public static string CleanUp(string html)
        {
            var output = new StringBuilder();
            // The input must be well-formed XML (XHTML), otherwise XmlReader throws.
            using (var reader = XmlReader.Create(new StringReader(html)))
            {
                var settings = new XmlWriterSettings() { Indent = true, OmitXmlDeclaration = true };
                using (var writer = XmlWriter.Create(output, settings))
                {
                    while (reader.Read())
                    {
                        switch (reader.NodeType)
                        {
                            case XmlNodeType.Element:
                                // Copy the tag name only; attributes (including style) are dropped.
                                writer.WriteStartElement(reader.Name);
                                // Self-closing elements (e.g. <br />) never produce an EndElement node.
                                if (reader.IsEmptyElement)
                                    writer.WriteEndElement();
                                break;
                            case XmlNodeType.Text:
                                writer.WriteString(reader.Value);
                                break;
                            case XmlNodeType.EndElement:
                                writer.WriteFullEndElement();
                                break;
                        }
                    }
                }
            }
            return output.ToString();
        }
    }
Resulting output :
<p>
<span></span>
</p>
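If dropping every attribute turns out to be too aggressive, a lighter variant could remove only the style attributes with LINQ to XML and leave everything else (such as dir) in place. This is a minimal sketch, assuming the converter output is well-formed XHTML, which the XmlReader approach above also requires. The StyleStripper class and the RemoveInlineStyles method are hypothetical names made up for illustration; they are not part of Open XML PowerTools.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Open XML PowerTools.
public static class StyleStripper
{
    /// <summary>
    /// Remove every inline style attribute, keeping elements, text and all other attributes.
    /// </summary>
    public static string RemoveInlineStyles(string xhtml)
    {
        // PreserveWhitespace keeps whitespace-only text nodes such as <span> </span>.
        var doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Select the style attribute of every element (root included) and delete it.
        doc.Descendants().Attributes("style").Remove();

        // DisableFormatting writes the markup back without re-indenting it.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed sample input, shortened from the markup shown in the question.
        var input = "<p dir=\"ltr\" style=\"font-family: Calibri;\">" +
                    "<span style=\"margin: 0;\">Hello</span></p>";
        Console.WriteLine(StyleStripper.RemoveInlineStyles(input));
    }
}
```

With this variant the element structure stays untouched, so Bootstrap can still style the p and span elements, while the per-element fonts and margins are gone.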