I'm trying to convert MS Word tables to HTML via code. I'm in the process of adapting some code given in this answer but I need the resulting HTML table to ultimately be converted to a CALS table format, then merged with an existing XML tree my program generates.
I'm currently working on the Word-table-to-HTML-table step (before converting that to CALS), but I keep hitting a recurring error:
hexadecimal value 0x07, is an invalid character
And sure enough, if I look at the resulting HTML from each cell in the table via a MessageBox as my program runs, I can see there is a small 'box' after the text from the table cell.
I have tried using something like
string newContent = content.Replace((char)(0x1F), Convert.ToChar(""));
to replace the character, but then it complains that the string must be exactly one character long.
I may be going about things the wrong way, in the sense that I'm trying to store HTML inside an XElement, but I don't think that is causing the problem.
The issue is clearly the little 'box' in the Word table cells, but I'm not sure what it is or how to ignore or remove it.
Here is my code:
    private void dealWithTables()
    {
        try
        {
            foreach (Table tb in doc.Tables)
            {
                for (int r = 1; r <= tb.Rows.Count; r++)
                {
                    for (int c = 1; c <= tb.Columns.Count; c++)
                    {
                        try
                        {
                            Cell cell = tb.Cell(r, c);
                            foreach (Paragraph paragraph in cell.Range.Paragraphs)
                            {
                                Tagging2(paragraph.Range, "P", paragraph.Range.Text);
                            }
                            Tagging2(cell.Range, "TD");
                        }
                        catch (Exception e)
                        {
                            if (e.Message.Contains("The requested member of the collection does not exist."))
                            {
                                //Most likely a part of a merged cell, so skip over.
                            }
                            else throw;
                        }
                    }
                    try
                    {
                        Row row = tb.Rows[r];
                        Tagging2(row.Range, "TR");
                    }
                    catch (Exception ex)
                    {
                        bool initialTrTagInserted = false;
                        int columnsIndex = 1;
                        int columnsCount = tb.Columns.Count;
                        while (!initialTrTagInserted && columnsIndex <= columnsCount)
                        {
                            try
                            {
                                Cell cell = tb.Cell(r, columnsIndex);
                                //cell.Range.InsertBefore("<TR>");
                                initialTrTagInserted = true;
                            }
                            catch (Exception e)
                            {
                            }
                            columnsIndex++;
                        }
                        columnsIndex = tb.Columns.Count;
                        bool endTrTagInserted = false;
                        while (!endTrTagInserted && columnsIndex >= 1)
                        {
                            try
                            {
                                Cell cell = tb.Cell(r, columnsIndex);
                                //cell.Range.InsertAfter("</TR>");
                                endTrTagInserted = true;
                            }
                            catch (Exception e)
                            {
                            }
                            columnsIndex--;
                        }
                    }
                }
                Tagging2(tb.Range, "Table");
                object separator = "";
                object nestedTable = true;
                tb.ConvertToText(separator, nestedTable);
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
    }

    XElement tableTree = new XElement("table");

    public void Tagging2(Range range, string tagName, string content)
    {
        string newContent = content.Replace((char)(0x1F), Convert.ToChar(""));
        tableTree.Add(new XElement(tagName, newContent));
        MessageBox.Show("text of para " + newContent);
    }

    public void Tagging2(Range range, string tagName)
    {
        tableTree.Add(new XElement(tagName));
    }
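It seems you're trying to replace the character with an empty string: Convert.ToChar("") needs a string of exactly one character, hence your error message. Note also that your code targets 0x1F, while the exception names 0x07. Try replacing 0x07 with a space instead, and remember to assign the result, since strings are immutable:
    string newContent = content.Replace((char)0x07, (char)0x20);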
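If you would rather drop the character than substitute something for it, here is a minimal sketch of one way to do that. It assumes the 'box' is Word's end-of-cell marker, which normally shows up in a cell's Range.Text as a trailing carriage return (0x0D) followed by 0x07. The class name CellText and the methods Clean and IsAllowedInXml are hypothetical helpers for illustration, not part of the Word interop API.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Word interop): cleans cell text so it can
// be stored in an XElement without tripping the invalid-character check.
static class CellText
{
    // XML 1.0 permits tab, line feed and carriage return, but no other
    // control characters below 0x20. 0xFFFE and 0xFFFF are also excluded.
    // Simplification: surrogate code units are passed through unchecked.
    static bool IsAllowedInXml(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c != 0xFFFE && c != 0xFFFF);
    }

    public static string Clean(string content)
    {
        if (string.IsNullOrEmpty(content))
        {
            return string.Empty;
        }

        // Assumption: the cell text ends with "\r\a" (0x0D, 0x07).
        // TrimEnd removes any trailing run of those two characters.
        string trimmed = content.TrimEnd('\r', '\a');

        // Drop any remaining characters that XML 1.0 does not allow,
        // such as a stray 0x07 or 0x1F in the middle of the text.
        var sb = new StringBuilder(trimmed.Length);
        foreach (char c in trimmed)
        {
            if (IsAllowedInXml(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

In Tagging2 this would be used as string newContent = CellText.Clean(content); in place of the Replace call. Unlike replacing with a space, it leaves no trailing whitespace in the element, which may matter once the result is converted to CALS.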