Search code examples
c#openxmlexport-to-excelopenxml-sdk

OOXML SDK illegal character replacements


I am having an issue with creating an XLSX document with the Open XML SDK 2.0 from MS.

My issue is that I need to display some of these illegal characters in the excel sheet, but if I just add them to the sheet, the document will not load.

I am using this function

    private static string ProcessString(string str)
    {
        return System.Security.SecurityElement.Escape(str);
    }

Which will give me Tom's ball instead of Tom's ball. (Well I haven't figured out how to get the latter as the excel generated won't open.)

Anybody know how to make the illegal XML characters show using OOXML in an Excel sheet?

EDIT:
In function I am using to create a text cell is:

private static Cell CreateTextCell(string header, UInt32 index, string text)
{
    var c = new Cell { DataType = CellValues.String, CellReference = header + index };
    var cellValue = new CellValue(text);
    c.Append(cellValue);
    return c;
}

I know it has to do with illegal characters because when I didn't include a particular field in my text it worked, then when I included it, Excel would give me a parser error and a blank document.

The text that I deal with also happens to have HTML tags in it as well.

ps. lol, I just noticed that the markdown used parsed my HTML escape making my example look ridiculous.

edit 2:

Some example of input:

  • Cancer's Complexity: Are we Looking at the Wrong Levels to Develop Effective Interventions?

  • Prospective study of breast cancer risk in mutation-negative women from <i>BRCA1</i> or <i>BRCA2</i> mutation-positive families in the Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab).

  • Germline <em>BRCA2</em> mutations correlate with aggressive prostate cancer and adverse outcome.

The html formatting is basically so it displays on the web page. I should just strip off the basic formatting tags. But more importantly, I want the excel file to load and escaping the values is a sure way of doing just that.


Solution

  • Are you sure this is what is causing the problem? Can you add "normal" strings to the cells and open it?

    AFAIK the apostrophe character is not an illegal XML character.

    If you look in the OOXML specification in section 22.9.2.19 ST_Xstring (Escaped String) (the data type for strings in cells) you will see the following explanation:

    *22.9.2.19 ST_Xstring (Escaped String) String of characters with support for escaped invalid-XML characters. For all characters which cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format xHHHH, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it must be escaped as x0008. end example]*