c# xml xslt xslt-1.0 xml-1.1

XSLT 1.0 Output Hex 0x1C - 0x1F to Text File


I am using XSLT 1.0 to transform an XML file into a text file that is sent to a third party. The third-party format requires data fields to be separated with 0x1F (ASCII unit separator), groups to be separated with 0x1D (ASCII group separator) and records to be separated with 0x1E (ASCII record separator). Using these characters within the stylesheet results in the error below.

Character ' ', hexadecimal value 0x1D is illegal in XML documents.

I am currently using 0x80 through 0x82 from the extended character set as placeholders, then running the transformation result through a replace function in C# to swap them for the values I actually need, but it seems like there should be a better, more efficient way to do this.

Is there a way to output these values to a text file using the style sheet directly?

Current Stylesheet

<?xml version="1.0" encoding="us-ascii"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:asap="http://www.asapnet.org/pmp/4.2/exchange"
                xmlns:asap-code="http://www.asapnet.org/pmp/4.2/extension/code"
                xmlns:asap-ext="http://www.asapnet.org/pmp/4.2/extension"
                xmlns:asap-meta="http://www.asapnet.org/pmp/4.2/extension/meta"
                xmlns:nc="http://release.niem.gov/niem/niem-core/3.0/"
                exclude-result-prefixes="asap asap-code asap-ext asap-meta nc">

  <xsl:output method="text" omit-xml-declaration="yes" indent="no" />

  <xsl:variable name="FieldSeparator" select="'&#127;'"/>
  <xsl:variable name="SegmentTerminator" select="'&#128;'"/>


  <!--MAIN-->
  <xsl:template match="asap:ReportTransmission">
    <xsl:apply-templates select="asap-meta:TransactionHeader"/>
    <xsl:apply-templates select="asap-meta:InformationSource"/>
    <xsl:apply-templates select="asap-ext:ReportingPharmacy"/>
  </xsl:template>


  <!--TRANSACTION HEADER - TH SEGMENT-->
  <xsl:template match="asap-meta:TransactionHeader">
    <xsl:value-of select="concat(
                  'TH',
                  $FieldSeparator,
                  asap-meta:ReleaseNumberText,
                  $FieldSeparator,
                  asap-meta:ControlNumberText,
                  $FieldSeparator,
                  asap-code:TransactionKindCode,
                  $FieldSeparator,
                  concat(substring(asap-meta:TransactionDate,1,4),substring(asap-meta:TransactionDate,6,2),substring(asap-meta:TransactionDate,9,2)),
                  $FieldSeparator,
                  concat(substring(asap-meta:TransactionTime,1,2),substring(asap-meta:TransactionTime,4,2)),
                  $FieldSeparator,
                  asap-code:FileKindCode,
                  $FieldSeparator,
                  asap-meta:RoutingNumber,
                  $FieldSeparator,
                  $SegmentTerminator,
                  $SegmentTerminator)" />
  </xsl:template>


  <!--INFORMATION SOURCE - IS SEGMENT-->
  <xsl:template match="asap-meta:InformationSource">
        <xsl:value-of select="concat(
                  'IS',
                  $FieldSeparator,
                  nc:Identification/nc:IdentificationID,
                  $FieldSeparator,
                  nc:Identification/nc:IdentificationJurisdiction/nc:JurisdictionText,
                  $FieldSeparator,
                  nc:MessageText,
                  $SegmentTerminator)" />

  </xsl:template>
</xsl:stylesheet>

(... style sheet continues with additional segments ... )

Current Output (Notepad++)

[Image: screenshot of the current output in Notepad++]

(... output continues with additional segments ... )

XML Sample

<?xml version="1.0" encoding="UTF-8"?>
<asap:ReportTransmission xmlns:asap="http://www.asapnet.org/pmp/4.2/exchange"
 xmlns:asap-code="http://www.asapnet.org/pmp/4.2/extension/code"
 xmlns:asap-ext="http://www.asapnet.org/pmp/4.2/extension"
 xmlns:asap-meta="http://www.asapnet.org/pmp/4.2/extension/meta"
 xmlns:nc="http://release.niem.gov/niem/niem-core/3.0/" 
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.asapnet.org/pmp/4.2/exchange ../schemas/exchange/pmp_exchange.xsd">
    <asap-meta:TransactionHeader>
        <asap-meta:ReleaseNumberText>4.2</asap-meta:ReleaseNumberText>
        <asap-meta:ControlNumberText>857463</asap-meta:ControlNumberText>
        <asap-code:TransactionKindCode>01</asap-code:TransactionKindCode>
        <asap-meta:TransactionDate>2009-10-15</asap-meta:TransactionDate>
        <asap-meta:TransactionTime>10:45:00</asap-meta:TransactionTime>
        <asap-code:FileKindCode>P</asap-code:FileKindCode>
    </asap-meta:TransactionHeader>
    <asap-meta:InformationSource>
        <nc:Identification>
            <nc:IdentificationID>7564</nc:IdentificationID>
            <nc:IdentificationJurisdiction>
                <nc:JurisdictionText>ACME PHARMACY</nc:JurisdictionText>
            </nc:IdentificationJurisdiction>
        </nc:Identification>
    </asap-meta:InformationSource>
    <asap-ext:ReportingPharmacy>
        <asap-ext:NPIIdentification>
            <nc:IdentificationID>1234567890</nc:IdentificationID>
        </asap-ext:NPIIdentification>
        <asap-ext:PatientInfo>
            <nc:PersonBirthDate>
                <nc:Date>1950-01-01</nc:Date>
            </nc:PersonBirthDate>
            <nc:PersonName>
                <nc:PersonGivenName>John</nc:PersonGivenName>
                <nc:PersonSurName>Smith</nc:PersonSurName>
            </nc:PersonName>
            <nc:PersonSexText>Male</nc:PersonSexText>
            <asap-ext:PrimaryIdentification>
                <nc:PersonLicenseIdentification>
                    <nc:IdentificationID>987544</nc:IdentificationID>
                    <nc:IdentificationJurisdiction>
                        <nc:LocationStateUSPostalServiceCode>MA</nc:LocationStateUSPostalServiceCode>
                    </nc:IdentificationJurisdiction>
                </nc:PersonLicenseIdentification>
            </asap-ext:PrimaryIdentification>
            <nc:ContactMailingAddress>
                <nc:LocationStreet>
                    <nc:StreetName>1234 Main St</nc:StreetName>
                </nc:LocationStreet>
                <nc:LocationCityName>Somewhere</nc:LocationCityName>
                <nc:LocationStateUSPostalServiceCode>MA</nc:LocationStateUSPostalServiceCode>
                <nc:LocationPostalCode>54356</nc:LocationPostalCode>
            </nc:ContactMailingAddress>
            <asap-ext:DispensingRecord>
                <asap-code:ReportingStatusCode>00</asap-code:ReportingStatusCode>
                <asap-ext:Prescription>
                    <asap-ext:PrescriptionNumberText>6542984</asap-ext:PrescriptionNumberText>
                    <asap-ext:PrescriptionWrittenDate>
                        <nc:Date>2009-10-15</nc:Date>
                    </asap-ext:PrescriptionWrittenDate>
                    <asap-ext:PrescriptionRefillQuantity>0</asap-ext:PrescriptionRefillQuantity>
                    <asap-ext:ProductIdentification>
                        <nc:IdentificationID>57866707401</nc:IdentificationID>
                        <asap-code:ProductIdentifierKindCode>01</asap-code:ProductIdentifierKindCode>
                    </asap-ext:ProductIdentification>
                    <asap-ext:PrescriptionSupplyQuantity>15</asap-ext:PrescriptionSupplyQuantity>
                </asap-ext:Prescription>
                <asap-ext:Transaction>
                    <asap-ext:PrescriptionFilledDate>
                        <nc:Date>2009-10-15</nc:Date>
                    </asap-ext:PrescriptionFilledDate>
                    <asap-ext:PrescriptionRefillNumber>0</asap-ext:PrescriptionRefillNumber>
                    <asap-ext:PrescriptionDispensedQuantity>30</asap-ext:PrescriptionDispensedQuantity>
                </asap-ext:Transaction>
                <asap-ext:Prescriber>
                    <asap-ext:DEAIdentification>
                        <nc:IdentificationID>AW8765432</nc:IdentificationID>
                    </asap-ext:DEAIdentification>
                </asap-ext:Prescriber>          
                <asap-ext:AdditionalInformation>
                    <asap-ext:IssuingPrescriptionBlankIdentification>
                        <nc:IdentificationID>787456493993</nc:IdentificationID>
                        <nc:IdentificationJurisdiction>
                            <nc:LocationStateUSPostalServiceCode>MA</nc:LocationStateUSPostalServiceCode>
                        </nc:IdentificationJurisdiction>
                    </asap-ext:IssuingPrescriptionBlankIdentification>
                </asap-ext:AdditionalInformation>
            </asap-ext:DispensingRecord>
        </asap-ext:PatientInfo>
    </asap-ext:ReportingPharmacy>
</asap:ReportTransmission>

Update


For those who may be looking for a similar solution, I ended up going with a C# script within the stylesheet.

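  <!-- Assumes xmlns:msxsl="urn:schemas-microsoft-com:xslt" and a namespace bound to the
       CSharpScripts prefix are declared on xsl:stylesheet. -->
  <msxsl:script implements-prefix="CSharpScripts" language="C#">
    // ASCII unit separator (0x1F), used between fields
    public string FS()
    {
      return "\u001F";
    }

    // ASCII group separator (0x1D), used between groups
    public string GS()
    {
      return "\u001D";
    }
  </msxsl:script>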

It can then be used like this:

<xsl:value-of select="CSharpScripts:FS()"/>

You do need to set EnableScript = true using XsltSettings when loading the XslCompiledTransform, and set CheckCharacters = false on the XmlWriter being used for output:

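            var xslt = new XslCompiledTransform();
            xslt.Load(
                    @"E:\TFS\Transforms\TestTransform.xslt",
                    new XsltSettings() { EnableScript = true }, null);

            // Start from the stylesheet's own output settings (method="text"),
            // then turn off character checking so control characters pass through.
            var writerSettings = xslt.OutputSettings.Clone();
            writerSettings.CheckCharacters = false;

            var sb = new StringBuilder();

            // Dispose the writer so everything is flushed into sb.
            using (var xmlOutput = XmlWriter.Create(sb, writerSettings))
            {
                xslt.Transform(@"E:\samples.xml", xmlOutput);
            }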
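
A variation on the script approach above, offered only as a sketch: the same separator functions can be supplied from the calling code as an extension object, so the stylesheet needs no msxsl:script block and EnableScript can stay off. The class names `SeparatorFunctions` and `TransformRunner`, the `RS()` method, and the namespace URI `urn:example:separators` are all assumptions made for illustration. The sketch keeps CheckCharacters = false on the writer, as in the code above.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper class; its public methods become callable from XSLT.
public class SeparatorFunctions
{
    public string FS() { return "\u001F"; } // unit separator
    public string GS() { return "\u001D"; } // group separator
    public string RS() { return "\u001E"; } // record separator
}

public static class TransformRunner
{
    public static string Run(string stylesheetPath, string inputPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath); // no XsltSettings: the stylesheet contains no script

        // The URI is an assumption; it must match the namespace the stylesheet
        // binds to the prefix used for the function calls.
        var args = new XsltArgumentList();
        args.AddExtensionObject("urn:example:separators", new SeparatorFunctions());

        var writerSettings = xslt.OutputSettings.Clone();
        writerSettings.CheckCharacters = false; // let control characters through

        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb, writerSettings))
        {
            xslt.Transform(inputPath, args, writer);
        }
        return sb.ToString();
    }
}
```

In the stylesheet this would pair with a declaration such as xmlns:sep="urn:example:separators" on xsl:stylesheet and calls like select="sep:FS()".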

Thanks to @Abel for pointing me in the right direction.


Solution

  • You seem to be one of the very few out there that have a sensible requirement for using XML 1.1. Indeed, as you have found out, with XML 1.0 it is not possible to use control characters below 0x20, except for tab, CR and LF. Since XSLT is written in XML, this means that you will need a processor that can read an XSLT stylesheet written as an XML 1.1 document.

    As far as I know there is only one XSLT 1.0 processor capable of handling XML 1.1 and that is Saxon 6.5 (or a higher version of Saxon, but then you could just as well skip to using XSLT 2.0 or 3.0). An IKVM port for .NET of Saxon exists and is supported (and no, I am not affiliated, in fact, I wrote Exselt, but we have no plans yet to support XML 1.1).

    You don't need to change your input into XML 1.1, only your stylesheet, because that is the place where you need to use these characters.

    Within a proper XML editor that is capable of dealing with XML 1.1, change the following:

    <?xml version="1.0" encoding="UTF-8"?>
    

    into

    <?xml version="1.1" encoding="UTF-8"?>
    

    Then change your separators to use the characters you want them to use:

    <xsl:variable name="FieldSeparator" select="'&#x1F;'" />
    <xsl:variable name="SegmentTerminator" select="'&#x1D;'" />
    

    The error should then be gone (if you still have an error, you are not using a processor capable of dealing with XML 1.1, i.e., in .NET, you are stuck with XML 1.0 and Microsoft has no plans of upgrading as the "use in the wild" of XML 1.1 is very, very small).

    Other alternatives are:

    • Use an extension function that can write an encoded character. In .NET this is quite trivial to do, however, I do not know whether returning an ASCII control character will be accepted by the XML writer.
    • Use the new EXPath binary module, but it is quite new and I am not sure what the level of support is. However, it works with any XML or XSLT version.
    • Post-process your output (as you are doing now). Best to use Unicode Private Use characters as placeholders, as the chance of a collision with real data is then next to nothing.
    • (You may be tempted to use xsl:character-maps or codepoints-to-string() with XSLT 2.0, but you'll run into the same issue, only at a later stage.)
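
The post-processing alternative can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `SeparatorFixup` and the three placeholder code points (U+E000 to U+E002, from the Private Use Area) are arbitrary assumptions. The stylesheet would emit them as `&#xE000;` and so on, which is legal in XML 1.0.

```csharp
using System.Text;

public static class SeparatorFixup
{
    // Hypothetical placeholders from the Unicode Private Use Area (U+E000-U+F8FF).
    public const char FieldPlaceholder  = '\uE000';
    public const char GroupPlaceholder  = '\uE001';
    public const char RecordPlaceholder = '\uE002';

    // Swap each placeholder for the control character the target format needs.
    public static string Restore(string transformed)
    {
        return new StringBuilder(transformed)
            .Replace(FieldPlaceholder,  '\u001F')  // unit separator
            .Replace(GroupPlaceholder,  '\u001D')  // group separator
            .Replace(RecordPlaceholder, '\u001E')  // record separator
            .ToString();
    }
}
```

Calling `SeparatorFixup.Restore(...)` on the transformation result before it is written to disk keeps the replacement in memory, so the placeholders never need to survive the output file's encoding.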

    PS: setting omit-xml-declaration="yes" and indent="no" is redundant; text output never has an XML declaration, nor is it ever auto-indented.

    PPS: the example XSLT you provided dumps a lot of text in places that don't fit your description. Adding a shallow-skip template solves it, but outputs only one line. I didn't check whether that is as intended.