
Transforming XML to CSV with Saxon fails with "Output character not available in this encoding (decimal 160)"


I am trying to transform XML (UTF-8 encoding) to CSV (windows-1251 encoding), and I get this error:

net.sf.saxon.trans.DynamicError: Output character not available in this encoding (decimal 160)

As I understand it, the XML text contains a character with code 160 (a no-break space), which is not available in windows-1251.

I tried to clean the XML before the transformation, but it doesn't help:

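        // Imports needed at the top of the file:
        // import java.nio.ByteBuffer;
        // import java.nio.CharBuffer;
        // import java.nio.charset.*;

        Charset charset = Charset.forName("windows-1251");
        CharsetDecoder decoder = charset.newDecoder();
        CharsetEncoder encoder = charset.newEncoder();
        // Substitute the encoder's replacement byte for anything windows-1251 cannot represent
        encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        String result = s; // fall back to the unmodified input if the round trip fails

        try {
            // Round trip: String -> windows-1251 bytes -> String
            ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(s));
            CharBuffer cbuf = decoder.decode(bbuf);
            result = cbuf.toString();
        } catch (CharacterCodingException cce) {
            log.error("Exception during character encoding/decoding: " + cce.getMessage());
        }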

What is the best way to solve this problem?

My XSL sample:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE csv-style [
<!ENTITY semicolons     ';;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;'>
<!ENTITY commas         ',,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'>
]>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:output method="text" indent="no" omit-xml-declaration="yes" encoding="windows-1251"/>

<xsl:param name="delim">semicolon</xsl:param>
<xsl:param name="showHead">yes</xsl:param>
<xsl:variable name="delimStr">
    <xsl:choose>
        <xsl:when test="$delim = 'comma'">&commas;</xsl:when>
        <xsl:otherwise>&semicolons;</xsl:otherwise>
    </xsl:choose>
</xsl:variable>

<xsl:template match="blocks">
    <xsl:apply-templates select="*"/>
</xsl:template>

<xsl:template match="description|pair|foot|body/table/head">
<!-- don't do anything just skip it-->
</xsl:template>

<xsl:template match="table">
    <xsl:apply-templates select="table|head|body"/>
</xsl:template>

<xsl:template match="col">
    <xsl:if test="position()=1">
        <xsl:value-of select="substring($delimStr, 1, @id - 1)"/>
    </xsl:if>
<xsl:choose>
    <xsl:when test="@value">
        <xsl:text>&quot;</xsl:text><xsl:variable name="escape">
        <xsl:call-template name="_replace_string">
            <xsl:with-param name="string" select="@value" />
        </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$escape" /><xsl:text>&quot;</xsl:text>

    </xsl:when>
    <xsl:otherwise>
        <xsl:text>""</xsl:text>
        <xsl:apply-templates/>
    </xsl:otherwise>
</xsl:choose>
<xsl:choose>
    <xsl:when test="position()=last()">
        <xsl:value-of select="substring($delimStr, 1, ancestor::table[1]/@colNum - @id)"/>
    </xsl:when>
    <xsl:otherwise>
        <xsl:value-of select="substring($delimStr, 1, following-sibling::col[1]/@id - @id)"/>
    </xsl:otherwise>
</xsl:choose>
</xsl:template> <!-- col -->

<xsl:template match="row">
    <xsl:if test="col[@value][1]">
        <xsl:apply-templates select="col"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:if>
</xsl:template>

<xsl:template match="head">
    <xsl:if test="$showHead = 'yes'">
        <xsl:apply-templates select="*"/>
    </xsl:if>
</xsl:template>

<xsl:template match="body">
    <xsl:apply-templates select="*"/>
</xsl:template>

<xsl:template name="_replace_string">
    <xsl:param name="string" select="''"/>
    <xsl:param name="find">"</xsl:param>
    <xsl:param name="replace">""</xsl:param>
    <xsl:choose>
        <xsl:when test="contains($string,$find)">
            <xsl:value-of select="concat(substring-before($string,$find),$replace)"/>
            <xsl:call-template name="_replace_string">
                <xsl:with-param name="string" select="substring-after($string,$find)"/>
                <xsl:with-param name="find" select="$find"/>
                <xsl:with-param name="replace" select="$replace"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$string"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

My XML sample:

<?xml version="1.0" encoding="UTF-8" ?><blocks type="report"><functions><func num="4" text=" nameOf_10031"></func><func num="5" text="name Of_10071"></func><func num="6" text="name Of_10006"></func></functions><description name="[441] testesttest with 160 "><rows total="44" start="1" end="44" show-data="yes"></rows><columns count="10"><column id="1" type="4" position="1" width="" format="&apos;dd.mm.yyyy&apos;"></column><column id="2" type="4" position="2" width="" format="&apos;dd.mm.yyyy&apos;"></column><column id="3" type="3" position="3" width=""></column><column id="4" type="2" position="4" width=""></column><column id="5" type="2" position="5" width=""></column><column id="6" type="2" position="6" width=""></column><column id="7" type="2" position="7" width=""></column><column id="8" type="2" position="8" width=""></column><column id="9" type="2" position="9" width=""></column><column id="10" type="2" position="10" width=""></column></columns></description><pair name="ReportName" value="test test test "></pair><table colNum="10" id="12561"><head><row><col id="1" value="test test test"></col><col id="2" value=" test test test"></col><col id="3" value="test test test"></col><col id="4" value="test test test"></col><col id="5" value="test test test"></col><col id="6" value="test test test"></col><col id="7" value="test test test"></col><col id="8" value=" test test test"></col><col id="9" value="test test test"></col><col id="10" value="test test test"></col></row></head><body><row num="1"><col id="1" value="01.07.2006"></col><col id="2"></col><col id="3" value="53363"></col><col id="4" value="65187" record-id="65187"></col><col id="5" value="53363" record-id="53368"></col><col id="6" value="test test test" record-id="1974"></col><col id="7"></col><col id="8"></col><col id="9" value="test test test"></col><col id="10"></col></row></body></table></blocks>

When I run

java -cp saxon-9.1.0.8.jar net.sf.saxon.Transform -t -s:myxml.xml -xsl:myxsl.xsl -o:result.csv

I get the same error (160):

Saxon 9.1.0.8J from Saxonica
Java version 1.8.0_333
Warning: at xsl:stylesheet on line 11 column 81 of myxsl.xsl:
  Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 378 milliseconds
Processing file:/D:/111/myxml2.xml
Building tree for file:/D:/111/myxml2.xml using class net.sf.saxon.tinytree.TinyBuilder
Tree built in 4 milliseconds
Tree size: 46 nodes, 0 characters, 99 attributes
Loading net.sf.saxon.event.MessageEmitter
Error at xsl:value-of on line 46 of myxsl.xsl:
  Output character not available in this encoding (decimal 160)
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#66)
     processing /blocks/table[1]/head[1]/row[1]/col[2]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#73)
     processing /blocks/table[1]/head[1]/row[1]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#32)
     processing /blocks/table[1]/head[1]
  at xsl:apply-templates (file:/D:/111/myxsl.xsl#24)
     processing /blocks/table[1]
  in built-in template rule
Transformation failed: Run-time errors were reported

With a newer version, for example Saxon-HE-10.3.jar, there are no problems, but unfortunately I can't upgrade to it.


Solution

  • A character map that maps, for example, the no-break space (160) to a normal space (32) would be:

      <xsl:character-map name="m1">
        <xsl:output-character character="&#160;" string=" "/>
      </xsl:character-map>
    
      <xsl:output use-character-maps="m1"/>
    

    Character maps have been supported since XSLT 2.0. I think Saxon 8.9 was the first version to implement the 2.0 standard, so 9.1 should cover that.
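
    If changing the stylesheet is not an option, the same 160-to-32 substitution could be done on the Java side before the XML is handed to Saxon. The following is only a minimal sketch: the class name NbspCleanup and the method replaceNbsp are hypothetical and appear neither in the question nor in Saxon. The main method also prints whether the JDK's own windows-1251 encoder accepts U+00A0. As far as I know that code page does define byte 0xA0 as the no-break space, which would explain why the encode/decode round trip in the question leaves the character untouched.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class; not part of Saxon or of the question's code.
public class NbspCleanup {

    // Replace every no-break space (U+00A0, decimal 160) with a plain space (U+0020, decimal 32).
    public static String replaceNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = Charset.forName("windows-1251").newEncoder();
        // If this prints true, CodingErrorAction.REPLACE never fires for U+00A0,
        // so an encode/decode round trip cannot remove it.
        System.out.println("canEncode(U+00A0): " + encoder.canEncode('\u00A0'));

        // Explicit replacement, mirroring what the character map does at serialization time.
        String cleaned = replaceNbsp("test\u00A0test");
        System.out.println("still contains U+00A0: " + (cleaned.indexOf('\u00A0') >= 0));
    }
}
```

    Note that a plain string replacement only catches literal no-break spaces. If the XML uses a character reference such as &#160;, the character only exists after parsing, so the character map is probably the more robust choice.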