xslt-2.0 google-search-appliance

Is this substring-after and concat breaking my encoding?


Our GSA uses a FileConnector to index several shares that are targets of DFS links. I am trying to rewrite file://filesrv01.example.com/share$/dir/file.ext to file://R:/share/dir/file.ext in the frontend XSL.

There is an xsl:choose element which tests for different protocols but not for file://, so I assume my source links are handled by the default branch:

<xsl:otherwise>
  <xsl:value-of disable-output-escaping='yes' select="U"/>
</xsl:otherwise>

We created a new xsl:when node like this:

<xsl:when test="starts-with(U, 'file://server.example.com/share$/')">
  <xsl:value-of disable-output-escaping='yes'
    select="concat('file://R:/share/',
      substring-after(U,'file://server.example.com/share$/') )"/>
</xsl:when>

This works for almost all entries in our index, but it fails when the path contains a German umlaut. Input, actual output, and expected output:

file://server/share$/dir/FileWithUmläut.txt
file://R:/share/dir/FileWithUmläut.txt
file://R:/share/dir/FileWithUmläut.txt

Why does the default xsl:otherwise leave umlauts intact while our concat + substring-after does not? Is there anything I could check or change?

Edit #1 There is only one output element in the XSL file: <xsl:output method="html"/>. Notepad++ detects the XSL file itself as ANSI; it contains some umlauts in UI texts. The output sent to the browser is UTF-8 XHTML.

Edit #2 When I replace the xsl:when with the following block, the encoding is not broken and the link can be opened (via the UNC path directly rather than the DFS root). Because of this I believe the encoding of the XML or XSL is not the cause; thanks for your input nevertheless, @MathiasMüller.

<xsl:when test="starts-with(U, 'file://server.example.com/share$/')">
  <xsl:value-of disable-output-escaping='yes' select="U"/>
</xsl:when>

Solution

  • My specific problem vanished as soon as I used file:///R:/ instead of file://R:/ (one additional forward slash), but I am still trying to figure out why that helped. The GSA XSL contains a JavaScript snippet to "fix" encoding issues in IE, but it does not care whether the protocol has two or three slashes.

    Firefox does not allow the file protocol out of the box, and neither syntax works when the link is copied from there. This leads me to believe that my currently installed IE 9 fixes some encoding issues on its own when given the correct file:/// prefix, while Firefox does not.

    As we would like the links to work in Firefox too, I will continue my quest for glory in the land of Unicode, plagued by the ancient dragon of file:/// and home to the houses of IE and FF.
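
For reference, here is a minimal sketch of the rewrite with the three-slash prefix. It assumes the result XML exposes each hit as an R element with a U child holding the URL, and the match="R" template is a hypothetical stand-in for wherever the xsl:choose lives in the real GSA front end. One plausible reason the extra slash matters: under generic URI syntax, the part directly after file:// is the authority (host), so in file://R:/share/... the "R:" sits in the host position, whereas file:///R:/share/... has an empty host and /R:/share/... as the path. Whether that is what IE 9 actually reacts to is not confirmed here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical template; the real GSA stylesheet is structured differently. -->
  <xsl:template match="R">
    <xsl:choose>
      <!-- Rewrite links on the indexed share to the mapped drive letter. -->
      <xsl:when test="starts-with(U, 'file://server.example.com/share$/')">
        <!-- Three slashes: empty host, then the path /R:/share/... -->
        <xsl:value-of disable-output-escaping="yes"
          select="concat('file:///R:/share/',
            substring-after(U, 'file://server.example.com/share$/'))"/>
      </xsl:when>
      <!-- Default branch from the question: emit the URL unchanged. -->
      <xsl:otherwise>
        <xsl:value-of disable-output-escaping="yes" select="U"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The only change from the xsl:when in the question is the prefix passed to concat; the starts-with test and the substring-after call are the same.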