Tags: encoding, markdown, xslt-2.0, saxon, xpath-2.0

Encoding of the input file for the XSLT 2.0 function unparsed-text()


Let's say I have this file, file.md, encoded in UTF-8 (.md means it is in Markdown format):

Hello world
This text is encoded in UTF-8.

Then I read it using the function unparsed-text('file.md', 'UTF-8'). That works like a charm.

The problem shows up when I use a character specific to my native language (Czech), for example in this file2.md:

Hello world
This character "š" is read like "sh" in English.

Using the same encoding parameter in unparsed-text(), I get this error:

XTDE1200: Failed to read input file file:/C:/file2.md (java.nio.charset.MalformedInputException): Input length = 1

file2.md has the same encoding (UTF-8) as file.md, and Czech characters are representable in that charset, yet the XSLT processor doesn't accept it. If I change the encoding parameter to windows-1250, i.e. unparsed-text('file2.md', 'windows-1250'), it works nicely.
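The behavior described above can be reproduced outside XSLT. The exception class in the error message, java.nio.charset.MalformedInputException, indicates that the file is decoded with a strict java.nio decoder. The minimal Java sketch below (the class and method names are hypothetical, for illustration only) shows that the byte a windows-1250 editor writes for "š", 0x9A, is a UTF-8 continuation byte and therefore cannot start a UTF-8 sequence, which is exactly what "Input length = 1" reports. The real UTF-8 encoding of "š" (U+0161) is the two bytes 0xC5 0xA1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Strictly decode bytes: malformed or unmappable input throws
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Returns true if the bytes form valid text in the given charset.
    static boolean isValid(byte[] bytes, Charset cs) {
        try {
            decodeStrict(bytes, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        byte[] cp1250Bytes = {(byte) 0x9A};            // "š" in windows-1250
        byte[] utf8Bytes = {(byte) 0xC5, (byte) 0xA1}; // "š" (U+0161) in UTF-8

        System.out.println(isValid(utf8Bytes, StandardCharsets.UTF_8)); // true
        System.out.println(isValid(cp1250Bytes, cp1250));               // true
        try {
            decodeStrict(cp1250Bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // Prints: java.nio.charset.MalformedInputException: Input length = 1
            System.out.println(e);
        }
    }
}
```

The file extension plays no part in this: when an encoding is passed explicitly, only the bytes are decoded, so a .md file and a .txt file containing the same bytes behave identically.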

So the question is: why do I get this error? Is it related to the input file having the .md extension (.txt works)? Is there a way around it? I really want to be able to use the same encoding in my XSL stylesheet as the supplied input file has.

Thanks for your answers.


Solution

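  • As Martin says, the evidence you have provided suggests that file2.md is not actually UTF-8 but is encoded in a single-byte Windows code page such as Windows-1252 (or Windows-1250, which you found works; both store "š" as the single byte 0x9A), and that unparsed-text('file2.md', 'utf-8') is therefore right to reject it.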
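If the stylesheet should keep calling unparsed-text() with 'UTF-8', the input file itself has to be stored as UTF-8; most text editors offer a way to save a file with UTF-8 encoding. For batch conversion, here is a minimal Java 11+ sketch. It assumes the source really is windows-1250 (the encoding reported to work in the question); the class name and the file names in the usage comment are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {
    // Reads `in` as text in charset `from` and writes it to `out` as UTF-8.
    // Files.readString throws an IOException if the bytes are malformed or
    // unmappable in `from`, so a wrong guess fails loudly instead of corrupting text.
    static void reencode(Path in, Path out, Charset from) throws IOException {
        String text = Files.readString(in, from);
        Files.writeString(out, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java ReencodeToUtf8 file2.md file2-utf8.md
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        reencode(in, out, Charset.forName("windows-1250"));
    }
}
```

After conversion, unparsed-text('file2-utf8.md', 'UTF-8') should read the file without the XTDE1200 error.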