
Unicode characters do not return the correct results


This command works as expected and returns 1 node.

# cat myfile.txt
<feed>
<entry>
<author>
<name>Amar joshi</name>
</author>
</entry>
</feed>

# xpath -e "/feed/entry[author/name='Amar joshi']" myfile.txt
Found 1 nodes in myfile.txt:

But this does not.

<feed>
<entry>
<author>
<name>संतोष गोरे</name>
</author>
</entry>
</feed>

# xpath -e "/feed/entry[author/name='संतोष गोरे']" myfile.txt

The file and the command are otherwise the same as in the working case, so the Unicode text should not be a problem. I have checked the expression using the utility that I found here:

http://xpather.com/


Solution

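  • This is probably a bug in the Perl module XML::XPath, which the xpath utility is part of. It seems that command-line arguments aren't properly decoded from UTF-8, so the expression presumably reaches the XPath engine as raw bytes and never equals the decoded text from the file. The -CA switch tells Perl to treat its command-line arguments as UTF-8, so it might work to run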

    PERL5OPT=-CA xpath -e "/feed/entry[author/name='संतोष गोरे']"  myfile.txt