Search code examples
pythonunicodelibxml2

Unicode in Python libxml2


I have an issue I'm trying to test a search feature within an xhtml document. The search should support Arabic and English text. I'm new to python and libxml2 so I have trouble figuring out how to do it.

I always get an empty result with Arabic text (in English it works perfectly), even though online tools such as http://www.freeformatter.com/xpath-tester.html#ad-output return the exact result I need.

import libxml2

doc = libxml2.parseFile("content.xhtml")

ctxt = doc.xpathNewContext()

xPathQuery = "//*[contains(text(), 'تجربة')]"

res = ctxt.xpathEval(xPathQuery)

doc.freeDoc()
ctxt.xpathFreeContext()

also using a Unicode string didn't work:

xPathQuery = u"//*[contains(text(), 'تجربة')]"

or even:

xPathQuery = u"//*[contains(text(), 'تجربة')]"
res = ctxt.xpathEval(xPathQuery.encode('utf-8'))

Solution

  • It turned out to be an issue with the code file encoding itself, I saved it in Unicode and it worked.