
python lxml: case insensitive xpath tag name matching


I'm using Python and lxml to parse an SPSS file.

There seem to be many threads on this topic, but the answers don't particularly help me.

The answers I have come across:

- lower-case the entire input before parsing; 
- an approach that only works if you know the complete list of tags in advance.

For me, these suggestions would take too much time.

Instead, I would like to match strings case-insensitively, and only when necessary.

Here is the line of code I would like to edit:

    xpath("//definition//variable[@name='" + tag_name + "']")

How can I get a hit regardless of case, for example when tag_name is any of:

tag_name = "Q1top"
tag_name = "q1Top"
tag_name = "q1TOP"
etc

I'm guessing some form of regex would be in order?


Solution

  • Alternatively, you can use regular expressions from the EXSLT http://exslt.org/regular-expressions namespace in the XPath expression; the 'i' flag makes the match case-insensitive. For example:

    ns = {"re": "http://exslt.org/regular-expressions"}
    query = "//definition//variable[re:test(@name, '^{0}$', 'i')]".format(tag_name)
    result = tree.xpath(query, namespaces=ns)
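
One caveat with formatting tag_name straight into the pattern is that any regex metacharacters or quotes in the name would change the query. A minimal sketch of a more defensive variant follows, assuming names may contain such characters. The helper find_variables and the sample document are hypothetical, made up for illustration; re.escape and the keyword-argument XPath variable ($pattern) are standard Python and lxml features.

```python
import re

from lxml import etree

# EXSLT regular-expressions namespace, as in the answer above.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}


def find_variables(tree, tag_name):
    """Return <variable> elements whose @name equals tag_name, ignoring case.

    Hypothetical helper, not part of lxml.
    """
    # Escape the name so characters such as '.' are matched literally.
    pattern = "^" + re.escape(tag_name) + "$"
    # Pass the pattern as an XPath variable instead of formatting it into
    # the query string, which avoids quoting problems.
    return tree.xpath(
        "//definition//variable[re:test(@name, $pattern, 'i')]",
        namespaces=EXSLT_RE,
        pattern=pattern,
    )


# Made-up sample document, for illustration only.
doc = etree.fromstring(
    "<root><definition>"
    "<variable name='Q1top'/>"
    "<variable name='q2.top'/>"
    "<variable name='q2xtop'/>"
    "</definition></root>"
)

hits = find_variables(doc, "q1TOP")
names = [v.get("name") for v in hits]
```

Here find_variables(doc, "q1TOP") should match the element stored as Q1top, and a search for "Q2.TOP" should match q2.top but not q2xtop, because the dot is escaped.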