I'm trying to extract some data from a given XML file. To do so, I have to select specific nodes by their attribute values. My XML looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<svg ....>
....
<g font-family="'BentonSans Medium'" font-size="12">
<text>bla bla bla</text>
....
</g>
....
</svg>
I've tried to escape the apostrophes in the value, but I couldn't get it working.
from lxml import etree as ET
tree = ET.parse("file.svg")
root = tree.getroot()
xPath = ".//g[@font-family=''BentonSans Medium']"
print(root.findall(xPath))
I always get errors of this kind:
  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 214, in prepare_predicate
    raise SyntaxError("invalid predicate")
Anyone got ideas how to select these nodes with XPath?
Try this:
xPath = ".//g[@font-family=\"'BentonSans Medium'\"]"
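To check the suggestion above, here is a minimal sketch. The sample document is an assumption modelled on the snippet in the question: it declares no namespace and adds a second, hypothetical g element so the filter has something to exclude. A real SVG file usually declares a default namespace, in which case g would have to be namespace-qualified.

```python
from lxml import etree as ET

# Assumed sample data, modelled on the question's snippet (no namespace declared).
SVG = b"""<?xml version="1.0" encoding="UTF-8" ?>
<svg>
  <g font-family="'BentonSans Medium'" font-size="12"><text>bla bla bla</text></g>
  <g font-family="Arial" font-size="10"><text>other</text></g>
</svg>"""

root = ET.fromstring(SVG)

# The XPath string literal is delimited by double quotes, so the apostrophes
# that are part of the attribute value need no escaping at all.
xpath_expr = ".//g[@font-family=\"'BentonSans Medium'\"]"
matches = root.findall(xpath_expr)

print([g.get("font-size") for g in matches])
```

Only the Python string needs backslashes here; writing the Python literal with single quotes, as in './/g[@font-family="\'BentonSans Medium\'"]', yields the same XPath expression.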
Your code fails because the closing single quote is missing:
xPath = ".//g[@font-family=''BentonSans Medium']"
It should come after the last ':
xPath = ".//g[@font-family=''BentonSans Medium'']"
But that still doesn't make the XPath expression correct, because the ' is interpreted literally.
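XPath 1.0 string literals have no escape mechanism, so one way to avoid the quoting problem entirely is to pass the value as an XPath variable. The sketch below assumes lxml's xpath method, which accepts variables as keyword arguments; the sample document and the variable name family are hypothetical.

```python
from lxml import etree as ET

# Assumed sample data, modelled on the question's snippet (no namespace declared).
SVG = b"""<svg>
  <g font-family="'BentonSans Medium'" font-size="12"><text>bla bla bla</text></g>
  <g font-family="Arial" font-size="10"><text>other</text></g>
</svg>"""

root = ET.fromstring(SVG)

# The value to match, apostrophes included, as an ordinary Python string.
wanted = "'BentonSans Medium'"

# $family is bound by the keyword argument, so no quoting is needed in the expression.
matches = root.xpath(".//g[@font-family = $family]", family=wanted)

print(len(matches))
```

This also works for values that contain both single and double quotes, which cannot be written as a single XPath 1.0 literal.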
By the way, if you want to check whether the font-family attribute contains a given string, use the XPath contains() function with the xpath method:
xPath = '//g[contains(@font-family, "BentonSans Medium")]'
print(root.xpath(xPath))
Output
[<Element g at 0x7f2093612108>]
The sample code fetches all g elements whose font-family attribute value contains the string BentonSans Medium.
I don't know why the findall method doesn't work with contains(), but the xpath method seems more flexible, and I would recommend using it instead.
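For what it's worth, findall goes through lxml's ElementPath implementation (the _elementpath.py in the traceback), which supports only a small subset of XPath, whereas xpath evaluates full XPath 1.0 expressions. The following sketch contrasts a contains() match with an exact match; the sample document is an assumption, with a hypothetical second g element whose font-family lists a fallback font.

```python
from lxml import etree as ET

# Assumed sample data (no namespace declared); the second g is hypothetical.
SVG = b"""<svg>
  <g font-family="'BentonSans Medium'" font-size="12"><text>a</text></g>
  <g font-family="'BentonSans Medium', sans-serif" font-size="10"><text>b</text></g>
  <g font-family="Arial" font-size="8"><text>c</text></g>
</svg>"""

root = ET.fromstring(SVG)

# Substring match: every g whose font-family mentions the font name.
partial = root.xpath('.//g[contains(@font-family, "BentonSans Medium")]')

# Exact match: the attribute value must equal the literal, apostrophes included.
exact = root.xpath(".//g[@font-family=\"'BentonSans Medium'\"]")

print(len(partial), len(exact))  # prints: 2 1
```

The substring version is more forgiving about quoting and fallback fonts, but it will also match any other family name that happens to contain the same text.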