Tags: html, xpath, screen-scraping, lxml

Lxml cssselect wildcard


How do I select all elements whose id matches a wildcard pattern using cssselect?

For example:

import lxml.html

content = """
<table>
<tr id='Awesome1234'><a href="link1"></a></tr>
<tr id='Awesome5678'><a href="link2"></a></tr>
</table>
"""
doc = lxml.html.fromstring(content)
# 'tr.Awesome*' is the wildcard I would like to write; it is not a valid CSS selector
links = doc.cssselect('tr.Awesome* a')
for link in links:
    print(link.get('href'))

I want it to output:

 link1
 link2

Is this possible with cssselect? If not, how can I get this? (xpath?)


Solution

  • Use the following XPath expression (no css is required):

    tr[starts-with(@id, 'Awesome')]
    

    This XPath expression selects all tr children of the context node that have an id attribute whose string value starts with the string 'Awesome'.
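
As a minimal sketch of how that expression might be applied to the question's markup with lxml: the version below prefixes the expression with `//` so it matches rows anywhere in the document rather than only children of the context node, and appends `//a/@href` to pull out the link targets. The `<td>` cells and the third, non-matching row are assumptions added for illustration; they are not in the original markup.

```python
import lxml.html

# Sample markup based on the question. The <td> wrappers and the "Other42"
# row are illustrative additions, not part of the original example.
content = """
<table>
<tr id="Awesome1234"><td><a href="link1"></a></td></tr>
<tr id="Awesome5678"><td><a href="link2"></a></td></tr>
<tr id="Other42"><td><a href="link3"></a></td></tr>
</table>
"""

doc = lxml.html.fromstring(content)

# //tr[...] finds every row whose id starts with 'Awesome';
# //a/@href then collects the href of each link nested inside those rows.
hrefs = doc.xpath("//tr[starts-with(@id, 'Awesome')]//a/@href")

for href in hrefs:
    print(href)
```

If you would rather stay with cssselect, the CSS attribute-prefix selector `tr[id^="Awesome"] a` passed to `doc.cssselect(...)` should match the same links, assuming the separate cssselect package is installed.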