
What is the XPath syntax to grab HTML tag elements?


How do I print the title value for the HTML file below using xmlstarlet?

thufir@doge:~/.html$ 
thufir@doge:~/.html$ xmlstarlet sel -t -v "/html/header[@name='title']" -n hello.html 

thufir@doge:~/.html$ 
thufir@doge:~/.html$ cat hello.html 
<html>
<header><title>This is title</title></header>
<body>
Hello world
</body>
</html>
thufir@doge:~/.html$ 

Is querying XML a bit different from querying HTML? Assume garden-variety HTML, not XHTML.

I'm using xmlstarlet specifically for its XPath support, although the syntax still seems rather alien to me.


Solution

  • With:

    "/html/header[@name='title']"

    you select a header element that has a name attribute with the value "title". The header in your file has no attributes, so nothing matches.

    What you want is to grab a title element inside a header element:

    //header/title

    or just use:

    //title

    which selects all title elements, regardless of their position in the tree.
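
As a minimal sketch, the expressions above can be tried against the sample file from the question. It assumes xmlstarlet is installed and that hello.html is well-formed enough to be parsed as XML, which is true for this sample but not for HTML in general.

```shell
# Recreate the sample file from the question.
cat > hello.html <<'EOF'
<html>
<header><title>This is title</title></header>
<body>
Hello world
</body>
</html>
EOF

# Explicit path from the root: html -> header -> title.
xmlstarlet sel -t -v "/html/header/title" -n hello.html

# Descendant search: every title element, wherever it sits in the tree.
xmlstarlet sel -t -v "//title" -n hello.html

# The expression from the question: header has no name attribute, so nothing
# is selected. sel may exit non-zero when nothing matches; ignore that here.
xmlstarlet sel -t -v "/html/header[@name='title']" -n hello.html || true
```

The first two commands should each print the text of the title element, while the third selects nothing. The square brackets in XPath are a filter on the element in front of them, not a way to name a child element.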