
Is there something like a "CSS selector" or XPath grep?

I need to find every place in a set of HTML files that matches the following structure (CSS):

div.a ul.b

or XPath:


grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all places within them) that match this criterion? That is, a tool that returns the names of files matching a given HTML or XML structure.


  • Try this:

    1. Install
      • Ubuntu: aptitude install html-xml-utils
      • MacOS: brew install html-xml-utils
    2. Save a web page (call it filename.html).
    3. Run: hxnormalize -l 240 -x filename.html | hxselect -s '\n' -c "label.black"

    Here "label.black" is the CSS selector that identifies the HTML elements to extract. Write a helper script named cssgrep:

    #!/bin/sh
    # Usage: cssgrep FILE SELECTOR
    # Discard hxnormalize's error messages; write the results to standard output.
    hxnormalize -l 240 -x "$1" 2>/dev/null | hxselect -s '\n' -c "$2"

    Make the script executable and put it on your PATH. You can then run:

    cssgrep filename.html "label.black"

    This prints the content of every HTML label element with the class black.

    The -l 240 argument matters because it keeps line breaks inside short elements out of the output. For example, if the input is <label class="black">Text to \nextract</label>, then -l 240 reformats the HTML to <label class="black">Text to extract</label> and only inserts newlines at column 240, which makes the output easier to parse line by line. A larger width, such as 1024 or beyond, also works.
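
The question also asks for the names of matching files rather than the matched content. A minimal sketch of that, built on the same hxnormalize and hxselect pipeline, might look like the function below. The name cssgrep_files is hypothetical, and the sketch assumes that hxselect prints nothing when no element matches.

```shell
# Hypothetical helper: print the name of each file in which SELECTOR
# matches at least one element.
# Usage: cssgrep_files SELECTOR FILE...
cssgrep_files() {
    selector=$1
    shift
    for file in "$@"; do
        # Without -c, hxselect prints the matched elements themselves, so
        # any output at all is taken to mean "at least one match"
        # (an assumption about its behavior when nothing matches).
        if [ -n "$(hxnormalize -l 240 -x "$file" 2>/dev/null | hxselect -s '\n' "$selector")" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

With the function loaded into your shell, cssgrep_files 'div.a ul.b' *.html should list the files that contain a ul.b somewhere inside a div.a. The -c option is left out on purpose: with it, a matched element that has no content would produce empty output and be counted as no match.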
