
How to interactively use tDOM?


I feel that I'm missing something subtle here.

I have a $doc, created with dom parse -html5 $body, and $doc asText shows that it really contains the content of the page to be parsed.

From here, I'd like to explore the DOM interactively, for example to get a list of anchors. It seems like $doc selectNodes {//a} should work, but it doesn't return anything. Neither does anything else I try with selectNodes (/head, /body, /html ... nothing!). I can see that there are childNodes, so the structure seems to be intact.

What is a better way to explore these nodes so I can figure out what is going wrong?


Solution

  • Since you are working with HTML (not XML or XHTML), as shown by passing -html5 to dom parse and selecting HTML elements such as anchors, you can simplify your life here.

    Plain HTML has no notion of namespaces, but the -html5 parser still places the nodes it creates in XML namespaces, so unprefixed XPath names such as //a match nothing. You can ignore namespaces altogether by passing the -ignorexmlns flag to dom parse:

    % package req tdom
    0.9.2
    % set someHTML {<!DOCTYPE html>
    <html>
      <head>
        <meta charset="UTF-8">
        <title>Title of the document</title></head><body>
        <svg width="100" height="100">
          <circle cx="50" cy="50" r="40" stroke="green" stroke-width="4" fill="yellow" />
        </svg>
      </body>
    </html>}
    % set doc [dom parse -html5 -ignorexmlns $someHTML]
    

    With that, your XPath expressions work without any namespace handling:

    $doc selectNodes {//svg}
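
    The same approach should cover the anchors from the question. The following is only a minimal sketch: the sample page and the variable names (body, links) are made up for illustration, and it assumes a tDOM build with HTML5 support, since -html5 is only available when tDOM was built with it.

    ```tcl
    package require tdom

    # Hypothetical sample page; in practice $body would hold the fetched HTML.
    set body {<!DOCTYPE html><html><head><title>t</title></head><body><p><a href="https://example.com/one">One</a> and <a href="/two">Two</a></p></body></html>}

    # Parse as HTML5 and drop all namespace information.
    set doc [dom parse -html5 -ignorexmlns $body]

    # Collect "href -> link text" for every anchor in the document.
    set links {}
    foreach a [$doc selectNodes {//a}] {
        # The second argument to getAttribute is the default if href is missing.
        lappend links "[$a getAttribute href {}] -> [$a text]"
    }
    puts [join $links \n]

    # Free the DOM tree when done.
    $doc delete
    ```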
    

    Note that the tDOM documentation itself suggests this combination:

    Since this probably isn't wanted by a lot of users and adds only burden for no good in a lot of use cases -html5 can be combined with -ignorexmlns, in which case all nodes and attributes in the DOM tree are not in an XML namespace.
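
To see for yourself what went wrong, it may help to ask the nodes which namespace they are in. The sketch below is not part of the original answer: it parses a made-up sample page without -ignorexmlns, prints each node's name and namespace URI, and then retries the query with a prefix bound through the -namespaces option of selectNodes. The prefix h and the variable names are arbitrary choices, and a tDOM build with HTML5 support is assumed.

```tcl
package require tdom

# Hypothetical sample page standing in for the real $body.
set body {<!DOCTYPE html><html><head><title>t</title></head><body><a href="/x">X</a></body></html>}

# Parse WITHOUT -ignorexmlns to reproduce the situation from the question.
set doc  [dom parse -html5 $body]
set root [$doc documentElement]

# Inspect what the parser actually produced.
puts "root name:      [$root nodeName]"
puts "root namespace: [$root namespaceURI]"
foreach child [$root childNodes] {
    puts "child: [$child nodeName] ([$child nodeType]) at [$child toXPath]"
}

# If the root reports a namespace, bind a prefix of our choosing (h) to it
# and use that prefix in the XPath expression; otherwise query unprefixed.
set ns [$root namespaceURI]
if {$ns ne ""} {
    set anchors [$doc selectNodes -namespaces [list h $ns] {//h:a}]
} else {
    set anchors [$doc selectNodes {//a}]
}
puts "anchors found: [llength $anchors]"

$doc delete
```

If the root reports a non-empty namespace URI, that explains why the unprefixed //a came back empty. Whether to bind a prefix or parse with -ignorexmlns is then mostly a matter of convenience.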