I want to find all text or href values matching a RegExp within an HTML document and wrap them in a tag (e.g. turning plain text into links).
Consider the following HTML:
<body>
<!-- test1 <div>test2 <a href="test3">test4</a></div> -->
test5
<a href="test6">notest</a>
<div>
test8
<p>
test9 notest test10
<a href="notest">test12</a>
<input type="text" name="test13">test14</input>
</p>
test15
</div>
</body>
Then this would be my required replacement:
<body>
<!-- test1 <div>test2 <a href="test3">test4</a></div> -->
<div class="wrapped">test5</div>
<div class="wrapped"><a href="test6">notest</a></div>
<div>
<div class="wrapped">test8</div>
<p>
<div class="wrapped">test9</div> notest
<div class="wrapped">test10</div>
<div class="wrapped"><a href="notest">test12</a></div>
<input type="text" name="test13">test14</input>
</p>
<div class="wrapped">test15</div>
</div>
</body>
Notice that tests 5, 6, 8, 9, 10, 12 and 15 got wrapped.
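To make the text case concrete (test9 and test10 wrapped, notest left alone), here is a minimal sketch of splitting one string into matching and non-matching segments. The function name splitSegments and the pattern /\btest\d+\b/ are assumptions for illustration; the actual RegExp is not given above.

```javascript
// Hypothetical helper: split a string into segments, flagging those that match.
// Each segment flagged match:true is a candidate for wrapping.
function splitSegments(text, pattern) {
  // Rebuild the pattern with the global flag so exec() walks through every match.
  var flags = 'g' + (pattern.ignoreCase ? 'i' : '') + (pattern.multiline ? 'm' : '');
  var re = new RegExp(pattern.source, flags);
  var segments = [];
  var last = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // skip empty matches to avoid an endless loop
      continue;
    }
    if (m.index > last) {
      segments.push({ text: text.slice(last, m.index), match: false });
    }
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), match: false });
  }
  return segments;
}

// Usage with the assumed pattern:
var parts = splitSegments('test9 notest test10', /\btest\d+\b/);
```

With that assumed pattern, the string splits into three segments: "test9" (match), " notest " (no match) and "test10" (match), which lines up with the wrapping shown for the <p> element above.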
It is not acceptable to insert into input boxes or any other special HTML tags that are not displayed (e.g. <script>, <!DOCTYPE> and so on).
I was working with a stack principle before:
1. Push body onto stack.
2. e = stack.pop().
3. Push all children of e of type element onto stack, except links (<a> nodes) and elements of class="wrapped".
4. Check all remaining e.children of type link for a matching href or text and wrap.
5. Wrap all innermost matches within all e.children of type text.
6. If stack is not empty, then go to 2.
7. Complete.
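The stack principle above could look roughly like the following sketch. It only collects the nodes to wrap; the wrapping itself is left out. The names collectTargets, hasClass and SKIP_TAGS are hypothetical, and the list of skipped tags is an assumption based on the requirement about input boxes and non-displayed tags. The pattern should not carry the g flag, because test() on a global RegExp keeps state between calls.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
// Assumed list of elements whose contents must never be touched.
var SKIP_TAGS = ['SCRIPT', 'STYLE', 'INPUT', 'TEXTAREA'];

function hasClass(el, name) {
  return (' ' + (el.className || '') + ' ').indexOf(' ' + name + ' ') !== -1;
}

// Iterative traversal with an explicit stack. Returns the matching text
// nodes and links; comment nodes fall through both branches and are ignored.
function collectTargets(root, pattern) {
  var result = { textNodes: [], links: [] };
  var stack = [root];
  while (stack.length > 0) {
    var e = stack.pop();
    for (var i = 0; i < e.childNodes.length; i++) {
      var child = e.childNodes[i];
      if (child.nodeType === TEXT_NODE) {
        if (pattern.test(child.nodeValue)) {
          result.textNodes.push(child);
        }
      } else if (child.nodeType === ELEMENT_NODE) {
        var tag = child.nodeName.toUpperCase();
        if (tag === 'A') {
          // Links are checked as a whole and never descended into.
          var href = child.getAttribute('href') || '';
          if (pattern.test(href) || pattern.test(child.textContent)) {
            result.links.push(child);
          }
        } else if (SKIP_TAGS.indexOf(tag) === -1 && !hasClass(child, 'wrapped')) {
          stack.push(child);
        }
      }
    }
  }
  return result;
}
```

In a browser this would be called as collectTargets(document.body, pattern). Because a stack is used, the nodes are not returned in document order. Every node is visited once, so the traversal is already linear in the number of nodes.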
The JavaScript is only required to run on Firefox 8.
I would like to accomplish the wrapping without a tree traversal; a linear approach would be optimal.
Why do you not want any tree traversal? I think your current algorithm is as good as it gets.
The problem is that the DOM does not offer any sophisticated method to get all text nodes.
I haven't run any performance tests, but this approach may be about as fast:
1. nodes := document.getElementsByTagName('*')
2. excludes := document.querySelectorAll('a, a *, .wrapped, .wrapped *, script, style, input, textarea [, ...]') (querySelectorAll should perform pretty well)
3. targets := nodes - excludes
4. Search the text nodes of each element in targets and wrap the matches
5. Handle <a> elements separately
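As a rough sketch, the set difference in the steps above could be written like this. The names getTargets and EXCLUDE_SELECTOR are hypothetical, and the selector is just the one from the list, to be extended as needed. The document is passed in as a parameter so the function can be tried with any object that offers getElementsByTagName and querySelectorAll.

```javascript
// Selector for everything that must not be processed (extend as needed).
var EXCLUDE_SELECTOR = 'a, a *, .wrapped, .wrapped *, script, style, input, textarea';

// Returns all elements of doc that are not matched by EXCLUDE_SELECTOR.
function getTargets(doc) {
  // Copy both node lists into real arrays so filter() and indexOf() are available.
  var nodes = Array.prototype.slice.call(doc.getElementsByTagName('*'));
  var excludes = Array.prototype.slice.call(doc.querySelectorAll(EXCLUDE_SELECTOR));
  return nodes.filter(function (node) {
    return excludes.indexOf(node) === -1;
  });
}
```

In a browser this would be called as getTargets(document). Note that indexOf makes the difference quadratic in the worst case, so this is not guaranteed to beat the stack-based traversal; it would need measuring. The result also contains elements such as <html> and <head>, whose direct text node children would still have to be inspected one by one.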