While tackling the problem of annotating HTML markup and storing the marks in the markup itself, I came across the <mark> element as a tentative solution. Marking inline works fine:
<p>The fox <mark>jumped over</mark> the lazy dog.</p>
I want to extend this idea to marking (highlighting) arbitrary spans of text in the document. Unfortunately, using this approach to mark across paragraphs would generate invalid HTML (the content model of <mark> is phrasing content, so it cannot contain a <p>) and could break the DOM hierarchy, since the tags end up misnested:
<mark><p>Red Green Blue.</p> <p>Magenta, Cyan,</mark> Black</p>
Although a smart parser might translate the above into:
<p><mark>Red Green Blue.</mark></p> <p><mark>Magenta, Cyan,</mark> Black</p>
it doesn't preserve the fact that there was a single mark spanning one paragraph and part of a second one, not two separate marks!
What is the best (ideally semantic) way of doing this without breaking the DOM hierarchy? I want to be able to query this data through the DOM/JS APIs.
The only viable solution, markup-wise, is to follow the example of your smart parser. Assuming the highlighting information (or rather, the information that the original highlight spanned several paragraphs) is only needed by scripts, you can add a custom data attribute that groups these separate mark elements:
<p><mark data-mark-group="1">Red Green Blue.</mark></p> <p><mark data-mark-group="1">Magenta, Cyan,</mark> Black</p>
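To produce this markup, the one logical highlight has to be cut at paragraph boundaries. Here is a minimal sketch in TypeScript, assuming the highlight is stored as [start, end) character offsets over the paragraphs' text concatenated with no separator; splitRange, escapeHtml and renderGrouped are hypothetical helpers for illustration, not part of any library.

```typescript
// Hypothetical helpers (not from any library): cut one logical highlight,
// given as [start, end) offsets over the paragraphs' concatenated text,
// into per-paragraph <mark> elements that share a data-mark-group value.

interface Segment {
  paragraph: number; // index of the paragraph this piece falls in
  start: number; // offset within that paragraph (inclusive)
  end: number; // offset within that paragraph (exclusive)
}

// Assumes offsets were computed with paragraphs joined without a separator.
function splitRange(paragraphs: string[], start: number, end: number): Segment[] {
  const segments: Segment[] = [];
  let base = 0; // offset of the current paragraph in the concatenated text
  paragraphs.forEach((text, i) => {
    const s = Math.max(start, base);
    const e = Math.min(end, base + text.length);
    if (s < e) segments.push({ paragraph: i, start: s - base, end: e - base });
    base += text.length;
  });
  return segments;
}

// Minimal escaping so text and attribute values cannot break the markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render each paragraph as a <p>, wrapping its part of the highlight (if any)
// in a <mark> tagged with the shared group id. Paragraphs are joined with a
// space to mirror the example markup.
function renderGrouped(paragraphs: string[], start: number, end: number, group: string): string {
  // A contiguous range touches each paragraph at most once.
  const byParagraph = new Map<number, Segment>();
  for (const seg of splitRange(paragraphs, start, end)) byParagraph.set(seg.paragraph, seg);

  return paragraphs
    .map((text, i) => {
      const seg = byParagraph.get(i);
      if (!seg) return `<p>${escapeHtml(text)}</p>`;
      return (
        "<p>" +
        escapeHtml(text.slice(0, seg.start)) +
        `<mark data-mark-group="${escapeHtml(group)}">` +
        escapeHtml(text.slice(seg.start, seg.end)) +
        "</mark>" +
        escapeHtml(text.slice(seg.end)) +
        "</p>"
      );
    })
    .join(" ");
}
```

With the paragraphs "Red Green Blue." and "Magenta, Cyan, Black", offsets 0 to 29 and group "1", renderGrouped returns exactly the grouped markup shown above. The offset representation is only one possible way to store the highlight; a DOM Range would work too, at the cost of needing a browser.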
A script could also use the same data-mark-group attribute to style a highlight that spans paragraphs as a single unit, but I'll leave that as an exercise for the reader.
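For the querying side, here is a minimal sketch, assuming the data-mark-group markup above. MarkLike, groupMarks and highlightTexts are hypothetical names; MarkLike is a structural stand-in for HTMLElement so the logic can run outside a browser.

```typescript
// Structural stand-in for HTMLElement: only the two members used here.
// A real HTMLElement satisfies it (dataset is a DOMStringMap).
interface MarkLike {
  dataset: { [key: string]: string | undefined };
  textContent: string | null;
}

// Collect marks into logical highlights keyed by data-mark-group,
// preserving the order in which the marks are passed in.
function groupMarks(marks: ArrayLike<MarkLike>): Map<string, MarkLike[]> {
  const groups = new Map<string, MarkLike[]>();
  for (let i = 0; i < marks.length; i++) {
    const mark = marks[i];
    const id = mark.dataset.markGroup; // data-mark-group maps to dataset.markGroup
    if (id === undefined) continue; // ungrouped <mark>: skip it
    const list = groups.get(id);
    if (list) list.push(mark);
    else groups.set(id, [mark]);
  }
  return groups;
}

// Rebuild the text of each logical highlight. The separator between the
// per-paragraph pieces is a choice (a space here); the markup does not record it.
function highlightTexts(groups: Map<string, MarkLike[]>, sep = " "): Map<string, string> {
  const out = new Map<string, string>();
  groups.forEach((marks, id) => {
    out.set(id, marks.map((m) => m.textContent ?? "").join(sep));
  });
  return out;
}
```

In a browser you could pass document.querySelectorAll<HTMLElement>("mark[data-mark-group]"), which returns the marks in document order, so each group's pieces come back in reading order. The same groups map is a natural starting point for the styling exercise, e.g. toggling a class on every element of a group when one of them is hovered.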