Tags: javascript, google-chrome, userscripts

Dynamically group poorly structured HTML that has no IDs?


I use a very old website that does not display its data in a friendly way. I would like to write a userscript (JavaScript/jQuery) that improves the site's readability. The content looks like this (the HTML comments are my own, added to show the structure):

<font size="3" face="Courier">
  <br>
  <!-- Begin entry 1 -->
  Name1 (Location1) - Date1:
  <br>
  Text1
  <br>
  Text1 (continued)
  <br>
  Text1 (continued)
  <br>
  <br>
  <!-- Begin entry 2 -->
  Name2 (Location2) - Date2:
  <br>
  Text2
  <br>
  Text2 (continued)
  <br>
  <br>
  Text2 (continued)
  <br>
  Text2 (continued)
  <br>
  <br>
  <!-- Begin entry 3 -->
  Name3 (Location3) - Date3:
  <br>
  Text3
  <br>
  Text3 (continued)
  <br>
  Text3 (continued)
  <br>
  <br>
  <br>
  Text3 (continued)
  <br>
  Text3 (continued)
  <!-- Below is Text3, but a user copied Entry1 here --> 
  Name1 (Location1) - Date1: <!-- text3 -->
  <br> <!-- text3 -->
  Text1 <!-- text3 -->
  <br> <!-- text3 -->
  Text1 (continued) <!-- text3 -->
  <br> <!-- text3 -->
  Text1 (continued) <!-- text3 -->
  <br>
  <br>
  <!-- Begin entry 4 -->
  Name4 ...
  ......
</font>
  • Examples of names: Bob Dole; SMITH,JOHN
  • Examples of locations: via Web; INTERNAL
  • Examples of dates: Jul 25, 2011 - 1317 EDT; Dec 30, 2011 - 1411 EST
  • Examples of Text1/Text2/etc.: Blah blah * (test) text goes here -Thanks Here: there

As you can see, two <br> tags always come before the next "entry" (name, location, date), but since the entry text is free-form, it can also contain <br> tags, including runs of two or more. Another issue is that the text may itself contain a "Name (Location) - Date:" line pasted from another entry elsewhere.
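The header pattern used in the answer's script can be checked against the sample values above with a few lines of plain JavaScript (runnable in Node or a browser console). `isEntryHeader` is a hypothetical helper name, and the sample header lines are assembled from the listed names, locations and dates. A pattern alone cannot tell a real header from a pasted copy, so the position of the line (what comes right before it) still has to be taken into account.

```javascript
// Header pattern taken from the answer's script:
// "<name> (<location>) - <date>:" on a single line, surrounding whitespace allowed.
const HEADER_RE = /^\s*\w.*\s\(.+?\)\s+-\s+\w.+?:\s*$/;

function isEntryHeader(text) {
  return HEADER_RE.test(text);
}

// Header lines built from the sample values, plus one line of entry text.
const samples = [
  "Bob Dole (via Web) - Jul 25, 2011 - 1317 EDT:",
  "SMITH,JOHN (INTERNAL) - Dec 30, 2011 - 1411 EST:",
  "\n  Name1 (Location1) - Date1:\n  ",
  "Blah blah * (test) text goes here -Thanks Here: there",
];
for (const s of samples) {
  console.log(isEntryHeader(s), JSON.stringify(s));
}
```

Because `.` does not match a newline, a text node such as "Text3 (continued)\nName1 (Location1) - Date1:" is not treated as a header; only a node that consists of the header line alone matches.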

So, is it possible to write a script for Google Chrome that, say, adds a button to collapse (or expand, if already collapsed) each entry? The problem is that no unique element starts or ends an entry, so I'm not sure how to begin.

The general concept is to loop through each "entry" (a header with name/location/date, plus the text that follows it up to the next header) and make each entry collapsible, the way Reddit comments are.

Or, as a simpler example, what if I wanted to color every other entry red? All of entry 1 would be black, entry 2 red, entry 3 black, entry 4 red, and so on.
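Once each entry is wrapped in its own element, striping is a short loop. This sketch assumes the wrappers are already available as a list (for example the `div.groupingDiv` elements created by the script in the answer); `stripeEntries` is a hypothetical helper, and since it only touches `style.color` it can also be exercised with plain objects outside a browser.

```javascript
// Color every second entry: entry 1 default, entry 2 red, entry 3 default, ...
// `entries` is any array-like list of elements, e.g. a NodeList.
function stripeEntries(entries, color = "red") {
  for (let i = 0; i < entries.length; i++) {
    // Indexes 1, 3, 5, ... are the 2nd, 4th, 6th entries.
    entries[i].style.color = i % 2 === 1 ? color : "";
  }
}

// In the userscript, after grouping (browser only):
//   stripeEntries(document.querySelectorAll("div.groupingDiv"));
```

A pure-CSS alternative would be a rule such as `div.groupingDiv:nth-child(even) { color: red; }`, which works with the answer's script because the grouping divs end up as the only children of the container.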


Solution

  • For this kind of thing, parse the entries in a state-machine loop.

    The following code is designed to:

    1. Group the HTML as specified in the question.
    2. Provide click control to expand/collapse the groupings.
    3. Collapse all entries at the start, for a better initial overview.

    See a demo of it at jsFiddle.
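The heart of that state machine can be tried out without a browser. The sketch below is a DOM-free stand-in: the token list, the `BR` marker and the `groupEntries` name are illustrative assumptions, not part of the script. It follows the same rule as the script (a text node that matches the header pattern and directly follows a `<br>` starts a new entry), so, like the script, it drops anything before the first header and cannot distinguish a pasted header that sits on its own line after a `<br>` from a real one.

```javascript
// Tokens stand in for the container's child nodes:
// the string "<br>" for a <br> element, any other string for a text node.
const BR = "<br>";
const HEADER_RE = /^\s*\w.*\s\(.+?\)\s+-\s+\w.+?:\s*$/;

function groupEntries(tokens) {
  const entries = [];    // result: [{ header, body: [...] }, ...]
  let current = null;    // state: null until the first header is seen
  let prevWasBr = false; // was the previous token a <br>?

  for (const tok of tokens) {
    const isBr = tok === BR;
    if (!isBr && prevWasBr && HEADER_RE.test(tok)) {
      // Transition: a header right after a <br> opens a new entry.
      current = { header: tok.trim(), body: [] };
      entries.push(current);
    } else if (current) {
      // Everything else belongs to the entry that is currently open.
      current.body.push(tok);
    }
    prevWasBr = isBr;
  }
  return entries;
}

// Simplified version of the question's sample, including the pasted header.
const page = [
  BR, "Name1 (Location1) - Date1:", BR, "Text1", BR, BR,
  "Name3 (Location3) - Date3:", BR, "Text3 (continued)\nName1 (Location1) - Date1:", BR, BR,
  "Name4 (Location4) - Date4:", BR, "Text4",
];
console.log(groupEntries(page).map(e => e.header));
```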

    UPDATE:

    The question's HTML did not match the actual page structure. I updated the script below to account for that, and also moved the CSS into the script code:

    var containerNode       = document.querySelector ("p font xpre");
    var contentNodes        = containerNode.childNodes;
    var tempContainer       = document.createElement ("div");
    var groupingContainer   = null;
    var hidableDiv          = null;
    var bInEntry            = false;
    var bPrevNodeWasBr      = false;
    
    for (var J = 0, numKids = contentNodes.length;  J < numKids;  ++J) {
        var node            = contentNodes[J];
    
        //--- Is the node an entry start?
        if (    node.nodeType === Node.TEXT_NODE
            &&  bPrevNodeWasBr
            &&  /^\s*\w.*\s\(.+?\)\s+-\s+\w.+?:\s*$/.test (node.textContent)
        ) {
            //--- End the previous grouping, if any and start a new one.
            if (bInEntry) {
                groupingContainer.appendChild (hidableDiv);
                tempContainer.appendChild (groupingContainer);
            }
            else
                bInEntry        = true;
    
            groupingContainer   = document.createElement ("div");
            groupingContainer.className = "groupingDiv";
    
            /*--- Put the entry header in a special <span> to allow for
                expand/contract functionality.
            */
            var controlSpan         = document.createElement ("span");
            controlSpan.className   = "expandCollapse";
            controlSpan.textContent = node.textContent;
            groupingContainer.appendChild (controlSpan);
    
            //--- Since we can't style text nodes, put everything in this sub-wrapper.
            hidableDiv          = document.createElement ("div");
        }
        else if (bInEntry) {
            //--- Put a deep copy of the current node into the latest grouping container
            //--- (deep, so elements such as <b> or <a> keep their children).
            hidableDiv.appendChild (node.cloneNode (true) );
        }
    
        if (    node.nodeType === Node.ELEMENT_NODE
            &&  node.nodeName === "BR"
        ) {
            bPrevNodeWasBr  = true;
        }
        else
            bPrevNodeWasBr  = false;
    }
    
    //--- Finish up the last entry, if any.
    if (bInEntry) {
        groupingContainer.appendChild (hidableDiv);
        tempContainer.appendChild (groupingContainer);
    }
    
    /*--- If we have done any grouping, replace the original container contents
        with our collection of grouped nodes.
    */
    if (tempContainer.hasChildNodes() ) {
        while (containerNode.hasChildNodes() ) {
            containerNode.removeChild (containerNode.firstChild);
        }
    
        while (tempContainer.hasChildNodes() ) {
            containerNode.appendChild (tempContainer.firstChild);
        }
    }
    
    //--- Initially collapse all sections and make the control spans clickable.
    var entryGroups         = document.querySelectorAll ("div.groupingDiv span.expandCollapse");
    for (var J = entryGroups.length - 1;  J >= 0;  --J) {
        ExpandCollapse (entryGroups[J]);
    
        entryGroups[J].addEventListener ("click", ExpandCollapse, false);
    }
    
    
    //--- Add the CSS styles that make this work well...
    addStyleSheet ( "                                                   \
        div.groupingDiv {                                               \
            border:         1px solid blue;                             \
            margin:         1ex;                                        \
            padding:        1ex;                                        \
        }                                                               \
        span.expandCollapse {                                           \
            background:     lime;                                       \
            cursor:         pointer;                                    \
        }                                                               \
        div.groupingDiv     span.expandCollapse:before {                \
            content:        '-';                                        \
            background:     white;                                      \
            font-weight:    bolder;                                     \
            font-size:      150%;                                       \
            padding:        0 1ex 0 0;                                  \
        }                                                               \
        div.groupingDiv     span.expandCollapse.collapsed:before {      \
            content:        '+';                                        \
        }                                                               \
    " );
    
    
    //--- Functions used...
    function ExpandCollapse (eventOrNode) {
        var controlSpan;
        if (typeof eventOrNode.target == 'undefined')
            controlSpan     = eventOrNode;
        else
            controlSpan     = eventOrNode.target;
    
        //--- Is it currently expanded or contracted?
        var bHidden;
        if (/\bcollapsed\b/.test (controlSpan.className) ) {
            bHidden         = true;
            controlSpan.className = controlSpan.className.replace (/\s*collapsed\s*/, "");
        }
        else {
            bHidden         = false;
            controlSpan.className += " collapsed";
        }
    
        //--- Now expand or collapse the matching group.
        var hidableDiv      = controlSpan.parentNode.children[1];
        hidableDiv.style.display    = bHidden ? "" : "none";
    }
    
    
    function addStyleSheet (text) {
        var D                   = document;
        var styleNode           = D.createElement ('style');
        styleNode.type          = "text/css";
        styleNode.textContent   = text;
    
        var targ = D.getElementsByTagName ('head')[0] || D.body || D.documentElement;
        //--- No error check here: if the DOM is not available, this should throw.
        targ.appendChild (styleNode);
    }
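In current browsers, the class juggling in `ExpandCollapse` can be shortened with `Element.classList.toggle()`, which adds the class and returns `true` if it was absent, or removes it and returns `false`. This sketch assumes the same markup the script builds (a `span.expandCollapse` followed by the hidable `div` inside each `div.groupingDiv`); `toggleEntry` is a hypothetical name, not part of the script.

```javascript
// Collapse or expand one entry. Returns true if the entry is now collapsed.
function toggleEntry(controlSpan) {
  // toggle() returns true when it added "collapsed", false when it removed it.
  const nowCollapsed = controlSpan.classList.toggle("collapsed");
  // Same layout as the script: children[0] is the span, children[1] the body div.
  const hidableDiv = controlSpan.parentNode.children[1];
  hidableDiv.style.display = nowCollapsed ? "none" : "";
  return nowCollapsed;
}

// One delegated listener instead of one listener per entry (browser only):
//   containerNode.addEventListener("click", function (evt) {
//     const span = evt.target.closest("span.expandCollapse");
//     if (span) toggleEntry(span);
//   });
```

A single delegated listener on the container keeps working for entries added later and avoids attaching one handler per entry.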
    

    If nested or quoted entries are to be wrapped separately, you will also need to recurse. For that, please open a new question once this one is answered.

    Note: The new sample HTML has multiple pairs of <html> tags and two sets of entries! This is probably a cut-and-paste error; if it is not, and you need help with the (easy) modification to process multiple sets, open a new question.