javascript html semantics semantic-markup

Is there an easy way to convert HTML with multiple <br/> tags into proper surrounding <p> tags in JavaScript?


Let's say I have a bunch of HTML like below:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Is there an easy way with JavaScript to convert it to properly semantic <p> tags? E.g.:

<p>
  bla bla bla long paragraph here
</p>
<p>
  bla bla bla more paragraph text
</p>

Output spacing is not important; ideally it will work with any input spacing.

I'm thinking I might try to cook up a regex, but before I do that I wanted to make sure that a) I'm avoiding a world of hurt and b) there isn't something else out there already. I tried a Google search but haven't come up with anything yet.

Thanks for any advice!


Solution

  • I got bored. I'm sure there are optimizations / tweaks needed. It uses a little bit of jQuery to do its magic, and it worked in FF3. And the answer to your question is that there isn't a very "simple" way :)

    $(function() {
      $.fn.pmaker = function() {
        var brs = 0;
        var nodes = [];
    
        function makeP()
        {
          // only bother doing this if we have nodes to stick into a P
          if (nodes.length) {
            var p = $("<p/>");
            p.insertBefore(nodes[0]);  // insert a new P before the content
            p.append(nodes); // add the children        
            nodes = [];
          }
          brs=0;
        }
    
        this.contents().each(function() {    
          if (this.nodeType == 3) // text node 
          {
            // if the text has non whitespace - reset the BR counter
            if (/\S+/.test(this.data)) {
              nodes.push(this);
              brs = 0;
            }
          } else if (this.nodeType == 1) {
            if (/br/i.test(this.tagName)) {
              if (++brs == 2) {
                $(this).remove(); // remove this BR from the dom
                $(nodes.pop()).remove(); // delete the previous BR from the array and the DOM
                makeP();
              } else {
                nodes.push(this);
              }
            } else if (/^(?:p)$/i.test(this.tagName)) {
              // these tags force a P break but we don't scan within
              makeP();
            } else if (/^(?:div)$/i.test(this.tagName)) {
              // force a P break and scan within
              makeP();
              $(this).pmaker();
            } else {
              brs = 0; // some other tag - reset brs.
              nodes.push(this); // add the node 
              // specific nodes to not peek inside of - inline tags
              if (!(/^(?:b|i|strong|em|span|u)$/i.test(this.tagName))) {
                $(this).pmaker(); // peek inside for P needs            
              }
            } 
          } 
        });
        while ((brs--)>0) { // remove any extra BR's at the end
          $(nodes.pop()).remove();
        }
        makeP();
        return this;
      };
    
      // run it against something:
      $(function(){
        $("#worker").pmaker();
      });
    }); // close the outer document-ready wrapper opened on the first line
    

    And this was the HTML portion I tested against:

    <div id="worker">
    bla bla bla long <b>paragraph</b> here
    <br/>
    <br/>
    bla bla bla more paragraph text
    <br/>
    <br/>
    this text should end up in a P
    <div class='test'>
      and so should this
      <br/>
      <br/>
      and this<br/>without breaking at the single BR
    </div>
    and then we have the a "buggy" clause
    <p>
      fear the real P!
    </p>
    and a trailing br<br/>
    </div>
    

    And the result:

    <div id="worker"><p>
    bla bla bla long <b>paragraph</b> here
    </p>
    <p>
    bla bla bla more paragraph text
    </p>
    <p>
    this text should end up in a P
    </p><div class="test"><p>
      and so should this
      </p>
      <p>
      and this<br/>without breaking at the single BR
    </p></div><p>
    and then we have the a "buggy" clause
    </p><p>
      fear the real P!
    </p><p>
    and a trailing br</p>
    </div>
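
For the flat input shown in the question (plain text separated by runs of <br/> tags, with no nested block elements), a string-based approach along the lines of the regex idea may be enough. The sketch below is not part of the answer above: `brsToParagraphs` is a hypothetical helper name, and it assumes the <br> tags carry no attributes and that the input contains no existing <p> or <div> elements.

```javascript
// Hypothetical helper (not from the answer above): wraps flat HTML in <p> tags,
// treating any run of two or more <br> tags as a paragraph separator.
// Assumes <br> tags have no attributes and there are no nested block elements.
function brsToParagraphs(html) {
  // Two or more <br>, <br/> or <br /> tags (any case), with optional
  // whitespace before, between and after them.
  var separator = /\s*(?:<br\s*\/?>\s*){2,}/gi;
  // A single <br> left at the start or end of a chunk is redundant inside a <p>.
  var edgeBr = /^(?:\s*<br\s*\/?>)+|(?:<br\s*\/?>\s*)+$/gi;

  return html
    .split(separator)
    .map(function (chunk) { return chunk.replace(edgeBr, "").trim(); })
    .filter(function (chunk) { return chunk.length > 0; }) // drop empty chunks
    .map(function (chunk) { return "<p>" + chunk + "</p>"; })
    .join("\n");
}

var input =
  "bla bla bla long paragraph here\n<br/>\n<br/>\n" +
  "bla bla bla more paragraph text\n<br/>\n<br/>\n";
console.log(brsToParagraphs(input));
```

Unlike the DOM walk in the plugin, this does not descend into nested <div> elements or stop at existing <p> elements, so it only suits the simple case. A single <br/> inside a paragraph is left in place.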