
Dumb quotes into smart quotes only for text, not HTML code


I’m transforming dumb quotes into smart quotes in a contenteditable element, but the problem is that it also replaces them inside HTML tags, like:

<a href=“something” title=“something”>

which makes them invalid. I only want to do this for the user’s text. Here’s the catch: I have to keep the original formatting elements, so I can’t do something like:

clean($('#something_container').text());

This would strip all HTML elements (the formatting) from the returned content. Here’s the code I have:

var content = clean($('#post_content').html());
$('#post_content').html(content);

// replaces straight quotes with curly ones, -- with an em-dash, and <div> with <p>
function clean(html) {
  html = html.replace(/'\b/g, "\u2018")  // opening singles
         .replace(/\b'/g, "\u2019")  // closing singles
         .replace(/"\b/g, "\u201c")  // opening doubles
         .replace(/\b"/g, "\u201d")  // closing doubles
         .replace(/--/g,  "\u2014") // em-dashes
         .replace(/<div>/g, "<p>")  //<div> to <p>
         .replace(/<\/div>/g, "</p>"); //</div> to </p>
  return html;
};
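The failure can be reproduced without jQuery or a contenteditable by running the same replacement chain on a plain string. In this minimal sketch the sample markup is made up for illustration; the attribute’s quotes get curled along with the text:

```javascript
// Same replacement chain as clean() in the question, applied to a plain string.
function clean(html) {
  return html
    .replace(/'\b/g, "\u2018")     // opening singles
    .replace(/\b'/g, "\u2019")     // closing singles
    .replace(/"\b/g, "\u201c")     // opening doubles
    .replace(/\b"/g, "\u201d")     // closing doubles
    .replace(/--/g, "\u2014")      // em-dashes
    .replace(/<div>/g, "<p>")      // <div> to <p>
    .replace(/<\/div>/g, "</p>");  // </div> to </p>
}

// Hypothetical sample: one attribute plus some quoted user text.
const broken = clean('<a href="x">say "hi"</a>');
console.log(broken); // <a href=“x”>say “hi”</a>  (the href quotes were curled too)
```

The regexes only look at word boundaries, so they cannot tell an attribute value from user text.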

What would be the best (most efficient) way to replace dumb quotes only in the user’s text and skip HTML tags like <img src="" />? Thanks!


Solution

  • Here’s a possible approach (I don’t know about efficiency, but if you only handle strings that users type by hand, they probably won’t be very long, so it shouldn’t matter):

    1. split your string into non-overlapping chunks: HTML tags vs. the rest
    2. “educate quotes” only in the non-tags, leaving the tags alone
    3. put the string back together
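The three steps can also be sketched without a regex by scanning the string once and switching between "text" and "tag" mode. This is a minimal sketch, not a full HTML parser: it assumes a raw `<` only ever starts a tag (the browser serialization behind `.html()` normally escapes a literal `<` in text as `&lt;`), and it treats quoted attribute values as opaque so that a `>` inside them does not end the tag. The names `splitTagsAndText` and `educateOutsideTags` are hypothetical:

```javascript
// Step 1: split into tag chunks and text chunks in a single pass.
// Assumption: a raw "<" always starts a tag; quoted attribute values are
// skipped so a ">" inside quotes does not close the tag.
function splitTagsAndText(html) {
  const chunks = [];
  let start = 0;
  let i = 0;
  while (i < html.length) {
    if (html[i] !== "<") {
      i++;
      continue;
    }
    // Flush the text seen since the last tag.
    if (i > start) chunks.push({ isTag: false, text: html.slice(start, i) });
    start = i;
    let quote = null; // quote character of the attribute value we are inside, if any
    i++;
    while (i < html.length) {
      const c = html[i];
      if (quote !== null) {
        if (c === quote) quote = null;
      } else if (c === '"' || c === "'") {
        quote = c;
      } else if (c === ">") {
        break;
      }
      i++;
    }
    i++; // step past the closing ">" (or past the end if the tag is unterminated)
    chunks.push({ isTag: true, text: html.slice(start, i) });
    start = i;
  }
  // Trailing text after the last tag.
  if (start < html.length) chunks.push({ isTag: false, text: html.slice(start) });
  return chunks;
}

// Steps 2 and 3: transform only the text chunks, then reassemble.
function educateOutsideTags(html, textFn) {
  return splitTagsAndText(html)
    .map(function (chunk) { return chunk.isTag ? chunk.text : textFn(chunk.text); })
    .join("");
}

// Upper-casing stands in for the quote replacements to make the effect visible.
const out = educateOutsideTags('<a title="a > b">say "hi"</a>', function (t) {
  return t.toUpperCase();
});
console.log(out); // <a title="a > b">SAY "HI"</a>
```

In practice you would pass your quote-replacing function as `textFn`; every chunk is kept in order, so passing an identity function returns the input unchanged.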

    If the HTML you’re dealing with is well-formed (in particular, if there’s no stray "<" in the text), splitting it into chunks is easy:

    var html   = '<p style="color:red">some "quotes" in here</p>';
    var chunks = html.match(/(<.+?>|[^<]+)/g);
    // returns Array: ['<p style="color:red">', 'some "quotes" in here', '</p>']
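One caveat worth checking: in JavaScript, `.` does not match a line break, so `<.+?>` fails on a tag that contains a newline (for example a multi-line `title` attribute value). `match` then skips that `<` entirely, and joining the chunks no longer reproduces the input. A quick comparison with a `<[^>]*>` variant, on a made-up sample string:

```javascript
// Hypothetical markup with a newline inside an attribute value.
const html = '<a title="line one\nline two">hi</a>';

const original = html.match(/(<.+?>|[^<]+)/g);
const variant = html.match(/(<[^>]*>|[^<]+)/g);

// Re-joining the chunks should reproduce the input exactly.
console.log(original.join("") === html); // false: the first "<" is lost
console.log(variant.join("") === html);  // true
```

The `[^>]*` version has its own limit: a `>` inside an attribute value still ends the tag early, so neither pattern is a real HTML parser.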
    

    Then, given your clean() function that handles the replacements, you can write:

    var cleaned = chunks.map(function(chunk){
      return /</.test(chunk) ? chunk : clean(chunk);
    }).join('');
    

    to apply your replacements anywhere except between < and >.
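One side effect to watch for: `clean()` also converts `<div>` to `<p>`, but those strings only occur in the tag chunks, which the `map` passes through unchanged, so that conversion silently stops happening. A minimal sketch, assuming you still want it, splits `clean()` into a text-only function and a tag-only function (`educateQuotes`, `convertTag`, and `cleanHtml` are hypothetical names) and uses the `<[^>]*>` tag pattern, which also matches tags containing a newline:

```javascript
// Text-only replacements (the quote and dash rules from clean()).
function educateQuotes(text) {
  return text
    .replace(/'\b/g, "\u2018")  // opening singles
    .replace(/\b'/g, "\u2019")  // closing singles
    .replace(/"\b/g, "\u201c")  // opening doubles
    .replace(/\b"/g, "\u201d")  // closing doubles
    .replace(/--/g, "\u2014");  // em-dashes
}

// Tag-only replacements (the <div> rules from clean()).
function convertTag(tag) {
  if (tag === "<div>") return "<p>";
  if (tag === "</div>") return "</p>";
  return tag;
}

// Split into tags and text, apply the matching rules to each, and rejoin.
function cleanHtml(html) {
  const chunks = html.match(/<[^>]*>|[^<]+/g) || [];
  return chunks.map(function (chunk) {
    return chunk.charAt(0) === "<" ? convertTag(chunk) : educateQuotes(chunk);
  }).join("");
}

// In the page: $('#post_content').html(cleanHtml($('#post_content').html()));
const cleaned = cleanHtml('<div>say "hi" -- <img src="a.png"></div>');
console.log(cleaned); // div becomes p, quotes are curled, "--" becomes an em-dash, the img tag is untouched
```

Like the original regex, `convertTag` only matches a bare `<div>`, so a `<div class="x">` keeps its name while its `</div>` still becomes `</p>`; if the editor produces attributes on those divs, the tag test needs to be broader.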