
HTML: Prevent line breaks before punctuation characters preceded by whitespace


In French typographic conventions, a few punctuation characters such as ;, : or ? must be preceded by a white space. This causes unwanted line breaks when the punctuation falls at the edge of its HTML container.

Voulez-vous coucher avec moi ce soir ? Non, merci.

becomes

Voulez-vous coucher avec moi ce soir 
? Non, merci.

The objective is to get it like this:

Voulez-vous coucher avec moi ce 
soir ? Oui, bien sûr.

I didn't find any way to avoid line breaks before those characters using CSS. Moreover:

  • Manually adding &nbsp; before every occurrence is not an option due to existing data and ergonomic constraints.
  • Using CSS white-space: nowrap; is not an option because line breaks must still happen in all other cases.

I'm considering using a global JavaScript function running a str.replace(" ?", "&nbsp;?"); but I can't figure out a few things:

  • How and when to trigger it
  • How to select the appropriate tags
  • How to replace innerText only (i.e., not attribute values)

Any advice?

Note: The question "Whitespace before some punctuation characters in French: is there a CSS way to avoid lines breaking?" shares the same objective but has a pure-CSS constraint that I don't have here.


Solution

  • Ideally, you'd replace the space with a hard space server-side rather than client-side: if you do it client-side, the content will reflow when you update it, which causes a perceptible flash/movement.

    Mostly, you'll have this issue within Text nodes, e.g.:

    <p>Voulez-vous coucher avec moi ce soir ? Non, merci.</p>
    

    That's an Element (the p element) with a single Text node inside it.

    It's possible to have it where the space is in one Text node and the punctuation in another:

    <p><span>Voulez-vous coucher avec moi ce soir </span>? Non, merci.</p>
    

    There, "Voulez-vous coucher avec moi ce soir" and the space are in one Text node (inside a span), but the punctuation is in another text node (in the p). I'm going to assume that's rare enough we don't need to worry about it.

    To deal with it within Text nodes is relatively easy:

    fixSpaces(document.body);
    
    function fixSpaces(node) {
        switch (node.nodeType) {
            case 1: // Element
                for (let child = node.firstChild;
                     child;
                     child = child.nextSibling) {
                    fixSpaces(child);
                }
                break;
    
            case 3: // Text node
                node.nodeValue = node.nodeValue.replace(/ ([;?!])/g, "\u00a0$1");
                break;
        }
    }
    

    Live Example:

    // Obviously, you wouldn't have this in a `setTimeout` in your
    // real code, you'd call it directly right away
    // as shown in the answer
    console.log("Before...");
    setTimeout(() => {
        fixSpaces(document.body);
        console.log("After");
    }, 1000);
    
    function fixSpaces(node) {
        switch (node.nodeType) {
            case 1: // Element
                for (let child = node.firstChild;
                     child;
                     child = child.nextSibling) {
                    fixSpaces(child);
                }
                break;
    
            case 3: // Text node
                node.nodeValue = node.nodeValue.replace(/ ([;?!])/g, "\u00a0$1");
                break;
        }
    }

    /* CSS for the example */
    p {
        font-size: 14px;
        display: inline-block;
        width: 220px;
        border: 1px solid #eee;
    }

    <!-- HTML for the example -->
    <p>Voulez-vous coucher avec moi ce soir ? Non, merci.</p>

    You'd have this in a script tag at the end of the body, just before the closing </body> tag.

    But again: If you can do this server-side, that would be better, to avoid the reflow.
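
    If the text happens to pass through JavaScript on the server (Node.js, say) before it is escaped into HTML, the same replacement can be applied to the plain string. This is only a minimal sketch under that assumption: the name hardenFrenchSpaces is hypothetical, and including : in the character class is my assumption based on the punctuation listed in the question (the DOM code above only handles ; ? and !).

    ```javascript
    // Hypothetical helper, not part of any library.
    // Replaces an ordinary space that directly precedes ; : ? or !
    // with a no-break space (U+00A0) so the pair cannot be split across lines.
    function hardenFrenchSpaces(text) {
        return text.replace(/ ([;:?!])/g, "\u00a0$1");
    }

    const fixed = hardenFrenchSpaces("Voulez-vous coucher avec moi ce soir ? Non, merci.");
    console.log(fixed.includes("soir\u00a0?")); // true
    ```

    Inserting the U+00A0 character itself, rather than the string "&nbsp;", means the result survives whatever HTML escaping is applied afterwards.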


    If we want to try to handle the case where one Text node ends with a space and the next starts with the punctuation, as I showed above:

    <p><span>Voulez-vous coucher avec moi ce soir </span>? Non, merci.</p>
    

    ...one approach is to get an array of all the Text nodes and then check whether one ends with a space and the next starts with punctuation. This is imperfect because it doesn't handle the case where the Text node ending with a space is at the end of a block element (so the punctuation will always be on a different line visually), but it may be better than nothing. The only downside is that you can end up with block elements that have an extra hard space at the end.

    fixSpaces(document.body);
    
    function gatherText(node, array = []) {
        switch (node.nodeType) {
            case 1: // Element
                for (let child = node.firstChild;
                     child;
                     child = child.nextSibling) {
                    array = gatherText(child, array);
                }
                break;
    
            case 3: // Text node
                array.push(node);
                break;
        }
        return array;
    }
    
    function fixSpaces(node) {
        const texts = gatherText(node);
        for (let i = 0, len = texts.length; i < len; ++i) {
            const text = texts[i];
            const str = text.nodeValue = text.nodeValue.replace(/ ([;?!])/g, "\u00A0$1");
            if (i < len - 1 && str[str.length - 1] === " " && /^[;?!]/.test(texts[i + 1].nodeValue)) {
                // This node ends with a space and the next starts with punctuation,
                // replace the space with a hard space
                text.nodeValue = str.substring(0, str.length - 1) + "\u00A0";
            }
        }
    }
    

    Live Example:

    console.log("Before...");
    setTimeout(() => {
        fixSpaces(document.body);
        console.log("After");
    }, 1000);
    
    function gatherText(node, array = []) {
        switch (node.nodeType) {
            case 1: // Element
                for (let child = node.firstChild;
                     child;
                     child = child.nextSibling) {
                    array = gatherText(child, array);
                }
                break;
    
            case 3: // Text node
                array.push(node);
                break;
        }
        return array;
    }
    
    function fixSpaces(node) {
        const texts = gatherText(node);
        for (let i = 0, len = texts.length; i < len; ++i) {
            const text = texts[i];
            const str = text.nodeValue = text.nodeValue.replace(/ ([;?!])/g, "\u00A0$1");
            if (i < len - 1 && str[str.length - 1] === " " && /^[;?!]/.test(texts[i + 1].nodeValue)) {
                // This node ends with a space and the next starts with punctuation,
                // replace the space with a hard space
                text.nodeValue = str.substring(0, str.length - 1) + "\u00A0";
            }
        }
    }

    /* CSS for the example */
    p {
        font-size: 14px;
        display: inline-block;
        width: 220px;
        border: 1px solid #eee;
    }

    <!-- HTML for the example -->
    <p><span>Voulez-vous coucher avec moi ce soir </span>? Non, merci.</p>


    Mapping your questions to the above:

    How and when to trigger it

    Putting it in a script element at the end of body triggers it very early, but after the content is in the DOM and thus ready for us to operate on it.
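
    If the script can't go at the end of body (for example, because it is loaded from head), a common alternative is to wait for the DOMContentLoaded event. A minimal sketch, assuming a fixSpaces function like the one above is in scope; runWhenReady is a hypothetical helper name, not an existing API:

    ```javascript
    // Hypothetical helper: calls `callback` once the document has been parsed.
    // `doc` is expected to be the page's `document` object.
    function runWhenReady(doc, callback) {
        if (doc.readyState === "loading") {
            // Still parsing: wait until the DOM has been built.
            doc.addEventListener("DOMContentLoaded", () => callback(), { once: true });
        } else {
            // Already parsed ("interactive" or "complete"): run right away.
            callback();
        }
    }

    // In a page, assuming fixSpaces is defined as shown earlier:
    // runWhenReady(document, () => fixSpaces(document.body));
    ```

    Either way the replacement happens client-side, so the reflow concern mentioned above still applies.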

    How to select the appropriate tags

    We ignore tags and work at the Text node level.

    How to replace innerText only (i.e., not attribute values)

    By working at the Text node level. Attributes aren't Text nodes, so we don't process them.
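
    One caveat with walking every Text node: the contents of script, style, pre, code and textarea elements are Text nodes too, and a script placed at the end of body sits inside the subtree being walked. Below is a hedged variant of the first fixSpaces that skips such elements. The skip list and the name fixSpacesSkipping are assumptions for illustration, so adjust them to your content.

    ```javascript
    // Elements whose text is left untouched (illustrative list, not exhaustive).
    const SKIPPED = new Set(["SCRIPT", "STYLE", "PRE", "CODE", "TEXTAREA"]);

    function fixSpacesSkipping(node) {
        switch (node.nodeType) {
            case 1: // Element
                // Normalise the case so the check does not depend on the document type.
                if (SKIPPED.has(node.nodeName.toUpperCase())) {
                    return;
                }
                for (let child = node.firstChild; child; child = child.nextSibling) {
                    fixSpacesSkipping(child);
                }
                break;

            case 3: // Text node
                node.nodeValue = node.nodeValue.replace(/ ([;?!])/g, "\u00a0$1");
                break;
        }
    }

    // In a page: fixSpacesSkipping(document.body);
    ```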