html css browser unicode line-breaks

What are breakable non-whitespace characters?


I found that browsers (I only tested Chrome) break lines at certain characters inside words to prevent text overflow, even with the default behaviour (word-wrap: normal). I'm not asking about breakable whitespace, but about these specific Unicode characters:

  • hyphen (-)
  • soft hyphen (­)
  • dash (–)
  • long dash (—)
  • full-width plus/minus (＋/－)
  • minus (−)

So the questions...

  • Is there any other character with this property?
  • Why exactly these characters and not other punctuation characters (like the dot, comma or slash)?
  • I understand why hyphens and dashes break, but why plus and minus and not, for example, the multiplication sign (×)?
  • Is this behaviour consistent across browsers? Standardized in HTML/CSS or Unicode?

Have a try:

<div style="width: 50px">
veryvery-veryvery-veryvery-veryvery
veryvery–veryvery–veryvery–veryvery
veryvery—veryvery—veryvery—veryvery
veryvery­veryvery­veryvery­veryvery
veryvery−veryvery−veryvery−veryvery
veryvery＋veryvery＋veryvery＋veryvery
long
</div>

Solution

  • Breaks in HTML/CSS text generally occur at "soft wrap opportunities", but the specific behaviour around which characters present such an opportunity is not standardised. Rather, the CSS specification defers to other text formatting specifications (e.g. language-specific guidelines).

    However, a popular generic implementation is the Unicode Line Breaking Algorithm. The algorithm examines the Unicode properties of neighbouring characters with a set of rules to either create, force, or inhibit break points. It is not possible to come up with a complete list of individual characters that can create a break because the context that the character appears in is a relevant factor.
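
To make the point about context concrete, here is a minimal sketch in Python of the pair-based idea behind the Unicode Line Breaking Algorithm. It is not a real implementation: the class table is a small hand-copied subset (an assumption for illustration, not the full Unicode data), every unlisted character is treated as an ordinary letter, only a handful of the rules are modelled, and the names `line_break_class`, `break_allowed` and `break_opportunities` are made up for this example. Browsers may also deviate from the algorithm, so the results need not match what Chrome does.

```python
# Hand-copied subset of Unicode line-break classes, for illustration only.
# A real implementation would load the complete Unicode line-break data.
LINE_BREAK_CLASS = {
    "\u002D": "HY",  # HYPHEN-MINUS
    "\u00AD": "BA",  # SOFT HYPHEN (break after)
    "\u2013": "BA",  # EN DASH (break after)
    "\u2014": "B2",  # EM DASH (break before and after)
    "\u002E": "IS",  # FULL STOP (infix numeric separator)
    "\u002C": "IS",  # COMMA (infix numeric separator)
}

# Class pairs (before, after) between which a break is prohibited.
# Only a few of the algorithm's rules are represented here.
NO_BREAK_PAIRS = {
    ("AL", "AL"),  # letters stay together
    ("AL", "NU"), ("NU", "AL"), ("NU", "NU"),  # letters and digits stay together
    ("IS", "AL"), ("IS", "NU"),  # no break after "." or "," inside a word or number
    ("HY", "NU"),  # a hyphen directly before a digit may be a minus sign
    ("B2", "B2"),  # two em dashes in a row stay together
}

# Classes that never allow a break directly before them.
NO_BREAK_BEFORE = {"BA", "HY", "IS"}


def line_break_class(ch):
    """Return the (simplified) line-break class of a single character."""
    if ch in LINE_BREAK_CLASS:
        return LINE_BREAK_CLASS[ch]
    if ch in "0123456789":
        return "NU"
    return "AL"  # simplification: everything else counts as a letter


def break_allowed(before, after):
    """Decide whether a line may be broken between two adjacent characters."""
    a, b = line_break_class(before), line_break_class(after)
    if b in NO_BREAK_BEFORE:
        return False
    return (a, b) not in NO_BREAK_PAIRS


def break_opportunities(text):
    """Return the indices at which a line break may be inserted."""
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i])]


if __name__ == "__main__":
    for sample in ["very-very", "very-42", "very.very", "very\u2014very"]:
        print(repr(sample), break_opportunities(sample))
```

In this sketch the same hyphen yields a break opportunity in "very-very" but none in "very-42", because the decision depends on the character that follows it. That is why a flat list of "breakable characters" cannot be complete.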