Why do some non-ASCII code points not require special fonts while others do?


I've never needed to use fonts to display characters belonging to popular Indian languages: Hindi/Bengali/Kannada/etc, but I have to for others.

Why is that? How can I tell in advance that specific characters will not render properly on my website, and will instead appear as boxes containing Unicode hexadecimal values?

Example:

Basic Latin: A
Arabic: ث
Bengali:
Tagalog:
Currency Symbols:
Ancient Greek Numbers: 𐅢
Tirhuta: 𑒛

I am specifically looking at Tirhuta script. I can see the character when I use a related font.

However, if my app takes a string in other languages from the user, how do I make sure that those characters appear correctly?

Main question: why did I never have to deal with specific fonts for popular Indian scripts?


Solution

  • I've never needed to use fonts to display characters belonging to popular Indian languages: Hindi/Bengali/Kannada/etc, but I have to for others.

    You always need a font to display characters of any kind. What you may mean here is that the built-in fonts on the system you are using happen to cover some scripts and not others. Which built-in fonts are available depends entirely on the specific system you're talking about. The built-in fonts available to a web browser on Windows are not the same as the built-in fonts available to an iPhone app. Built-in fonts also tend to vary around the world (though this seems to be changing somewhat as markets become ever more global).

    You hint that you're discussing web browsers. On my system (Safari on macOS), 𑒛 renders fine using the built-in fonts, as do all of your examples. So it depends on which web browser you mean and on which platform.

  • However, if my app takes a string in other languages from the user, how do I make sure that those characters appear correctly?

    Generally they will appear correctly to the person entering the text, because that person will usually have a font installed that supports it. But making sure that the text displays correctly for everyone is not possible in the general case. There are characters in Unicode that no generally available font draws correctly. There are a lot of characters in Unicode: 149,813 in Unicode 15.1.

    As just one example, the hieroglyphic cartouche is not supported by any well-known font. If someone enters it, the best you're likely to get is 𓐼. Unless you design (or hire someone to design) a font that supports it, there's no solution today, and certainly not with a built-in font.

  • Main question: why did I never have to deal with specific fonts for popular Indian scripts?

    The answer is in the question: "popular." The more popular a script, the more likely that many systems will provide a built-in font that covers it.

    But ultimately, if you're working on the web and want many scripts to be readable to people on many systems that may not have installed fonts for all of those scripts, you will likely need to provide additional fonts in the page. There is no general answer here. You will have to pick languages you intend to cover, and choose fonts for them. There is no font that has a meaningful glyph for every character in Unicode.
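
As a starting point for picking which scripts to cover, it can help to look at what is actually in the text your users submit. The following is a minimal sketch in Python, using only the standard `unicodedata` module. The helper names `describe` and `rough_script_counts` are made up for this illustration. Note the assumptions: Python's standard library does not expose the Unicode Script property, so the script is guessed from the first word of each letter's character name, which is only a heuristic, and the names available depend on the Unicode version bundled with your Python build. This tells you nothing about which fonts a visitor has installed; it only tells you which scripts you would need fonts for.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        # Characters unknown to this Python's Unicode database get a placeholder.
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name in this database>"))
        for ch in text
    ]


def rough_script_counts(text):
    """Count letters per script, guessing the script from the character name.

    Heuristic only: uses the first word of the name, e.g. "ARABIC" from
    "ARABIC LETTER THEH". Non-letters (digits, symbols, spaces) are skipped.
    """
    counts = Counter()
    for ch in text:
        # Only look at letters (general categories Lu, Ll, Lt, Lm, Lo).
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


if __name__ == "__main__":
    sample = "A\u062b"  # Basic Latin A and the Arabic letter from the question
    for code_point, name in describe(sample):
        print(code_point, name)
    print(rough_script_counts(sample))
```

The hexadecimal value that `describe` reports is the same number shown inside the fallback box when no font covers a character, so it can also be used to look up what a user was trying to enter.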