I wrote a class to access the mathematical alphanumeric symbols from the Unicode block described at https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
# Sans-serif
LATIN_SANSERIF_NORMAL_UPPER = (120224, 120250)
LATIN_SANSERIF_NORMAL_LOWER = (120250, 120276)
LATIN_SANSERIF_BOLD_UPPER = (120276, 120302)
LATIN_SANSERIF_BOLD_LOWER = (120302, 120328)
LATIN_SANSERIF_ITALIC_UPPER = (120328, 120354)
LATIN_SANSERIF_ITALIC_LOWER = (120354, 120380)
LATIN_SANSERIF_BOLDITALIC_UPPER = (120380, 120406)
LATIN_SANSERIF_BOLDITALIC_LOWER = (120406, 120432)
class MathAlphanumeric:
    def __init__(self, script, font, style, case):
        self.script = script
        self.font = font
        self.style = style
        self.case = case

    def charset(self):
        start, end = eval('_'.join([self.script, self.font, self.style, self.case]).upper())
        for c in range(start, end):
            yield chr(c)

    @staticmethod
    def supported_scripts():
        return {'latin', 'greek', 'digits'}

    @staticmethod
    def supported_fonts():
        return {'serif', 'sanserif', 'calligraphy', 'fraktor', 'monospace', 'doublestruck'}

    @staticmethod
    def supported_style():
        return {'normal', 'bold', 'italic', 'bold-italic'}

    @staticmethod
    def supported_case():
        return {'upper', 'lower'}
And to use it, I'll do:
ma = MathAlphanumeric('latin', 'sanserif', 'bold', 'lower')
print(list(ma.charset()))
[out]:
['𝗮', '𝗯', '𝗰', '𝗱', '𝗲', '𝗳', '𝗴', '𝗵', '𝗶', '𝗷', '𝗸', '𝗹', '𝗺', '𝗻', '𝗼', '𝗽', '𝗾', '𝗿', '𝘀', '𝘁', '𝘂', '𝘃', '𝘄', '𝘅', '𝘆', '𝘇']
The code works as expected, but to cover all the mathematical alphanumeric symbols I'll have to enumerate the start and end code points for every combination, i.e. script * fonts * style * case constants.
My questions are:
- Is there a simpler way to implement the MathAlphanumeric object?
- Is there a way to avoid defining script * fonts * style * case constants in order for MathAlphanumeric.charset() to work as expected?

You may be interested in the unicodedata module from the standard library, specifically:
unicodedata.lookup: Look up character by name. If a character with the given name is found, return the corresponding character. If not found, KeyError is raised.
unicodedata.name: Returns the name assigned to the character chr as a string.
A quick example :
>>> import unicodedata
>>> unicodedata.name(chr(0x1d5a0))
'MATHEMATICAL SANS-SERIF CAPITAL A'
>>> unicodedata.lookup("MATHEMATICAL SANS-SERIF CAPITAL A")
'𝖠'
>>> unicodedata.name(chr(0x1d504))
'MATHEMATICAL FRAKTUR CAPITAL A'
>>> unicodedata.lookup("MATHEMATICAL FRAKTUR CAPITAL A")
'𝔄'
Now you have to find all the names that unicodedata expects for your use cases, construct the corresponding string from them, and call lookup.
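One way to find those names is to walk the block and ask unicodedata for each code point. This is only a sketch: the block boundaries (U+1D400 to U+1D7FF) come from the Wikipedia page linked in the question, and block_names is a name I made up for illustration.

```python
import unicodedata

# Mathematical Alphanumeric Symbols block, inclusive bounds (from the Wikipedia page).
BLOCK_START, BLOCK_END = 0x1D400, 0x1D7FF


def block_names():
    """Map every assigned code point of the block to its Unicode name.

    Hypothetical helper, not part of unicodedata.
    """
    names = {}
    for code_point in range(BLOCK_START, BLOCK_END + 1):
        # Some code points in the block are unassigned; passing a default
        # makes name() return None for them instead of raising ValueError.
        name = unicodedata.name(chr(code_point), None)
        if name is not None:
            names[code_point] = name
    return names


names = block_names()
print(names[0x1D5A0])  # MATHEMATICAL SANS-SERIF CAPITAL A
print(names[0x1D504])  # MATHEMATICAL FRAKTUR CAPITAL A
```

Reading through the resulting dictionary shows which word order each font and style uses, and which code points are missing from the block.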
Here is a mini proof-of-concept :
import unicodedata
import string

# Relies on the MathAlphanumeric class from the question for the supported_* sets.


def charset(script: str, font: str, style: str, case: str):
    features = ["MATHEMATICAL"]
    # TODO: use script
    assert font in MathAlphanumeric.supported_fonts(), f"invalid font {font!r}"
    features.append(font.upper())
    assert style in MathAlphanumeric.supported_style(), f"invalid style {style!r}"
    if style != "normal":
        if font == "fraktur":
            features.insert(-1, style.upper())  # "bold" must be before "fraktur"
        elif font in ("monospace", "double-struck"):
            pass  # it has only one style, and it is implicit
        else:
            features.append(style.upper())
    assert case in MathAlphanumeric.supported_case(), f"invalid case {case!r}"
    features.append("CAPITAL" if case == "upper" else "SMALL")
    # Unicode names always spell the letter in upper case, even for SMALL letters
    return tuple(unicodedata.lookup(" ".join(features + [letter])) for letter in string.ascii_uppercase)


if __name__ == '__main__':
    print("".join(charset("latin", "sans-serif", "bold", "lower")))
    # 𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇
    print("".join(charset("latin", "fraktur", "bold", "upper")))
    # 𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅
    print("".join(charset("latin", "monospace", "bold", "upper")))
    # 𝙰𝙱𝙲𝙳𝙴𝙵𝙶𝙷𝙸𝙹𝙺𝙻𝙼𝙽𝙾𝙿𝚀𝚁𝚂𝚃𝚄𝚅𝚆𝚇𝚈𝚉
    print("".join(charset("latin", "double-struck", "bold", "upper")))
    # KeyError: "undefined character name 'MATHEMATICAL DOUBLE-STRUCK CAPITAL C'"
(I also changed your supported_fonts method slightly: return {'serif', 'sans-serif', 'calligraphy', 'fraktur', 'monospace', 'double-struck'})
But there are a lot of caveats in Unicode: it holds all the glyphs you could possibly want, but they are not organized in a coherent way (for historical reasons). The failure in my example comes from a gap in the block, like this one in the Fraktur alphabet:
>>> unicodedata.name("𝔅") # the letter copied from the Wikipedia page
'MATHEMATICAL FRAKTUR CAPITAL B'
>>> unicodedata.name("ℭ") # same, but for C
'BLACK-LETTER CAPITAL C'
So you will need a lot of special cases.
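As a rough sketch of what such special cases could look like, the function below tries the MATHEMATICAL name first and then falls back to the older names used in the Letterlike Symbols block. The function name lookup_letter and the fallback order are my own assumptions, and the rules are not exhaustive (for instance, the italic small h is named PLANCK CONSTANT and is not handled here).

```python
import string
import unicodedata


def lookup_letter(font: str, case: str, letter: str) -> str:
    """Return one styled letter, trying fallback names for the known gaps.

    Hypothetical helper: font is e.g. "FRAKTUR", "DOUBLE-STRUCK" or "SCRIPT",
    case is "CAPITAL" or "SMALL", letter is an upper-case ASCII letter.
    """
    candidates = [
        f"MATHEMATICAL {font} {case} {letter}",
        # Letters that were encoded earlier elsewhere drop the prefix,
        # e.g. DOUBLE-STRUCK CAPITAL C or SCRIPT CAPITAL B.
        f"{font} {case} {letter}",
    ]
    if font == "FRAKTUR":
        # The older Fraktur letters are named BLACK-LETTER instead.
        candidates.append(f"BLACK-LETTER {case} {letter}")
    for name in candidates:
        try:
            return unicodedata.lookup(name)
        except KeyError:
            continue  # try the next candidate name
    raise KeyError(f"no character found for {font} {case} {letter}")


print("".join(lookup_letter("DOUBLE-STRUCK", "CAPITAL", c) for c in string.ascii_uppercase))
print("".join(lookup_letter("FRAKTUR", "CAPITAL", c) for c in string.ascii_uppercase))
```

With this fallback the double-struck call from my example no longer raises, at the cost of a name list you have to maintain by hand.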
Also:
- eval is considered bad practice (cf. this question); if you can avoid it, you should.
- Prefixing a number with 0x suffices to tell Python it is a hexadecimal value; apart from looking "strange" it works exactly the same: 0x1d5a0 == 120224 is True.
- A class with a single method besides __init__ is considered a smell; you can just make it a function, which is simpler and cleaner. If what you want is a namespace, you could use Python modules instead.
- [...] staticmethods.
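To illustrate the eval and 0x remarks, here is a minimal sketch that keeps your range-based approach but stores the ranges in a dictionary keyed by the four parameters. The name RANGES is made up, and only two of your constants are transcribed (in hexadecimal); the rest would follow the same pattern.

```python
# Ranges keyed by (script, font, style, case) instead of by variable name,
# so no eval is needed. RANGES is a hypothetical name; only two of the
# question's constants are shown, written as hexadecimal literals.
RANGES = {
    ("latin", "sanserif", "normal", "upper"): (0x1D5A0, 0x1D5BA),  # 120224, 120250
    ("latin", "sanserif", "bold", "lower"): (0x1D5EE, 0x1D608),    # 120302, 120328
}


def charset(script, font, style, case):
    """Return the list of characters for one combination of parameters."""
    try:
        start, end = RANGES[(script, font, style, case)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {(script, font, style, case)!r}"
        ) from None
    return [chr(code_point) for code_point in range(start, end)]


print("".join(charset("latin", "sanserif", "bold", "lower")))
```

A plain function with a dictionary lookup also gives a clear error for unknown combinations, where eval would raise a NameError (or run whatever expression the joined string happens to form).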