Clang now (>3.3) supports Unicode characters in variable names: Clang 3.3 Release Notes, Major New Features.
However, some special character are still forbidden.
int main(){
double α = 2.; // Alpha, ok!
double ∞ = 99999.; // Infinity, error
}
giving:
error: non-ASCII characters are not allowed outside of literals and identifiers
double ∞ = 99999.;
What is the fundamental difference between α
(alpha) and ∞
(infinity) for Clang? That the former is Unicode and the latter is not Unicode, but at the same time is not ASCII?
Is there a workaround or an option to allow this set of characters in Clang (or BTW in GCC)?
Notes: 1) ∞
is just an example; there are a lot of characters that are potentially useful, but also forbidden, like ∫
or ∂
. 2) I am not asking if it is good idea, and please take it as a technical question. 3) I am interested in C++ compiler of Clang 3.4 in Linux (GCC 4.8.3 (2014-05-22) doesn't support this). I am saving the source files with gedit using UTF-8 encoding and Unix/Linux line ending. 4) adding other normal first characters doesn't help: _∞
The answers point to a definite NO. Some ranges are indeed not allowed nor will they be soon. To move one step further to total craziness, the best alternative I found was to use characters that effectively look the same. (Now, this I might admit is not a good idea.) Those alternatives can be found here http://shapecatcher.com/. The result (sorry if it hurts your eyes):
//double ∞ = 99999.; // Still an error //double ⧞ = 99999.; // Infinity negated. Still an error double ꝏ = 99999.; // Letter oo double Ꝏ = 99999.; // Letter OO //double ⧜ = 99999.; // Incomplete infinity. Still an error
Other "alternative" dead ringers mentioned in the question that are in the allowed range:
ʃ
,𝜕𝝏𝞉𝟃
.
Note: This question has Unicode text that may not display correctly in all browsers.
So the Clang document says (emphasis mine):
This feature allows identifiers to contain certain Unicode characters, as specified by the active language standard;
This is covered in the draft C++ standard Annex E. The characters allowed are as follows:
E.1 Ranges of characters allowed [charname.allowed]
00A8, 00AA, 00AD,
00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF
0100-167F, 1681-180D, 180F-1FFF 200B-200D, 202A-202E, 203F-2040, 2054,
2060-206F 2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF
3004-3007, 3021-302F, 3031-303F
3040-D7FF F900-FD3D, FD40-FDCF,
FDF0-FE44, FE47-FFFD
10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD
The code for infinity 221E
is not included in the list.
For reference: these are the codes above converted to Unicode characters (some of them may not display correctly in all browsers/available fonts).
¨, ª, ,
¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ
Ā-ᙿ, ᚁ-᠍, ᠏- -, -, ‿-⁀, ⁔,
- ⁰-, ①-⓿, ❶-➓, Ⰰ-ⷿ, ⺀-
〄-〇, 〡-〯, 〱-〿
- 豈-ﴽ, ﵀-﷏,
ﷰ-﹄, ﹇-�
𐀀-, 𠀀-, 𰀀-, -, -, -, -, -, -, -, -, -, -, -
I could not find an extensive document that covers the rationale for the ranges chosen, although N3146: Recommendations for extended identifier characters for C and C++ does provide some details on the influences.
EDIT 1 (2023):
These are the first 120 characters in each of the groups listed, generated by a program. Honestly, it's not a very exciting character set, with no math symbols or APL symbols; it's close to useless, perhaps on purpose.
¨ª|
¯|
²³´µ|
·¸¹º|
¼½¾|
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ|
ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö|
øùúûüýþÿ|
ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶ...|
ᚁᚂᚃᚄᚅᚆᚇᚈᚉᚊᚋᚌᚍᚎᚏᚐᚑᚒᚓᚔᚕᚖᚗᚘᚙᚚ᚛᚜ᚠᚡᚢᚣᚤᚥᚦᚧᚨᚩᚪᚫᚬᚭᚮᚯᚰᚱᚲᚳᚴᚵᚶᚷᚸᚹᚺᚻᚼᚽᚾᚿᛀᛁᛂᛃᛄᛅᛆᛇᛈᛉᛊᛋᛌᛍᛎᛏᛐᛑᛒᛓᛔᛕᛖᛗᛘᛙᛚᛛᛜᛝᛞᛟᛠᛡᛢᛣᛤᛥᛦᛧᛨᛩᛪ᛫᛬᛭ᛮᛯᛰᛱᛲᛳᛴᛵᛶᛷ...|
᠏᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙ᠠᠡᠢᠣᠤᠥᠦᠧᠨᠩᠪᠫᠬᠭᠮᠯᠰᠱᠲᠳᠴᠵᠶᠷᠸᠹᠺᠻᠼᠽᠾᠿᡀᡁᡂᡃᡄᡅᡆᡇᡈᡉᡊᡋᡌᡍᡎᡏᡐᡑᡒᡓᡔᡕᡖᡗᡘᡙᡚᡛᡜᡝᡞᡟᡠᡡᡢᡣᡤᡥᡦᡧᡨᡩᡪᡫᡬᡭᡮᡯᡰᡱᡲᡳᡴᡵᡶᡷᡸᢀᢁᢂᢃᢄᢅ...|
|
|
‿⁀|
⁔|
|
⁰ⁱ⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₒₓₔₕₖₗₘₙₚₛₜ₠₡₢₣₤₥₦₧₨₩₪₫€₭₮₯₰₱₲₳₴₵₶₷₸₹₺₻₼₽₾₿⃀⃒⃓⃘⃙⃚⃐⃑⃔⃕⃖⃗⃛⃜⃝⃞⃟⃠⃡⃢⃣⃤⃥⃦...|
①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳⑴⑵⑶⑷⑸⑹⑺⑻⑼⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇⒈⒉⒊⒋⒌⒍⒎⒏⒐⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛⒜⒝⒞⒟⒠⒡⒢⒣⒤⒥⒦⒧⒨⒩⒪⒫⒬⒭⒮⒯⒰⒱⒲⒳⒴⒵ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏⓐⓑⓒⓓⓔⓕⓖ...|
❶❷❸❹❺❻❼❽❾❿➀➁➂➃➄➅➆➇➈➉➊➋➌➍➎➏➐➑➒➓|
ⰀⰁⰂⰃⰄⰅⰆⰇⰈⰉⰊⰋⰌⰍⰎⰏⰐⰑⰒⰓⰔⰕⰖⰗⰘⰙⰚⰛⰜⰝⰞⰟⰠⰡⰢⰣⰤⰥⰦⰧⰨⰩⰪⰫⰬⰭⰮⰯⰰⰱⰲⰳⰴⰵⰶⰷⰸⰹⰺⰻⰼⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱊⱋⱌⱍⱎⱏⱐⱑⱒⱓⱔⱕⱖⱗⱘⱙⱚⱛⱜⱝⱞⱟⱠⱡⱢⱣⱤⱥⱦⱧⱨⱩⱪⱫⱬⱭⱮⱯⱰⱱⱲⱳⱴⱵⱶ...|
⺀⺁⺂⺃⺄⺅⺆⺇⺈⺉⺊⺋⺌⺍⺎⺏⺐⺑⺒⺓⺔⺕⺖⺗⺘⺙⺛⺜⺝⺞⺟⺠⺡⺢⺣⺤⺥⺦⺧⺨⺩⺪⺫⺬⺭⺮⺯⺰⺱⺲⺳⺴⺵⺶⺷⺸⺹⺺⺻⺼⺽⺾⺿⻀⻁⻂⻃⻄⻅⻆⻇⻈⻉⻊⻋⻌⻍⻎⻏⻐⻑⻒⻓⻔⻕⻖⻗⻘⻙⻚⻛⻜⻝⻞⻟⻠⻡⻢⻣⻤⻥⻦⻧⻨⻩⻪⻫⻬⻭⻮⻯⻰⻱⻲⻳...|
〄々〆〇|
〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬|
〱〲〳〴〵〶〷〸〹〺〻〼〽〾〿|
ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ゙゚゛゜ゝゞゟ゠ァアィイゥウェエォオカガキギクグケゲコゴサザ...|
豈更車賈滑串句龜龜契金喇奈懶癩羅蘿螺裸邏樂洛烙珞落酪駱亂卵欄爛蘭鸞嵐濫藍襤拉臘蠟廊朗浪狼郎來冷勞擄櫓爐盧老蘆虜路露魯鷺碌祿綠菉錄鹿論壟弄籠聾牢磊賂雷壘屢樓淚漏累縷陋勒肋凜凌稜綾菱陵讀拏樂諾丹寧怒率異北磻便復不泌數索參塞省葉說殺辰沈拾若掠略...|
﵀﵁﵂﵃﵄﵅﵆﵇﵈﵉﵊﵋﵌﵍﵎﵏ﵐﵑﵒﵓﵔﵕﵖﵗﵘﵙﵚﵛﵜﵝﵞﵟﵠﵡﵢﵣﵤﵥﵦﵧﵨﵩﵪﵫﵬﵭﵮﵯﵰﵱﵲﵳﵴﵵﵶﵷﵸﵹﵺﵻﵼﵽﵾﵿﶀﶁﶂﶃﶄﶅﶆﶇﶈﶉﶊﶋﶌﶍﶎﶏﶒﶓﶔﶕﶖﶗﶘﶙﶚﶛﶜﶝﶞﶟﶠﶡﶢﶣﶤﶥﶦﶧﶨﶩﶪﶫﶬﶭﶮﶯﶰﶱﶲﶳﶴﶵﶶ...|
ﷰﷱﷲﷳﷴﷵﷶﷷﷸﷹﷺﷻ﷼﷽﷾﷿︀︁︂︃︄︅︆︇︈︉︊︋︌︍︎️︐︑︒︓︔︕︖︗︘︙︧︨︩︪︫︬︭︠︡︢︣︤︥︦︮︯︰︱︲︳︴︵︶︷︸︹︺︻︼︽︾︿﹀﹁﹂﹃﹄|
﹇﹈﹉﹊﹋﹌﹍﹎﹏﹐﹑﹒﹔﹕﹖﹗﹘﹙﹚﹛﹜﹝﹞﹟﹠﹡﹢﹣﹤﹥﹦﹨﹩﹪﹫ﹰﹱﹲﹳﹴﹶﹷﹸﹹﹺﹻﹼﹽﹾﹿﺀﺁﺂﺃﺄﺅﺆﺇﺈﺉﺊﺋﺌﺍﺎﺏﺐﺑﺒﺓﺔﺕﺖﺗﺘﺙﺚﺛﺜﺝﺞﺟﺠﺡﺢﺣﺤﺥﺦﺧﺨﺩﺪﺫﺬﺭﺮﺯﺰﺱﺲﺳﺴﺵﺶﺷﺸﺹﺺﺻﺼﺽ...|
𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇𐀈𐀉𐀊𐀋𐀍𐀎𐀏𐀐𐀑𐀒𐀓𐀔𐀕𐀖𐀗𐀘𐀙𐀚𐀛𐀜𐀝𐀞𐀟𐀠𐀡𐀢𐀣𐀤𐀥𐀦𐀨𐀩𐀪𐀫𐀬𐀭𐀮𐀯𐀰𐀱𐀲𐀳𐀴𐀵𐀶𐀷𐀸𐀹𐀺𐀼𐀽𐀿𐁀𐁁𐁂𐁃𐁄𐁅𐁆𐁇𐁈𐁉𐁊𐁋𐁌𐁍𐁐𐁑𐁒𐁓𐁔𐁕𐁖𐁗𐁘𐁙𐁚𐁛𐁜𐁝...|
𠀀𠀁𠀂𠀃𠀄𠀅𠀆𠀇𠀈𠀉𠀊𠀋𠀌𠀍𠀎𠀏𠀐𠀑𠀒𠀓𠀔𠀕𠀖𠀗𠀘𠀙𠀚𠀛𠀜𠀝𠀞𠀟𠀠𠀡𠀢𠀣𠀤𠀥𠀦𠀧𠀨𠀩𠀪𠀫𠀬𠀭𠀮𠀯𠀰𠀱𠀲𠀳𠀴𠀵𠀶𠀷𠀸𠀹𠀺𠀻𠀼𠀽𠀾𠀿𠁀𠁁𠁂𠁃𠁄𠁅𠁆𠁇𠁈𠁉𠁊𠁋𠁌𠁍𠁎𠁏𠁐𠁑𠁒𠁓𠁔𠁕𠁖𠁗𠁘𠁙𠁚𠁛𠁜𠁝𠁞𠁟𠁠𠁡𠁢𠁣𠁤𠁥𠁦𠁧𠁨𠁩𠁪𠁫𠁬𠁭𠁮𠁯𠁰𠁱𠁲𠁳𠁴𠁵𠁶...|
𰀀𰀁𰀂𰀃𰀄𰀅𰀆𰀇𰀈𰀉𰀊𰀋𰀌𰀍𰀎𰀏𰀐𰀑𰀒𰀓𰀔𰀕𰀖𰀗𰀘𰀙𰀚𰀛𰀜𰀝𰀞𰀟𰀠𰀡𰀢𰀣𰀤𰀥𰀦𰀧𰀨𰀩𰀪𰀫𰀬𰀭𰀮𰀯𰀰𰀱𰀲𰀳𰀴𰀵𰀶𰀷𰀸𰀹𰀺𰀻𰀼𰀽𰀾𰀿𰁀𰁁𰁂𰁃𰁄𰁅𰁆𰁇𰁈𰁉𰁊𰁋𰁌𰁍𰁎𰁏𰁐𰁑𰁒𰁓𰁔𰁕𰁖𰁗𰁘𰁙𰁚𰁛𰁜𰁝𰁞𰁟𰁠𰁡𰁢𰁣𰁤𰁥𰁦𰁧𰁨𰁩𰁪𰁫𰁬𰁭𰁮𰁯𰁰𰁱𰁲𰁳𰁴𰁵𰁶...|
...|
...|
...|
...|
...|
...|
...|
...|
...|
...|
...|
NOTE 2:
I tried some of them online and some worked in clang 12 (for example) but not in https://godbolt.org/z/G1ET49erx. Apparently some version (clang 14?) implemented this https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1949r7.html
So, it seems to be a moving target.
NOTE 3:
Later versions of clang seem to accept some math symbols, related to the warning -Wmathematical-notation-identifier-extension
.
https://godbolt.org/z/6K7YhzEnz
So, it still seems to be the Wild West.