I create HTML output with a Python script. The <head> tag needs a lang= attribute. I want to use Python's locale module for this, but the values it returns (e.g. de_DE) are not valid HTML language values: de_DE is not accepted by the W3C validator, but de is.
So I do it this way:
#!/usr/bin/env python3
import locale
locale_lang = locale.getlocale()[0] # e.g. 'de_DE'
html_lang_value = locale_lang.split('_')[0] # e.g. 'de'
head_tag = f'<head lang="{html_lang_value}">'
print(head_tag) # <head lang="de">
Is this a good idea, and will it work for all languages?
According to the locale.getlocale docs, it
Returns the current setting for the given locale category as sequence containing language code, encoding. (...) Except for the code 'C', the language code corresponds to RFC 1766. language code and encoding may be None if their values cannot be determined.
Note that if the language code is 'C', then your locale_lang.split('_')[0] will be 'C'. Is that an acceptable value for the lang attribute? More importantly, the language code might be None, which will crash your code, because None has no .split method. If your code must work reliably, you should be prepared for locale_lang being None.
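A minimal sketch of how both cases could be guarded against. The helper name html_lang_from_locale and the fallback value "en" are assumptions made for illustration, not part of the locale module. It also assumes names of the form 'de_DE', optionally with an encoding or modifier suffix; names in other formats are not handled.

```python
import locale


def html_lang_from_locale(locale_lang, default="en"):
    """Hypothetical helper: map a locale name such as 'de_DE' to 'de'.

    The default of "en" is an assumption; pick whatever fallback
    suits your pages.
    """
    # getlocale() may report None, and 'C'/'POSIX' are not language codes.
    if not locale_lang or locale_lang in ("C", "POSIX"):
        return default
    # Drop an encoding or modifier suffix, e.g. 'de_DE.UTF-8' or 'ca_ES@valencia'.
    base = locale_lang.split(".")[0].split("@")[0]
    # Keep only the language part before the territory.
    return base.split("_")[0].lower()


html_lang_value = html_lang_from_locale(locale.getlocale()[0])
print(f'<head lang="{html_lang_value}">')
```

With this, a missing or 'C' locale falls back to the default instead of raising an AttributeError or emitting lang="C".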