python html lang

Extract an HTML-conformant language value from Python's locale module


I create HTML output with a Python script. The <head> tag needs a lang= attribute.

I want to use Python's locale module for this, but the values it returns (e.g. de_DE) are not valid in HTML. For example, the W3C validator rejects de_DE but accepts de.

So this is what I do:

#!/usr/bin/env python3
import locale

locale_lang = locale.getlocale()[0]  # e.g. 'de_DE'

html_lang_value = locale_lang.split('_')[0]  # e.g. 'de'

head_tag = f'<head lang="{html_lang_value}">'

print(head_tag)  # <head lang="de">

Is this a good idea, and will it work for all languages?


Solution

  • According to the locale.getlocale docs, it

    Returns the current setting for the given locale category as sequence containing language code, encoding. (...) Except for the code 'C', the language code corresponds to RFC 1766. language code and encoding may be None if their values cannot be determined.

    Note that if the language code is 'C', then locale_lang.split('_')[0] will also be 'C'. Is that an acceptable value for the lang attribute? More importantly, the language code may be None, and calling .split on None raises an AttributeError and crashes your code. If the code must work reliably, be prepared for locale_lang to be None.
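
    The two cases above could be handled along these lines. This is only a minimal sketch: the helper name html_lang_from_locale and the fallback value 'en' are assumptions made for illustration and do not come from the locale module. The function also falls back when the first part of the locale name does not look like a two- or three-letter language code, since the value returned on some platforms may not have the xx_YY form.

```python
import locale


def html_lang_from_locale(locale_name, fallback="en"):
    """Return a primary language subtag for an HTML lang attribute.

    locale_name is the first element of locale.getlocale(), e.g. 'de_DE',
    'C', or None. Both this helper and the fallback value are hypothetical
    choices made for this sketch.
    """
    # None means the locale could not be determined; 'C' and 'POSIX'
    # are not languages, so none of them is usable as a lang value.
    if locale_name is None or locale_name in ("C", "POSIX"):
        return fallback
    # Drop an encoding or modifier suffix if present,
    # e.g. 'de_DE.UTF-8' or 'sr_RS@latin'.
    base = locale_name.split(".")[0].split("@")[0]
    # Keep only the part before the first underscore: 'de_DE' -> 'de'.
    language = base.split("_")[0].lower()
    # Accept only something shaped like a 2- or 3-letter language code.
    if language.isascii() and language.isalpha() and 2 <= len(language) <= 3:
        return language
    return fallback


if __name__ == "__main__":
    html_lang_value = html_lang_from_locale(locale.getlocale()[0])
    print(f'<head lang="{html_lang_value}">')
```

    Whether a silent fallback is appropriate depends on the page. If the document language is known in advance, hard-coding it may be more reliable than deriving it from the locale of the machine that runs the script.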