
Golang complex fold grüßen


I'm trying to get case folding to be consistent between three languages (C++, Python and Golang) because I need to be able to check if a string matches the one saved no matter the language.

An example of a problematic word is the German "grüßen", which in uppercase is "GRÜSSEN" (note that 'ß' becomes the two characters 'SS').

Is there some way to do this that I'm missing, or does the bug noted at the end of the unicode package's documentation (no mechanism for full case folding, i.e. for characters that involve multiple runes in the input or output) apply to all text conversion in Go? If so, what are my options for case folding other than writing it in cgo?
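For illustration, here is a minimal sketch of how the standard library handles this word. The helper names `stdlibUpper` and `stdlibEqualFold` are hypothetical wrappers added for the example. They only call `strings.ToUpper` and `strings.EqualFold`, both of which map one rune to one rune.

```go
package main

import (
	"fmt"
	"strings"
)

// stdlibUpper upper-cases s with strings.ToUpper, which maps rune by rune.
func stdlibUpper(s string) string { return strings.ToUpper(s) }

// stdlibEqualFold compares a and b with strings.EqualFold, which uses
// simple (one rune to one rune) case folding.
func stdlibEqualFold(a, b string) bool { return strings.EqualFold(a, b) }

func main() {
	// 'ß' has no single-rune uppercase mapping, so it is left unchanged.
	fmt.Println(stdlibUpper("grüßen"))
	// 'ß' cannot match the two runes "SS" under simple folding.
	fmt.Println(stdlibEqualFold("grüßen", "GRÜSSEN"))
}
```

Neither call expands 'ß' to "SS", so the two spellings do not match with the standard library alone.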


Solution

  • Advanced (Unicode-aware) text processing is not part of the Go stdlib;¹ it lives in a set of ("blessed") packages under the golang.org/x/text/ umbrella, which are maintained by the Go project but versioned separately from the standard library.

    As Shawn figured out by himself, one can do

    package main

    import (
      "fmt"

      "golang.org/x/text/cases"
    )

    func main() {
      c := cases.Fold() // Caser implementing Unicode case folding
      fmt.Println(c.String("grüßen"))
    }
    

    to get "grüssen" back.


    ¹ That's because whatever ships in the stdlib is subject to the Go 1 compatibility promise. When Go 1 was released, some functionality was unavailable, incomplete, or had APIs still in flux, so those parts were kept out of the core to let them mature.