
How can I remove diacritics (umlauts) from a String?


How can I convert a string such as 'Příliš žluťoučký kůň úpěl ďábelské ódy.' into 'Prilis zlutoucky kun upel dabelske ody.'?

The source string is in Unicode, so in principle it should be possible to use normalization/decomposition to separate the diacritical marks from their base characters.

Unfortunately, I couldn't find any library in Pharo (perhaps something is hidden in Zinc?) that supports either stripping diacritics or Unicode decomposition.


Solution

  • You can try the Diacriticals package.

    Installation

    Metacello new
        smalltalkhubUser: 'Pharo' project: 'MetaRepoForPharo50';
        configuration: 'Diacritics';
        version: #development;
        load.
    

    Test

    'Příliš žluťoučký kůň úpěl ďábelské ódy' asNonDiacritical.
    "'Prilis zlutoucky kun upel dabelske ody'"
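
If loading the package is not an option, a hand-written lookup table may be enough for a known alphabet. The following is only a minimal sketch and not how the package above works: the variable names (accented, plain, map, strip) are made up for illustration, the table covers Czech letters only, and it assumes the input uses precomposed characters rather than base letters followed by combining marks.

```smalltalk
"Hypothetical table-based fallback, to be run in a Playground.
 Assumption: input characters are precomposed (e.g. a single 'ř' code point)."
| accented plain map strip |
accented := 'áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ'.
plain    := 'acdeeinorstuuyzACDEEINORSTUUYZ'.

"Pair each accented character with its plain counterpart (both strings have the same length)."
map := Dictionary new.
accented with: plain do: [ :from :to | map at: from put: to ].

"Replace every character found in the table; leave all others untouched."
strip := [ :aString |
    aString collect: [ :each | map at: each ifAbsent: [ each ] ] ].

strip value: 'Příliš žluťoučký kůň úpěl ďábelské ódy'.
 "'Prilis zlutoucky kun upel dabelske ody'"
```

Characters missing from the table pass through unchanged, so other languages would need their own entries; general coverage would require real Unicode decomposition.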