php encoding utf-8 collation diacritics

How to detect in PHP that pensé is the same as pense


How can I test in PHP whether two words, one of them accented, are the same word?

e.g. pensé vs. pense

My query loads certain words from a MySQL table, and because of the MySQL collation settings it returns both versions from the database, which is fine. However, I also want PHP to detect them as the same word (as MySQL does in this situation).

I assume this can be done in PHP with the Collator class, but I don't understand how it works.


Solution

  • As you mentioned, you can use PHP's Collator class.

    Collator is part of the intl extension, so make sure the intl extension is enabled in your PHP installation.

    <?php
    // Check if Intl extension is loaded
    if (!extension_loaded('intl')) {
        exit('Intl extension is not enabled. Please enable it to use Collator.');
    }
    
    // Create a Collator object with a specific locale
    $collator = new Collator('en_US');
    
    // Set the strength to PRIMARY to ignore accents and case differences
    $collator->setStrength(Collator::PRIMARY);
    
    // Strings to compare
    $string1 = 'pensé';
    $string2 = 'pense';
    
    // Compare the strings; use === because compare() returns false on failure,
    // and false == 0 would evaluate to true
    if ($collator->compare($string1, $string2) === 0) {
        echo "The strings are considered equal.";
    } else {
        echo "The strings are not equal.";
    }
    ?>
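
    Since the question mentions not understanding how Collator works, here is a minimal sketch of how the strength setting changes what counts as "equal". The helper name equalAtStrength is hypothetical (it is not part of intl), and the 'en_US' locale is just an assumption carried over from the example above.

    ```php
    <?php
    // Hypothetical helper: compare two strings at a given collation strength.
    function equalAtStrength(string $a, string $b, int $strength, string $locale = 'en_US'): bool
    {
        $collator = new Collator($locale);
        $collator->setStrength($strength);

        // Strict comparison: compare() returns false on failure
        return $collator->compare($a, $b) === 0;
    }

    $levels = [
        'PRIMARY'   => Collator::PRIMARY,   // base letters only
        'SECONDARY' => Collator::SECONDARY, // base letters and accents
        'TERTIARY'  => Collator::TERTIARY,  // base letters, accents and case
    ];

    // One pair differs by accent, the other by case
    $pairs = [['pensé', 'pense'], ['Pense', 'pense']];

    foreach ($levels as $name => $strength) {
        foreach ($pairs as [$a, $b]) {
            $result = equalAtStrength($a, $b, $strength) ? 'equal' : 'different';
            printf("%-9s %s vs %s: %s\n", $name, $a, $b, $result);
        }
    }
    ```

    At PRIMARY strength both pairs compare as equal; at SECONDARY only the case-only pair does; at TERTIARY neither does. PRIMARY is therefore the level that matches an accent-insensitive and case-insensitive comparison.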
    

    If you are working with several languages, you have two options:

    1. Use a general-purpose locale: $collator = new Collator('root');

    2. Switch the locale dynamically when the language context is known, for example when a user has a specific language set.

    The second option could look something like this:

    <?php
    function getLocaleFromLanguage($languageCode) {
        $locales = [
            'en' => 'en_US',
            'fr' => 'fr_FR',
            'es' => 'es_ES',
            'de' => 'de_DE',
        ];
    
        return $locales[$languageCode] ?? 'en_US';
    }
    
    function compareStrings($string1, $string2, $languageCode) {
        if (!extension_loaded('intl')) {
            exit('Intl extension is not enabled. Please enable it to use Collator.');
        }
    
        $locale = getLocaleFromLanguage($languageCode);
        $collator = new Collator($locale);
        $collator->setStrength(Collator::PRIMARY);
    
        // Strict comparison, because compare() returns false on failure
        return $collator->compare($string1, $string2) === 0;
    }
    
    // Example usage
    $string1 = 'pensé'; // Assume this is Spanish
    $string2 = 'pense';
    
    if (compareStrings($string1, $string2, 'es')) {
        echo "The strings are considered equal.";
    } else {
        echo "The strings are not equal.";
    }
    ?>
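
    If the goal is to treat a whole list of words loaded from the database this way, comparing every pair gets unwieldy. One possible approach, sketched below, is to group words by their collation sort key, since strings that compare as equal at the collator's strength produce the same key. The function name groupEquivalentWords and the sample word list are made up for illustration, and the 'root' locale is an assumption.

    ```php
    <?php
    // Hypothetical helper: group words that the collator treats as the same word.
    function groupEquivalentWords(array $words, string $locale = 'root'): array
    {
        $collator = new Collator($locale);
        $collator->setStrength(Collator::PRIMARY); // ignore accents and case

        $groups = [];
        foreach ($words as $word) {
            $key = $collator->getSortKey($word);
            if ($key === false) {
                continue; // skip words for which no sort key could be built
            }
            // Sort keys are binary strings; hex-encode them for use as array keys
            $groups[bin2hex($key)][] = $word;
        }

        return array_values($groups);
    }

    // Sample data standing in for words loaded from MySQL
    print_r(groupEquivalentWords(['pensé', 'pense', 'Pense', 'penser']));
    ```

    With this sample list, 'pensé', 'pense' and 'Pense' end up in one group and 'penser' in another.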