php domain-name

Measure the pronounceability of a word?


I'm tinkering with a domain name finder and want to favour those words which are easy to pronounce.

Example: nameoic.com (bad) versus namelet.com (good).

I was thinking something based on soundex might be appropriate, but it doesn't look like I can use it to produce any sort of comparative score.

PHP code for the win.


Solution

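  • Here is a function that should work for most common words. It returns a score between 0 and 1, where 1 means perfect pronounceability according to the rules. A letter scores when it belongs to an allowed composite ('mm', 'll', 'th', 'ing'), is a consonant followed by a vowel (the pair scores together), is a vowel that follows a consonant, or is a consonant at the end of the word; the total is then divided by the word length.

    The function is far from perfect (it doesn't quite like words such as Tsunami [0.857]), but it should be fairly easy to tweak for your needs.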

    <?php
    // Score: 1
    echo pronounceability('namelet') . "\n";
    
    // Score: 0.71428571428571
    echo pronounceability('nameoic') . "\n";
    
    function pronounceability($word) {
        static $vowels = array
            (
            'a',
            'e',
            'i',
            'o',
            'u',
            'y'
            );
    
        static $composites = array
            (
            'mm',
            'll',
            'th',
            'ing'
            );
    
        if (!is_string($word)) return false;
    
        // Remove non letters and put in lowercase
        $word = preg_replace('/[^a-z]/i', '', $word);
        $word = strtolower($word);
    
        // Special case
        if ($word == 'a') return 1;
    
        $len = strlen($word);
    
        // Let's not parse an empty string
        if ($len == 0) return 0;
    
        $score = 0;
        $pos = 0;
    
        while ($pos < $len) {
            // Check if is allowed composites
            foreach ($composites as $comp) {
                $complen = strlen($comp);
    
                // Use <= so that a composite at the very end of the word also matches
                if (($pos + $complen) <= $len) {
                    $check = substr($word, $pos, $complen);
    
                    if ($check == $comp) {
                        $score += $complen;
                        $pos += $complen;
                        continue 2;
                    }
                }
            }
    
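            // Is it a vowel? If so, score it only when the previous letter is a consonant.
            // (A vowel at the very start of the word has no previous letter and scores nothing.)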
            if (in_array($word[$pos], $vowels)) {
                if (($pos - 1) >= 0 && !in_array($word[$pos - 1], $vowels)) {
                    $score += 1;
                    $pos += 1;
                    continue;
                }
            } else { // Not a vowel, check if next one is, or if is end of word
                if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
                    $score += 2;
                    $pos += 2;
                    continue;
                } elseif (($pos + 1) == $len) {
                    $score += 1;
                    break;
                }
            }
    
            $pos += 1;
        }
    
        return $score / $len;
    }
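
Since the goal is to favour pronounceable names in a domain finder, the score is most useful for sorting a list of candidates. The following is only a minimal sketch of that step: `rank_by_score()` is a hypothetical helper that is not part of the answer above, and the `$vowelShare` closure is a stand-in scorer so the snippet runs on its own.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// score every candidate with the given callable and sort best-first.
function rank_by_score(array $words, callable $scorer) {
    $scored = array();
    foreach ($words as $word) {
        $scored[$word] = $scorer($word);
    }
    arsort($scored); // highest score first, word => score pairs preserved
    return $scored;
}

// Stand-in scorer for this demo only: the share of vowels in the word.
$vowelShare = function ($word) {
    $len = strlen($word);
    return $len === 0 ? 0 : preg_match_all('/[aeiouy]/i', $word) / $len;
};

print_r(rank_by_score(array('nameoic', 'namelet', 'tsunami'), $vowelShare));
```

With the function from the answer loaded in the same script, passing its name as the scorer, as in `rank_by_score($candidates, 'pronounceability')`, should rank the candidates by that score instead.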