
UCA + Natural Sorting


I recently learnt that PHP already supports the Unicode Collation Algorithm via the intl extension:

$array = array
(
    'al', 'be',
    'Alpha', 'Beta',
    'Álpha', 'Àlpha', 'Älpha',
    'かたかな',
    'img10.png', 'img12.png',
    'img1.png', 'img2.png',
);

if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}

Array
(
    [0] => al
    [2] => Alpha
    [4] => Álpha
    [5] => Àlpha
    [6] => Älpha
    [1] => be
    [3] => Beta
    [11] => img1.png
    [9] => img10.png
    [8] => img12.png
    [10] => img2.png
    [7] => かたかな
)

As you can see, this seems to work well, even with mixed-case strings. The only drawback I've encountered so far is that there is no natural sorting: img10.png and img12.png are placed before img2.png. I'm wondering what the best way to work around that would be, so that I can get the best of both worlds.
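To make the limitation concrete, here is a minimal sketch that compares just two of the file names with the default root collator. It assumes the intl extension is loaded; the variable name $order is only for illustration.

```php
<?php
// Assumes the intl extension is available.
$collator = collator_create('root');

// With default settings digits are compared one character at a time,
// so the "1" in "img10.png" is weighed against the "2" in "img2.png".
$order = $collator->compare('img10.png', 'img2.png');

echo $order < 0 ? "img10.png sorts first\n" : "img2.png sorts first\n";
```

A negative result from compare() means the first argument sorts before the second, which is the opposite of what natural sorting would give here.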

I've tried passing the Collator::SORT_NUMERIC sort flag, but the result is much messier:

collator_asort(collator_create('root'), $array, Collator::SORT_NUMERIC);

Array
(
    [8] => img12.png
    [7] => かたかな
    [9] => img10.png
    [10] => img2.png
    [11] => img1.png
    [6] => Älpha
    [5] => Àlpha
    [1] => be
    [2] => Alpha
    [3] => Beta
    [4] => Álpha
    [0] => al
)

However, if I run the same test with only the img*.png values, I get the ideal output:

Array
(
    [3] => img1.png
    [2] => img2.png
    [1] => img10.png
    [0] => img12.png
)

Can anyone think of a way to preserve the Unicode sorting while adding natural sorting capabilities?


Solution

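  • After digging a little deeper into the documentation, I found the solution: enable the Collator::NUMERIC_COLLATION attribute on the collator itself instead of passing a sort flag to asort():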

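    if (extension_loaded('intl') === true)
    {
        $collator = collator_create('root');

        // collator_create() returns null if the collator cannot be created
        if ($collator !== null)
        {
            // Compare runs of digits by numeric value, so "img2" sorts before "img10"
            $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
            $collator->asort($array);
        }
    }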
    

    Output:

    Array
    (
        [0] => al
        [3] => Alpha
        [5] => Álpha
        [6] => Àlpha
        [7] => Älpha
        [1] => be
        [4] => Beta
        [10] => img1.png
        [11] => img2.png
        [8] => img10.png
        [9] => img12.png
        [2] => かたかな
    )
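The same approach can be wrapped in a small reusable function. This is only a sketch: the name unicode_natural_asort() is hypothetical and not part of the intl extension, and the natcasesort() fallback for systems without intl is my own assumption. The fallback gives natural ordering but not Unicode-aware collation.

```php
<?php
// Hypothetical helper: returns a copy of $values sorted with UCA rules
// plus numeric collation, preserving the original keys.
function unicode_natural_asort(array $values, string $locale = 'root'): array
{
    // Only touch intl functions when the extension is actually loaded.
    $collator = extension_loaded('intl') ? collator_create($locale) : null;

    if ($collator instanceof Collator) {
        // Treat runs of digits as numbers ("img2" before "img10").
        $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
        $collator->asort($values);
    } else {
        // Assumed fallback: natural, case-insensitive, not Unicode-aware.
        natcasesort($values);
    }

    return $values;
}

$files = ['img10.png', 'img12.png', 'img1.png', 'img2.png'];
print_r(array_values(unicode_natural_asort($files)));
```

Because the array is passed by value and returned, the caller's original array stays untouched. Use array_values() on the result, as above, when the original keys are not needed.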