Search code examples
pythondjangopostgresqllocale

Sort list of tuples considering locale (swedish ordering)


Apparently PostgreSQL 8.4 and Ubuntu 10.04 cannot handle the updated way to sort W and V for Swedish alphabet. That is, it's still ordering them as the same letter like this (old definition for Swedish ordering):

  • Wa
  • Vb
  • Wc
  • Vd

it should be (new definition for Swedish ordering):

  • Vb
  • Vd
  • Wa
  • Wc

I need to order this correctly for a Python/Django website I'm building. I have tried various ways to just order a list of tuples created from a Django QuerySet using *values_list*. But since it's Swedish also å, ä and ö letters needs to be correctly ordered. Now I have either one or the other way both not both..

list_of_tuples = [(u'Wa', 1), (u'Vb',2), (u'Wc',3), (u'Vd',4), (u'Öa',5), (u'äa',6), (u'Åa',7)]

print '########## Ordering One ##############'
ordered_list_one = sorted(list_of_tuples, key=lambda t: tuple(t[0].lower()))
for item in ordered_list_one:
    print item[0]

print '########## Ordering Two ##############'
locale.setlocale(locale.LC_ALL, "sv_SE.utf8")
list_of_names = [u'Wa', u'Vb', u'Wc', u'Vd', u'Öa', u'äa', u'Åa']
ordered_list_two = sorted(list_of_names, cmp=locale.strcoll)
for item in ordered_list_two:
    print item

The examples gives:

########## Ordering One ##############
Vb
Vd
Wa
Wc
äa
Åa
Öa
########## Ordering Two ##############
Wa
Vb
Wc
Vd
Åa
äa
Öa

Now, What I want is a combination of these so that both V/W and å,ä,ö ordering are correct. To be more precise. I want Ordering One to respect locale. By then using the second item (object id) in each tuple I could fetch the correct object in Django.

I'm starting to doubt that this will be possible? Would upgrading PostgreSQL to a newer version that handles collations better and then use raw SQL in Django be possible?


Solution

  • When running LC_ALL=sv_SE.UTF-8 sort on your example on Ubuntu-10.04, it comes out with Wa before Vb (the "old way"), so Ubuntu does not seem to agree with the "new way". Since PostgreSQL relies on the operating system for this, it will behave just the same as the OS given the same lc_collate.

    There is actually a patch in debian glibc related to this particular sort issue: http://sourceware.org/bugzilla/show_bug.cgi?id=9724 But it was objected to and not accepted. If you only need this behavior on a system you administer, you can still apply the change of the patch to /usr/share/i18n/locales/sv_SE and rebuild the se_SV locale by running locale-gen sv_SE.UTF-8. Or better yet, create your own alternative locale derived from it to avoid messing with the original.