
Natural sorting of UTF-8 strings in DynamoDB


I'm storing file names (with extension) and directory names as UTF-8 strings in DynamoDB as sort keys.

As far as I know, file names + ext and directory names are unique within a directory, so I can use those strings as unique IDs within the parent directory.

DynamoDB orders string sort keys by their UTF-8 byte values, not naturally: 10 comes before 2, uppercase comes before lowercase, and so on.

Since I'm trying to represent a file hierarchy, I would like to retrieve the items in natural order instead.

I could do some magic on the strings to have them sort naturally before I use them as sort keys, but then I would need to keep an attribute with the original name and those are bytes I would like to save, if possible.

If it matters, this is part of a single table design.

Are there any design patterns, hashing algorithms or other approaches I could use to solve this?


Solution

  • I don't know what "magic" you have in mind. The usual approach is to zero-pad the numbers to some fixed maximum length so that string sorting matches numeric sorting, at least for non-negative integers. If you do that, you can strip the padding again on display.
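
A minimal sketch of the zero-padding idea, in Python. The names `to_sort_key`, `to_display_name` and `PAD_WIDTH` are made up for illustration, and the width of 10 digits is an assumption you would adjust to your data. Note the limits: stripping the padding only restores the original name if it had no leading zeros of its own (`007.txt` would come back as `7.txt`, and `file1` and `file01` would map to the same key), and this does nothing about uppercase sorting before lowercase.

```python
import re

# Assumed maximum number of digits in any number inside a name.
PAD_WIDTH = 10


def to_sort_key(name: str, width: int = PAD_WIDTH) -> str:
    """Zero-pad every run of ASCII digits so string order matches numeric order."""

    def pad(match: re.Match) -> str:
        digits = match.group(0)
        if len(digits) > width:
            # A longer number would break the ordering, so fail loudly.
            raise ValueError(f"number {digits!r} exceeds {width} digits")
        return digits.zfill(width)

    return re.sub(r"[0-9]+", pad, name)


def to_display_name(key: str) -> str:
    """Strip leading zeros from every digit run, keeping at least one digit."""
    return re.sub(r"[0-9]+", lambda m: m.group(0).lstrip("0") or "0", key)


names = ["file10.txt", "file2.txt", "file1.txt"]
keys = sorted(to_sort_key(n) for n in names)       # what would be stored as sort keys
display = [to_display_name(k) for k in keys]       # what would be shown to the user
```

With the padded strings stored as sort keys, a query returns the items in numeric order, and the display name is derived from the key rather than kept in a separate attribute.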