Search code examples
mysqlsqlsortingsql-order-by

SQL ORDER BY a string value: What is it comparing? (Case sensitive?)


I would like to know what exactly SQL is comparing when we use the ORDER BY statement. More specifically, I'm interested in the comparison when it compares string. Supposedly it is sorting it alphabetically, but what is it actually comparing?

My hunch tells me it could be comparing ASCII values of the characters starting from the left, which would also imply that the sorting is case sensitive ('Btest' would be smaller than 'atest'), but I am unable to find a source confirming this.


Solution

  • In MySQL, it depends on the effective collation. Collation is the set of rules that determine the position of characters in an ordered set and what characters are considered equal, and typically involve natural language rules. For example, Spanish used to have ch as an independent letter located betwen c and d and then switched to being just individual c and h; MySQL has collations for both.

    You can see available collations with these commands:

    SHOW COLLATION; -- Display all
    SHOW COLLATION WHERE charset = 'utf8mb4'; -- Filter by encoding
    
    Collation Charset Id Default Compiled Sortlen Pad_attribute
    utf8mb4_0900_ai_ci utf8mb4 255 Yes Yes 0 NO PAD
    utf8mb4_0900_as_ci utf8mb4 305 Yes 0 NO PAD
    utf8mb4_0900_as_cs utf8mb4 278 Yes 0 NO PAD
    utf8mb4_0900_bin utf8mb4 309 Yes 1 NO PAD
    utf8mb4_bin utf8mb4 46 Yes 1 PAD SPACE
    utf8mb4_croatian_ci utf8mb4 245 Yes 8 PAD SPACE
    utf8mb4_cs_0900_ai_ci utf8mb4 266 Yes 0 NO PAD
    utf8mb4_cs_0900_as_cs utf8mb4 289 Yes 0 NO PAD
    utf8mb4_czech_ci utf8mb4 234 Yes 8 PAD SPACE
    utf8mb4_danish_ci utf8mb4 235 Yes 8 PAD SPACE

    [...]

    Collation names in MySQL use some common substrings to indicate certain features:

    • ci / cs for Case Insensitive / Case Sensitive
    • ai / as for Accent Sensitive / Accent Insensitive

    ... and some others (full list here).

    In MySQL, you can set collation at several levels:

    • Server
    • Database
    • Table
    • Column
    • Connection
    • Individual strings in SQL

    So you always get one, either explicit or implicit.