Search code examples
sqlitesql-order-by

Sort order and pipe symbol


Running an SQLite select query where I use order by tagTitle COLLATE NOCASE ASC to sort :

$ echo 'select tagTitle from tags where bookName = "Dragons" order by tagTitle COLLATE NOCASE ASC ' | sqlite3 /tmp/IndexerSqlite.db |awk {'print substr($1,0,1)'} |uniq -c
      1 &
      3 -
      2 .
      2 /
      1 1
      1 ;
      1 @
      1 a
      5 A
      2 a
...
      3 W
      4 X
      1 x
      1 X
      1 Z
      4 | <<--- This!

| is at the bottom. Everywhere else | gets higher precedence:

$ echo "a
b
|
c
$
."|sort 
.
|    // <<-- this!
$
a
b
c

How can I get results in regular order where | is not at the end?

Repro:

$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>  create table test("name" blob);
sqlite>  insert into test (name) values (".csdcdsc");
sqlite>  insert into test (name) values ("XACDSC");
sqlite>  insert into test (name) values ("ACDSC");
sqlite>  insert into test (name) values ("CDSC");
sqlite>  insert into test (name) values ("|CDSC");
sqlite> select * from test order by name  COLLATE NOCASE ASC;
.csdcdsc
ACDSC
CDSC
XACDSC
|CDSC << --This
sqlite> 

Solution

  • Sqlite sorts strings in lexicographic order by codepoint. Pipe is U+007C, putting it after English letters and a bunch of other symbols. sort(1) sorts based on the current locale's rules; you probably are using one that does something fancy with punctuation and symbols. Switch to one that uses plain lexicographic order too, and...

    echo "a
    b
    |
    c
    $
    ." | LC_ALL=C sort 
    $
    .
    a
    b
    c
    |
    

    You'd have to write and load a Sqlite extension module that defines a new collation order. How easy that will be depends on how well you know C.