Search code examples
sqlsqlitesqlite3-ruby

Is there are reason I select by a string columns which contains SHA1 hexdigests?


I have an invitations table which looks like this

sqlite> .schema invitations
CREATE TABLE "invitations" 
    ( "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
    , "sender_id" integer
    , "recipient_email" varchar(255)
    , "token" varchar(255)
    , "sent_at" datetime
    , "team_id" integer
    , "created_at" datetime
    , "updated_at" datetime
    );

CREATE UNIQUE INDEX "index_invitations_on_recipient_email_and_team_id"
    ON "invitations" ("recipient_email", "team_id");

CREATE INDEX "index_invitations_on_sender_id"
    ON "invitations" ("sender_id");

CREATE INDEX "index_invitations_on_team_id"
    ON "invitations" ("team_id");

The token column stores hexdigests which are generated on record create like so (Ruby):

self.token = Digest::SHA1.hexdigest([Time.now, rand].join)

When I insert an invitation into the database, I can retrieve it with

SELECT * FROM "invitations" where "invitations"."recipient_email" = "an email"

but

SELECT * FROM "invitations" where "invitations"."token" = "an token"

returns noting even though I'm copy/pasting the exact token from the insert statement?

Edit
It turns out that

SELECT * FROM "invitations" where "invitations"."token" LIKE "an token"

will retrive the record correctly.

Why would "LIKE" work, but "=" not? I've tried stripping the hex before insert and doing a case insensitive select. Neither worked.

Edit 2 It seems I'm able to replicate this issue using only the rubygem "sqlite3" and the command line. That is without Rails etc.

Here's the process:

stuff $ gem install sqlite3
Fetching: sqlite3-1.3.3.gem (100%)
Building native extensions.  This could take a while...
Successfully installed sqlite3-1.3.3
1 gem installed
Installing ri documentation for sqlite3-1.3.3...
Installing RDoc documentation for sqlite3-1.3.3...
stuff $ irb
ruby-1.9.2-head :001 > require "sqlite3"
ruby-1.9.2-head :017 > rows = db.execute <<-SQL
ruby-1.9.2-head :018"> create table invitations (
ruby-1.9.2-head :019"> token varchar(40)
ruby-1.9.2-head :020"> );
ruby-1.9.2-head :021"> SQL
# with normal strings for comparison
ruby-1.9.2-head :022 > ['4535435', 'jfeu833'].each {|hash| db.execute "insert into 'invitations' ('token') values (?)", hash }
 => ["4535435", "jfeu833"] 
ruby-1.9.2-head :023 > db.execute("select * from invitations where invitations.token = '4535435'") {|row| p row }
# it finds the row successfully
["4535435"]
 => #<SQLite3::Statement:0x000001011741c8> 
ruby-1.9.2-head :028 > require "digest/sha1"
 => true 
# now to try it with a hash
ruby-1.9.2-head :029 > [Digest::SHA1.hexdigest("banana")].each {|hash| db.execute "insert into 'invitations' ('token') values (?)", hash }
 => ["250e77f12a5ab6972a0895d290c4792f0a326ea8"]
ruby-1.9.2-head :031 > db.execute("select * from invitations where invitations.token = '250e77f12a5ab6972a0895d290c4792f0a326ea8'") {|row| p row }
# notice that no record is printed
 => #<SQLite3::Statement:0x0000010107c630> 

Solution

  • I discussed this with the duckyfuzz (the OP) in a chat, and we found that the hash is stored as a BLOB in sqlite:

    sqlite> select typeof(token) from invitations; 
    blob 
    blob
    

    So for some reason, even though ruby says that what is being inserted is a string:

    irb(main):002:0> (Digest::SHA1.hexdigest("banana")).class() 
    => String
    

    it ends up in sqlite as a BLOB.

    Interpolating the value, or inserting as literal, instead of inserting using parameters makes the problem go away (as tested by the OP):

    oh ok got it
    ruby-1.9.2-head :010 > db.execute("insert into invitations (token) VALUES ('#{the_hash}')") 
    => []
    ok now the dump..
    
    INSERT INTO "invitations" VALUES('bda04628ea94f26cac0793eac103258eb515c505');
    much better!
    

    The issue is caused by the fact that binary strings will be stored as blobs by the sqlite3 rubygem. The way to prevent this is to encode the hash as UTF-8 before insert.

    hash = Digest::SHA1.hexdigest("banana").encode("UTF-8")
    db.execute("insert into invitations (token) values (?)", hash)
    

    Once you do that, the hash will be stored as text.