Search code examples
pythonsqlitedatepeewee

How to extract correct data from Sqlite database using Python?


I have a database of people names and their birthdays. The format of birthday is mm/dd/yyyy, like "3/13/1960".

I want to extract a list of people who are born after a specific date. I called this date "base".

The program that you see below, firstly creates a DB of people (to simulate the real DB that I want to work with), and then extracts the required list. The problem is that the result is not as I expect:

import datetime as dt
import peewee as pw
db = pw.SqliteDatabase('people1.db')

class Person(pw.Model):
    name = pw.CharField()
    birthday = pw.DateField(formats=['%m/%d/%Y'])
    class Meta:
        database = db # This model uses the "people.db" database.

db.create_tables([Person])


bob0 = Person(name='Bob0', birthday='4/13/1940')
bob1 = Person(name='Bob1', birthday='5/13/1950')
bob2 = Person(name='Bob2', birthday='3/13/1960')
bob3 = Person(name='Bob3', birthday='3/13/1970')
bob4 = Person(name='Bob4', birthday='3/13/1980')
bob5 = Person(name='Bob5', birthday='3/13/1990')

base = Person(name="base", birthday='3/13/1960')

bob0.save()
bob1.save()
bob2.save()
bob3.save()
bob4.save()
bob5.save()
base.save()

for item in Person.select().where(Person.birthday > base.birthday):
    print item.name , item.birthday

Output:

>>> ================================ RESTART ================================
>>> 
Bob0 1940-04-13
Bob1 1950-05-13
Bob3 1970-03-13
Bob4 1980-03-13
Bob5 1990-03-13
>>> 

As you see above, base = 3/13/1960. So I shouldn't have Bob0 and Bob1 in output! How can I handle it?

Note that, I don't want to change the format of birthdays in the database. I also don't want to fetch all the rows and check them later! I want to fetch the required rows only.


Solution

  • You can use sqlite3.Connection.create_function to specify your own sqlite function that will transform your dates into something that can be lexicographically sorted:

    import datetime as dt
    import peewee as pw
    
    # custom sqlite function to reformat our date string
    def _sqlite_reformat_date(unfortunate_date_string):
        return dt.datetime \
            .strptime(unfortunate_date_string,'%m/%d/%Y') \
            .strftime('%Y-%m-%d')
    
    # Subclass pw.SqliteDatabase to add our custom sqlite function
    class MySqliteDatabase(pw.SqliteDatabase):
        def __init__(self, *args, **kwargs):
            super(MySqliteDatabase, self).__init__(*args, **kwargs)
    
        def _add_conn_hooks(self, conn):
            conn.create_function('reformat_date', 1, _sqlite_reformat_date)
            super(MySqliteDatabase, self)._add_conn_hooks(conn)
    
    db = MySqliteDatabase('people1.db')
    
    # ...
    # Your model definition and data inserts from your example above
    # ...
    
    rd = pw.fn.reformat_date # Use our custom sqlite function
    for item in Person.select().where(rd(Person.birthday) > rd(base.birthday)):
        print item.name , item.birthday
    

    Although this approach will 'fetch the required rows only' it will still run this python function for every row! It is little better than just fetching all the rows in doing the date comparison in python, it may even be slower!

    However the _sqlite_reformat_date function could be refactored to be much faster and its nice to know how easy it is to add a custom function into sqlite.