Search code examples
pythonmysqlsqlormpeewee

Python with peewee to MySQL, save / update doubles existing rows


I am trying to index ~150'000 jpg and cr2 photo files from my harddisk, writing them into a MySQL DB, indexing them and ultimately creating some intranet page to view them.

I would like to write the indexer in Python. I took a look at MySQLdb but decided to go with a light-weight ORM instead (PeeWee).

Being a beginner at Python (coming from PHP), so far, I have been able to write the code below. It does index the files nicely and puts them into a table in MySQL.

The issue I have though is that if I run the code again, it just adds / doubles the lines in the table. I understood that PeeWee is supposed to be 'smart' in its "save"-algorithm and is supposed to automatically update the existing lines instead of just adding them again, but I cannot get this 'smartness' to function.

(Maybe I misunderstood this): http://peewee.readthedocs.org/en/latest/peewee/models.html

When you call save(), peewee determines whether to do an INSERT versus an UPDATE based on the presence of a primary key value.

==> I would appreciate some support / hints into what I could do to make the code update instead of just adding lines (UPDATE vs. INSERT)

==> another question: Will PeeWee be able to handle a big object with ~150k instances?


The SQL table:

 +---------------------------------+
 |          table 'foto            | 
 +------------------+--------------+
 | column name      | type(len)    |         
 +------------------+--------------+
 | fid              | int(11)      |
 | file_name        | varchar(255) |
 | file_path        | text         |
 | file_fullpath    | text         |
 | date_changed     | datetime     |
 | size             | int(11)      |
 | file_extension   | varchar(5)   |
 | file_type        | varchar(255) |
 | file_mime        | varchar(255) |
 | last_indexed     | timestamp    |
 +------------------+--------------+

The Python code:

<!-- language: python -->
    #!/usr/bin/python

    indexpath = './folder1/'

    import os
    from os.path import join, getsize
    import time, datetime
    import peewee
    from peewee import *
    import magic


    db = MySQLDatabase('fotolib', user="asdf", passwd="pass123")

    class Foto(peewee.Model):
        fid = PrimaryKeyField()
        file_name = CharField()
        file_path = TextField()
        file_fullpath = TextField()
        date_changed = DateTimeField()
        size = IntegerField()
        file_extension = CharField()
        file_type = CharField()
        file_mime = CharField()
        last_indexed = DateTimeField(default=datetime.datetime.now)

        class Meta:
            database = db

    for root, dirs, files in os.walk(indexpath):
        for name in files:
            foto = Foto(file_name=name,
                file_path=os.path.join(root, name),
                file_fullpath=os.path.abspath(os.path.join(root, name)),
                date_changed=datetime.datetime.strptime(time.ctime(os.path.getmtime(join(root, name))),
                                                        "%a %b %d %H:%M:%S %Y"),
                size=getsize(join(root, name)),
                file_extension=os.path.splitext(name)[1],
                file_type=magic.from_file(join(root, name)),
                file_mime=magic.from_file(join(root, name), mime=True)
            )
            foto.save()

Solution

  • Not yet elegant, I believe - but I think I found an answer: If I check first in the database if the fullpath&file is already there and hand over the primary key, PeeWee seems to be able to do an UPDATE instead of an INSERT. Still bugs me that I have to first query the DB again and then have to put it into such an unelegant IF / ELSE statement, doubling my lines of code.

    for root, dirs, files in os.walk(indexpath):
        for name in files:
            foto_exist_check = Foto.select(Foto.fid).where(Foto.file_fullpath == os.path.abspath(os.path.join(root, name)))
            if foto_exist_check.count() == 0:
                foto = Foto(
                    file_name=name,
                    file_path=os.path.join(root, name),
                    file_fullpath=os.path.abspath(os.path.join(root, name)),
                    date_changed=datetime.datetime.strptime(time.ctime(os.path.getmtime(join(root, name))),"%a %b %d %H:%M:%S %Y"),
                    size=getsize(join(root, name)),
                    file_extension=os.path.splitext(name)[1],
                    file_type=magic.from_file(join(root, name)),
                    file_mime=magic.from_file(join(root, name), mime=True)
                )
                foto.save()
            else:
                foto = Foto(
                    fid = foto_exist_check.get().fid,
                    file_name=name,
                    file_path=os.path.join(root, name),
                    file_fullpath=os.path.abspath(os.path.join(root, name)),
                    date_changed=datetime.datetime.strptime(time.ctime(os.path.getmtime(join(root, name))),"%a %b %d %H:%M:%S %Y"),
                    size=getsize(join(root, name)),
                    file_extension=os.path.splitext(name)[1],
                    file_type=magic.from_file(join(root, name)),
                    file_mime=magic.from_file(join(root, name), mime=True)
                )
                foto.save()