python database sqlite numpy face-recognition

Storing a list of face encodings in python for face identification

I'm working on a project that could end up with a very large list of face encodings used for face identification. I'm using the face_recognition module: passing it an image returns a face encoding, which is essentially an object. My current idea is to pickle the list that stores the encodings and load it back into the list in the constructor. I suspect this won't scale very well, and that I'm probably better off storing the encodings in some sort of database.

Does anyone have any ideas for this? And if I were to use a database, how would I go about storing objects in it?

To be more specific, the encodings are of type numpy.ndarray.

Thank you!

Solution

If you are going to use your program as a general face identification method, storing the encodings in a database as separate records might not be a good idea. Consider this case: you have 100,000 encoded vectors and you want to check whether a new photo matches any previously seen face. Because the new vector has to be compared against all of the stored ones, you need to either load all of them on each request, or load them once and cache them in memory so you can run a vectorised operation over all of them (e.g. computing the Euclidean distance).
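A minimal sketch of that vectorised comparison, assuming the encodings are already cached in memory as a NumPy array of shape (n_faces, 128). The function name `find_closest_face`, the random stand-in data, and the 0.6 threshold are illustrative assumptions; 0.6 mirrors the default tolerance of face_recognition's `compare_faces`, but you would tune it for your own data.

```python
import numpy as np


def find_closest_face(known_encodings, names, target, threshold=0.6):
    """Return (name, distance) for the closest stored face.

    The name is None when even the closest face is farther away than
    `threshold` (hypothetical helper, not part of face_recognition).
    """
    # One vectorised pass: Euclidean distance from target to every stored row.
    distances = np.linalg.norm(known_encodings - target, axis=1)
    best = int(np.argmin(distances))
    best_distance = float(distances[best])
    if best_distance <= threshold:
        return names[best], best_distance
    return None, best_distance


# Stand-in data: 1,000 random 128-dimensional vectors instead of real encodings.
rng = np.random.default_rng(0)
known_encodings = rng.random((1000, 128))
names = [f"person_{i}" for i in range(1000)]

# A slightly perturbed copy of a stored vector plays the role of a new photo.
target = known_encodings[42] + 0.001
print(find_closest_face(known_encodings, names, target))
```

Because the whole comparison is a single array operation, the cost per request is dominated by one pass over the cached array rather than by per-record database reads.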

As you can see, none of the usual database features (indexing, searching on fields, transactions, etc.) are being used. So I recommend keeping the encodings as pickled objects on disk for persistence and loading them once when the program starts. If you are going to add or remove entries, I suggest a NoSQL database (like MongoDB) for storing the objects. This lets you avoid creating tables that carry no real meaning, dealing with BLOBs, and so on, none of which provides any benefit in your case. Here is a starter for working with MongoDB (you need to install MongoDB and pymongo before running the code):

from pymongo import MongoClient
import numpy as np

# 27017 is MongoDB's default port; change it if your server listens elsewhere
client = MongoClient('localhost', 27017)
db = client['face_db']

faces = db.face

first_person_name = "John"
first_sample_face_embedding = np.random.rand(128).tolist() 

second_person_name = "Julia"
second_sample_face_embedding = np.random.rand(128).tolist()

faces.insert_many([
    {"name": first_person_name, "embedding": first_sample_face_embedding},
    {"name": second_person_name, "embedding": second_sample_face_embedding}
])

# Load the data back

all_docs = list(faces.find({}))
names = [doc["name"] for doc in all_docs]
embeddings = [doc["embedding"] for doc in all_docs]

embeddings = np.array(embeddings)


target_embedding = np.random.rand(128)

# Compare target_embedding against embeddings here,
# e.g. with a vectorised distance computation

You can read this post for more information about working with MongoDB in Python.
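For the simpler pickle option recommended earlier, a minimal sketch might look like this. The helper names `save_encodings` and `load_encodings`, the file name, and the dictionary layout are all hypothetical choices made for illustration, and the random array stands in for real face encodings.

```python
import os
import pickle
import tempfile

import numpy as np


def save_encodings(path, names, encodings):
    """Persist names and their encodings together in one pickle file."""
    with open(path, "wb") as f:
        pickle.dump({"names": names, "encodings": encodings}, f)


def load_encodings(path):
    """Load names and encodings back; encodings come back as an ndarray."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["names"], np.asarray(data["encodings"])


# Stand-in data: two random 128-dimensional vectors instead of real encodings.
names = ["John", "Julia"]
encodings = np.random.rand(2, 128)

# Hypothetical location; a real program would use its own data directory.
path = os.path.join(tempfile.gettempdir(), "face_encodings.pkl")

save_encodings(path, names, encodings)
loaded_names, loaded_encodings = load_encodings(path)
```

Loading once at startup and keeping the array in memory matches the caching approach described above. Only unpickle files you created yourself, since loading an untrusted pickle can execute arbitrary code.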