Tags: computer-vision, deep-learning, conv-neural-network, ensemble-learning, mixture-model

Facenet: Using Ensembles of Face Embedding Sets


FaceNet is a deep learning model for facial recognition. It is trained to extract features, that is, to represent an image by a fixed-length vector called an embedding. After training, for each given image we take the output of the second-to-last layer as its feature vector. We can then do verification (deciding whether two images show the same person) based on these features and some distance function (e.g. Euclidean distance).
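As a minimal sketch of the verification step described above: compute the Euclidean distance between two embeddings and compare it against a decision threshold. The `threshold` value below is purely illustrative, not taken from any trained model.

```python
import numpy as np

def verify(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 1.1) -> bool:
    """Decide whether two face embeddings belong to the same person.

    The threshold is a hypothetical value; in practice it is tuned on a
    validation set of same/different pairs.
    """
    dist = np.linalg.norm(emb_a - emb_b)  # Euclidean distance
    return bool(dist < threshold)
```

In practice the threshold is chosen to trade off false accepts against false rejects on held-out pairs.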

The triplet loss is a loss function that, roughly speaking, says: the distance between feature vectors of the same person should be small, and the distance between feature vectors of different persons should be large.
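The idea above can be written as a small NumPy sketch of the standard triplet loss: for an anchor, a positive (same person), and a negative (different person), penalize the model unless the anchor-negative distance exceeds the anchor-positive distance by a margin. The `margin` value here is an illustrative default, not one prescribed by the question.

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    """Triplet loss: max(d(a, p) - d(a, n) + margin, 0), with squared
    Euclidean distances. Zero when the negative is sufficiently far away."""
    d_ap = float(np.sum((anchor - positive) ** 2))  # anchor-positive distance
    d_an = float(np.sum((anchor - negative) ** 2))  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)
```

During training this is averaged over many (anchor, positive, negative) triplets, typically mined so that the hardest negatives are used.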

My question is: is there any way to mix embedding sets from different convolutional models? For example, train three different models (a ResNet, an Inception, and a VGG) with the triplet loss and then combine the three 128-dimensional embeddings into a new meta-embedding for better face verification accuracy. How can I combine these embedding sets?


Solution

  • There is a similar question with a helpful answer here.

    I think there are several ways to do this, for example: 1) concatenate the embeddings and apply PCA to the result; 2) L2-normalize each embedding and concatenate them, so that each model contributes equally to the final result; 3) normalize each feature of each embedding to (0, 1), say via Gaussian CDFs, and concatenate them, so that each feature contributes equally to the result.
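The three strategies above can be sketched in NumPy as follows. This is an illustrative implementation under a few assumptions: each model's embeddings arrive as an `(n_samples, dim)` matrix, PCA is done via SVD on the concatenated matrix, and the Gaussian-CDF variant standardizes each feature with its own empirical mean and standard deviation before mapping it through the normal CDF.

```python
from math import erf

import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project centered data onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fuse_concat_pca(embs: list[np.ndarray], k: int = 128) -> np.ndarray:
    """Option 1: concatenate all embeddings, then reduce with PCA."""
    return pca_reduce(np.concatenate(embs, axis=1), k)

def fuse_l2_concat(embs: list[np.ndarray]) -> np.ndarray:
    """Option 2: L2-normalize each model's embedding, then concatenate,
    so every model contributes a unit-norm block."""
    normed = [e / np.linalg.norm(e, axis=1, keepdims=True) for e in embs]
    return np.concatenate(normed, axis=1)

def fuse_cdf_concat(embs: list[np.ndarray]) -> np.ndarray:
    """Option 3: map each feature into (0, 1) through a Gaussian CDF
    fitted with that feature's empirical mean and std, then concatenate."""
    X = np.concatenate(embs, axis=1)
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # per-feature z-score
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))  # standard normal CDF
```

For option 1, `k` must not exceed the rank of the concatenated matrix; choosing `k` around the original per-model dimensionality (e.g. 128) keeps the meta-embedding comparable in size to a single model's output.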