Tags: tensorflow, lego-mindstorms

Using tensorflow to identify lego bricks?


Having read this article about a guy who uses TensorFlow to sort cucumbers into nine different classes, I was wondering whether this type of process could be applied to a much larger number of classes. My idea would be to use it to identify Lego parts.

At the moment, a site like Bricklink lists more than 40,000 different parts, so it would be quite different from the cucumber example, but I am wondering whether it sounds suitable. There is no easy way to get hundreds of pictures for each part, but does the following process sound feasible (a rough sketch of such a loop is given after the list):

  • take pictures of a part;
  • try to identify the part using TensorFlow;
  • if it does not identify the correct part, take more pictures and feed them to the neural network;
  • move on to the next part.
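
In TensorFlow/Keras terms, that loop could look roughly like the sketch below. This is only an illustration of the idea, not a tested pipeline: the folder layout (one directory per part number), the image size and the retraining schedule are all assumptions.

    # Hypothetical human-in-the-loop labelling: predict, let a person
    # confirm or correct the label, file the image under the right part
    # number, then retrain on the grown dataset. Paths and sizes are
    # placeholders.
    import numpy as np
    import tensorflow as tf

    IMG_SIZE = (224, 224)
    DATA_DIR = "lego_images"   # one sub-folder per part number

    def predict_part(model, image_path, class_names):
        """Return the model's best guess for one photo of a part."""
        img = tf.keras.utils.load_img(image_path, target_size=IMG_SIZE)
        x = tf.keras.utils.img_to_array(img)[np.newaxis] / 255.0
        probs = model.predict(x, verbose=0)[0]
        return class_names[int(np.argmax(probs))]

    def retrain(model, epochs=3):
        """Rebuild the dataset from the (now larger) folder and fine-tune."""
        ds = tf.keras.utils.image_dataset_from_directory(
            DATA_DIR, image_size=IMG_SIZE, batch_size=32)
        model.fit(ds, epochs=epochs)

Each time a prediction is wrong, the human files the photo under the correct part number and retraining runs on the enlarged dataset, which is the "teach it the new reference" step described above.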

That way, each time we encounter a new piece we "teach" the network its reference so that it can be recognized more reliably the next time. After hundreds of iterations monitored by a human, could we expect TensorFlow to be able to recognize the parts, or at least the most common ones?

My question might sound naive, but I am not familiar with neural networks, so any advice is welcome. At the moment I have not found any way to identify a Lego part from pictures, and this cucumber example sounds promising, so I am looking for some feedback.

Thanks.


Solution

  • You can read about the work of Jacques Mattheij; he actually uses a customized version of Xception [1] running on Keras (https://keras.io/).

    The introduction is Sorting 2 Metric Tons of Lego.

    In Sorting 2 Tons of Lego, The Software Side, you can read:

    The hard challenge to deal with next was to get a training set large enough to make working with 1000+ classes possible. At first this seemed like an insurmountable problem. I could not figure out how to make enough images and to label them by hand in acceptable time, even the most optimistic calculations had me working for 6 months or longer full-time in order to make a data set that would allow the machine to work with many classes of parts rather than just a couple.

    In the end the solution was staring me in the face for at least a week before I finally clued in: it doesn’t matter. All that matters is that the machine labels its own images most of the time and then all I need to do is correct its mistakes. As it gets better there will be fewer mistakes. This very rapidly expanded the number of training images. The first day I managed to hand-label about 500 parts. The next day the machine added 2000 more, with about half of those labeled wrong. The resulting 2500 parts were the basis for the next round of training 3 days later, which resulted in 4000 more parts, 90% of which were labeled right! So I only had to correct some 400 parts, rinse, repeat… So, by the end of two weeks there was a dataset of 20K images, all labeled correctly.

    This is far from enough, some classes are severely under-represented so I need to increase the number of images for those, perhaps I’ll just run a single batch consisting of nothing but those parts through the machine. No need for corrections, they’ll all be labeled identically.

    A recent update is Sorting 2 Tons of Lego, Many Questions, Results.
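
    As a rough idea of what "a customized version of Xception running on Keras" can mean in practice: take the ImageNet-pretrained Xception base, drop its top, and add a new classification head sized for the Lego classes. The sketch below shows that standard transfer-learning pattern; the class count and input size are placeholders, not Mattheij's actual settings.

        # Illustrative transfer learning with Xception in Keras; the 1000
        # classes and 299x299 input are placeholders, not Mattheij's setup.
        import tensorflow as tf
        from tensorflow.keras import layers

        NUM_CLASSES = 1000  # hypothetical number of Lego part classes

        base = tf.keras.applications.Xception(
            weights="imagenet", include_top=False, input_shape=(299, 299, 3))
        base.trainable = False  # freeze the pretrained convolutional base

        inputs = tf.keras.Input(shape=(299, 299, 3))
        x = tf.keras.applications.xception.preprocess_input(inputs)
        x = base(x, training=False)
        x = layers.GlobalAveragePooling2D()(x)
        outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    Once the new head has converged, part of the base is usually unfrozen and fine-tuned at a low learning rate; that fine-tuning is what turns the generic ImageNet features into Lego-specific ones.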


    [1] CHOLLET, François. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv preprint arXiv:1610.02357, 2016.