I have a database with supermarket product items (it contains name, description, price, stock, etc.).
I want to compare prices between those supermarkets, but for that I need to know whether a product in supermarket A and one in supermarket B refer to the same item.
For example, I found that supermarket A has a product called Leche Evaporada GLORIA Azul Paquete 6un Lata 400g
and supermarket B has a product named Leche Evaporada Gloria Azul Pack 6 Unid x 400 g
and both refer to the same product.
I figured I will need some kind of semantic comparison for cases like these. I'm new to this kind of problem, so I don't know which approach would be appropriate without either underestimating the problem or overkilling it.
This is what I'm doing right now, with not-so-great results:
I'm using Python as the programming language and gensim to create models and dictionaries (bag of words) and to make the comparisons.
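One likely reason bag-of-words comparisons score poorly on names like these is that the same token is spelled differently (GLORIA vs Gloria, 400g vs 400 g, 6un vs 6 Unid), so a matching pair shares fewer words than it should. Below is a minimal sketch in plain Python (not gensim) that normalizes names before computing a bag-of-words cosine similarity. The `UNIT_ALIASES` table and the helper names are hypothetical; the same normalized token lists could also be passed to gensim's `corpora.Dictionary` and `doc2bow`.

```python
import math
import re
import unicodedata
from collections import Counter
from typing import List

# Hypothetical table mapping unit spellings to one canonical form.
UNIT_ALIASES = {"un": "unid", "und": "unid", "unid": "unid", "gr": "g", "g": "g", "ml": "ml"}


def normalize(name: str) -> List[str]:
    """Lowercase, strip accents, attach units to their numbers, and tokenize."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    # "400 g" -> "400g", "6 unid" and "6un" -> "6unid"
    text = re.sub(
        r"(\d+)\s*([a-z]+)",
        lambda m: m.group(1) + UNIT_ALIASES.get(m.group(2), m.group(2)),
        text,
    )
    return re.findall(r"[a-z0-9]+", text)


def cosine(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Cosine similarity between two bag-of-words (token count) vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


a = normalize("Leche Evaporada GLORIA Azul Paquete 6un Lata 400g")
b = normalize("Leche Evaporada Gloria Azul Pack 6 Unid x 400 g")
print(cosine(a, b))  # 0.75
```

With this normalization the two evaporated-milk names share 6 of their 8 tokens each; words such as Paquete vs Pack still differ, which a small synonym table could handle.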
EDIT: More examples:
Leche Fresca UHT GLORIA Entera Bolsa 946ml == Leche Entera UHT Gloria Bolsa 946 ml
Yogurt Griego Gloria con Miel y Granola Vaso 115 g == Yogurt Griego GLORIA Batido con Miel Vaso 115g
Leche sin Lactosa GLORIA Mocaccino Botella 330ml == Shake Mocaccino UHT Gloria Frasco 330 ml.
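In all of the pairs above, the brand and the package size agree even when the wording does not. One cheap option, assuming the size is reliably present in the names, is to parse it and use it as a hard filter before any fuzzy comparison, so that only candidates with the same size get scored. The regex, the `UNITS` table, and the function names below are hypothetical and cover only a few units.

```python
import re
from typing import Optional, Tuple

# Hypothetical unit table: spelling -> (canonical unit, multiplier).
UNITS = {
    "ml": ("ml", 1), "l": ("ml", 1000), "lt": ("ml", 1000),
    "g": ("g", 1), "gr": ("g", 1), "kg": ("g", 1000),
}
# A number (optionally with a decimal part) followed by one of the units above.
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|gr|g|ml|lt|l)\b", re.IGNORECASE)


def size_key(name: str) -> Optional[Tuple[str, float]]:
    """Return (unit, amount) for the last size found in the name, e.g. ('ml', 946.0)."""
    found = SIZE_RE.findall(name)
    if not found:
        return None
    number, unit = found[-1]
    canonical, factor = UNITS[unit.lower()]
    return canonical, float(number.replace(",", ".")) * factor


def compatible(name_a: str, name_b: str) -> bool:
    """Cheap pre-filter: both names must state the same package size."""
    key_a = size_key(name_a)
    return key_a is not None and key_a == size_key(name_b)


print(size_key("Leche Fresca UHT GLORIA Entera Bolsa 946ml"))  # ('ml', 946.0)
print(compatible("Leche Entera UHT Gloria Bolsa 946 ml", "Shake Mocaccino UHT Gloria Frasco 330 ml."))  # False
```

Pack counts such as 6un are ignored by this sketch and would need a separate rule.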
I think a good solution for this problem is to compare the products based on a similarity score. For instance, I would use the Jaro-Winkler similarity to compare two product descriptions and, if the score reaches a chosen threshold, treat them as the same product and compare their prices.
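The Jaro-Winkler approach can be sketched in plain Python as follows. This is a textbook implementation of Jaro similarity with the Winkler prefix bonus (scaling factor 0.1, shared prefix of up to 4 characters); the 0.9 threshold and the `(name, price)` tuples are illustrative assumptions, not tuned values. A library such as jellyfish (`jaro_winkler_similarity`) could be used instead.

```python
from typing import Optional, Tuple


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    # Two characters match if they are equal and no farther apart than `window`.
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half of that.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to `max_prefix` characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def same_product(name_a: str, name_b: str, threshold: float = 0.9) -> bool:
    """Treat two names as the same product if their similarity reaches the threshold."""
    return jaro_winkler(name_a.lower(), name_b.lower()) >= threshold


def compare_prices(item_a: Tuple[str, float], item_b: Tuple[str, float],
                   threshold: float = 0.9) -> Optional[float]:
    """Return price_a - price_b when the names match, otherwise None."""
    (name_a, price_a), (name_b, price_b) = item_a, item_b
    if not same_product(name_a, name_b, threshold):
        return None
    return price_a - price_b


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Because Jaro-Winkler compares characters in order, reordered words (Leche Fresca UHT GLORIA Entera vs Leche Entera UHT Gloria) lower the score, so normalizing or sorting tokens before comparing, and tuning the threshold on a few hand-labeled pairs, probably matters as much as the choice of metric.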