This is my second question here, so apologies for any mistakes.
My main goal is to collect data from different e-commerce sites and then compare it across them. To do this, I need to match the same product across different sites. Because each site writes its titles differently, I need to extract the product's attributes from the title to match correctly. I collected the data using Scrapy, but I can't match the same product across sites.
My attempt:
First, I collected brands, models, etc. and then matched them against the titles in a conventional way. But this doesn't work, because I can't collect every model name to compare against. Also, products in different categories have different attributes. I am looking for a solution that works with all kinds of products and can learn to identify brands, models, and attributes (RAM, inch, ROM, camera, etc.).
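For illustration, the conventional matching described above looks roughly like the sketch below. The brand list, the attribute patterns, and the function name `extract_attributes` are hypothetical stand-ins, not my actual code.

```python
import re

# Hypothetical, hand-collected brand list; a real one would be far larger.
KNOWN_BRANDS = ["samsung", "apple", "xiaomi"]

# Hypothetical patterns for a few phone attributes.
ATTRIBUTE_PATTERNS = {
    "ram_gb": re.compile(r"(\d+)\s*gb\s*ram", re.IGNORECASE),
    "rom_gb": re.compile(r"(\d+)\s*gb\s*(?:rom|storage)", re.IGNORECASE),
    "screen_inch": re.compile(r"(\d+(?:\.\d+)?)\s*inch", re.IGNORECASE),
}


def extract_attributes(title):
    """Return the brand and any attributes the patterns recognise in a title."""
    found = {}
    lowered = title.lower()
    # Brand lookup only succeeds for brands already in the list.
    for brand in KNOWN_BRANDS:
        if re.search(r"\b" + re.escape(brand) + r"\b", lowered):
            found["brand"] = brand
            break
    # Each attribute needs its own hand-written pattern.
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(title)
        if match:
            found[name] = match.group(1)
    return found


print(extract_attributes("Samsung Galaxy S9 Plus 6GB RAM 64GB ROM 6.2 inch"))
print(extract_attributes("Huawei P20 Pro"))  # brand not in the list, nothing found
```

Anything missing from the list or the patterns is silently skipped, which is exactly the limitation I ran into.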
I also tried to apply machine learning, but I don't understand which type of approach fits my need. Most text classification approaches classify items into categories; they don't extract attributes.
I also read about MALLET, but I'm not sure whether it will solve my issue. I also tried this scikit-learn tutorial.
Example product titles from different sites: Samsung Galaxy S9 Plus
Please share how I can approach this problem and which way is best. If possible, please also share links or resources for similar goals.
Use a sentence2vec or word2vec library to convert the titles into vectors, then compute the cosine similarity between those vectors.
Either set a similarity threshold, or treat the pair of vectors with the highest similarity as the matched products.
That's how you can compare them.
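As a rough, dependency-free sketch of the idea: the code below does not use word2vec. It swaps in character trigram count vectors as a simple stand-in so that it runs without a trained model. The function names, the trigram size, and the 0.5 threshold are all assumptions for illustration; with real embeddings only `vectorize` would change.

```python
import math
from collections import Counter


def vectorize(title, n=3):
    """Turn a title into a sparse count vector of character n-grams."""
    text = " ".join(title.lower().split())  # lowercase, collapse whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(title, candidates, threshold=0.5):
    """Return (candidate, score) for the most similar candidate,
    or None if no candidate reaches the threshold."""
    query = vectorize(title)
    scored = [(c, cosine_similarity(query, vectorize(c))) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


site_b_titles = ["Samsung Galaxy S9+ 64 GB Black", "Apple iPhone X 64GB"]
print(best_match("Samsung Galaxy S9 Plus 64GB", site_b_titles))
```

The threshold will need tuning on your own data: too low and different models of the same brand get merged, too high and differently worded titles for the same product are missed.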