Tags: search, statistics, data-mining, similarity, metric

Matching Based on Arbitrary Categories and Similarity Measures


I have a database of customers, each with certain attributes and a customer type. The collection of attributes varies from customer to customer (though they all come from a finite set). Given a new customer of unknown type with known attributes, I would like to determine which type he or she belongs to. For example, say I already have these customers in the DB:

Customer | Type | Attributes
---------|------|------------------
1        | A    | 44, 32, 5, 'X'
2        | A    | 3, 32, 66, 'A'
3        | B    | 6, 32, 'A', 'B'
4        | C    | 47, 31, 2, 'H'
5        | C    | 14, 32, 2, 'O'
6        | C    | 2, 'C'
7        | A    | 44

When I receive a new customer with attributes such as 3,32,2, I would like to determine which type this customer belongs to, and the code should report its confidence in the match as a percentage.

What is the best method to use here? Something statistical, a method based on an affinity matrix of some kind, or a recommendation-engine-style approach based on Pearson correlation coefficients? Sample pseudocode would be most welcome, but any and all ideas are fine.

Thanks,


Solution

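  • One way to solve this problem is a Naive Bayes classifier. Treat each attribute as a feature, estimate P(type) and P(attribute | type) from the existing customers, and score a new customer by P(type) multiplied by the product of P(attribute | type) over its attributes. Normalizing those scores across all types so they sum to 100 gives the confidence percentage.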
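
As a rough illustration of the Naive Bayes idea, here is a minimal Python sketch using the seven customers from the question. Several things in it are assumptions rather than part of the original answer: attributes are treated as an unordered bag of tokens (position is ignored), counts are smoothed with add-one (Laplace) smoothing so unseen attributes do not zero out a type, and the names `train`, `classify`, and `alpha` are made up for this example.

```python
from collections import Counter, defaultdict
import math

# Training data copied from the question: (type, attributes).
CUSTOMERS = [
    ("A", [44, 32, 5, "X"]),
    ("A", [3, 32, 66, "A"]),
    ("B", [6, 32, "A", "B"]),
    ("C", [47, 31, 2, "H"]),
    ("C", [14, 32, 2, "O"]),
    ("C", [2, "C"]),
    ("A", [44]),
]


def train(customers):
    """Count how often each type occurs and how often each attribute
    occurs within each type."""
    type_counts = Counter()
    attr_counts = defaultdict(Counter)
    vocabulary = set()
    for ctype, attrs in customers:
        type_counts[ctype] += 1
        for attr in attrs:
            attr_counts[ctype][attr] += 1
            vocabulary.add(attr)
    return type_counts, attr_counts, vocabulary


def classify(attrs, model, alpha=1.0):
    """Return {type: confidence in percent} for a new customer.

    alpha is the smoothing constant (assumption: add-one smoothing).
    """
    type_counts, attr_counts, vocabulary = model
    total_customers = sum(type_counts.values())
    vocab_size = len(vocabulary)

    log_scores = {}
    for ctype, n in type_counts.items():
        # log P(type)
        score = math.log(n / total_customers)
        attrs_in_type = sum(attr_counts[ctype].values())
        for attr in attrs:
            # log P(attr | type), smoothed so it is never zero
            p = (attr_counts[ctype][attr] + alpha) / (
                attrs_in_type + alpha * vocab_size
            )
            score += math.log(p)
        log_scores[ctype] = score

    # Normalize the scores into percentages that sum to 100.
    # Subtracting the max first keeps exp() numerically stable.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    norm = sum(weights.values())
    return {t: 100.0 * w / norm for t, w in weights.items()}


if __name__ == "__main__":
    model = train(CUSTOMERS)
    confidences = classify([3, 32, 2], model)
    for ctype, pct in sorted(confidences.items(), key=lambda kv: -kv[1]):
        print(f"Type {ctype}: {pct:.1f}%")
```

With these sample rows, the query 3,32,2 comes out closest to type C, mostly because attribute 2 appears in every type C customer. With only seven customers the percentages are rough, and they depend noticeably on the smoothing constant, so they are better read as a relative ranking than as calibrated probabilities.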