How can I cluster strings in a dataset that have similar names (like McDonald and Mc DOnald's), and then, when strings are the same (like sam and another sam), cluster those again based on a value such as price? For example, consider a data table with 11 rows:
name price
ram 200
shyam 150
ram12 59
gita 45
ram 2 45
g11ita 23
john2 32
john 7
jonh21 8
jonh 38
ram22 3
The grouping should then be:
ram 200
ram12 59
ram 2 45
ram22 3
john2 32
jonh 38
john 7
jonh21 8
gita 45
g11ita 23
I have clustered the strings with fuzzywuzzy and Levenshtein distance, but that only clusters the strings, not the prices. How can I cluster by string first and then, for strings that match, cluster by price?
You will need to carefully balance thresholds in textual similarity and in numerical similarity. There won't be an easy solution, and unless your dataset is really huge, a manual approach may be best.
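A minimal sketch of the two-stage idea, under assumptions that are mine rather than part of the question: difflib.SequenceMatcher from the standard library stands in for fuzzywuzzy, digits and spaces in names are treated as noise, each row joins the first cluster whose first member scores at least 0.7, and each name cluster is then split wherever two neighbouring sorted prices differ by more than 50. All function names and both thresholds are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and keep only letters (assumption: digits and spaces,
    as in 'ram12' or 'ram 2', are noise)."""
    return re.sub(r"[^a-z]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is used here in place of fuzzywuzzy."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_by_name(rows, threshold=0.7):
    """Greedy pass: add each (name, price) row to the first cluster whose
    first row's name is at least `threshold` similar; else open a new one."""
    clusters = []
    for name, price in rows:
        for cluster in clusters:
            if similarity(name, cluster[0][0]) >= threshold:
                cluster.append((name, price))
                break
        else:
            clusters.append([(name, price)])
    return clusters


def split_by_price(cluster, max_gap):
    """1-D single linkage: sort by price and start a new sub-cluster
    whenever the gap to the previous price exceeds `max_gap`."""
    ordered = sorted(cluster, key=lambda row: row[1])
    groups = [[ordered[0]]]
    for prev, row in zip(ordered, ordered[1:]):
        if row[1] - prev[1] > max_gap:
            groups.append([row])
        else:
            groups[-1].append(row)
    return groups


def two_stage(rows, threshold=0.7, max_gap=50):
    """Cluster by name first, then split each name cluster by price."""
    return [split_by_price(c, max_gap) for c in cluster_by_name(rows, threshold)]


rows = [("ram", 200), ("shyam", 150), ("ram12", 59), ("gita", 45),
        ("ram 2", 45), ("g11ita", 23), ("john2", 32), ("john", 7),
        ("jonh21", 8), ("jonh", 38), ("ram22", 3)]

for name_cluster in two_stage(rows):
    print(name_cluster)
```

On the sample table this gives four name clusters (ram, shyam, gita, john/jonh), and the ram cluster is split into prices 3 to 59 and 200. Note that "john" and "jonh" score only 0.75 here, so a slightly stricter threshold would separate them, and the greedy pass depends on row order; both are instances of the threshold balancing described above.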
Textual similarity of short strings is highly unreliable.
For example, "dog" and "fog" differ by only a single letter, but neither is likely to be a typo of the other. Their Levenshtein distance is 1, the smallest non-zero value! Because of this, if you rely on Levenshtein, you will get plenty of false positives; that is okay if you verify them manually, but not for automatic processing.
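To make the false-positive problem concrete, here is a small standard-library sketch of Levenshtein distance plus a length-normalized score (the normalization by the longer string is my assumption; libraries differ). Without any case or whitespace normalization, the genuine variant pair from the question scores lower than "dog"/"fog":

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the longer string's length."""
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)


for a, b in [("dog", "fog"), ("McDonald", "Mc DOnald's")]:
    print(a, b, levenshtein(a, b), round(ratio(a, b), 3))
# dog fog 1 0.667
# McDonald Mc DOnald's 4 0.636
```

Any threshold loose enough to merge the McDonald variants (0.636) would also merge "dog" and "fog" (0.667).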
So at a minimum you'll need something that knows about (a) existing words, which are unlikely to be misspellings, (b) common misspellings, (c) phonetic similarity, to estimate how likely a word is misspelled, (d) keyboard similarity, to estimate how likely a word is mistyped...
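As a rough illustration of (c), here is a minimal sketch of American Soundex, one classic phonetic code (my choice; Metaphone or another phonetic algorithm could serve equally well). It gives "dog" and "fog" different codes while giving "john" and "jonh" the same one, which is the kind of signal plain Levenshtein lacks:

```python
# American Soundex digit for each consonant; vowels, y, h and w get no digit.
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4",
         **dict.fromkeys("mn", "5"), "r": "6"}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev_code = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev_code:
            result += code
        # Vowels reset the previous code (so a repeated digit is kept),
        # while h and w are transparent (so a repeated digit is merged).
        if c not in "hw":
            prev_code = code
    return (result + "000")[:4]


for a, b in [("dog", "fog"), ("john", "jonh"), ("McDonald", "Mc DOnald's")]:
    print(a, soundex(a), b, soundex(b))
# dog D200 fog F200
# john J500 jonh J500
# McDonald M235 Mc DOnald's M235
```

Soundex keeps the first letter verbatim, so a typo in the first letter defeats it, and it was designed for English surnames; treat it as one signal to combine with the others, not a replacement for them.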