php, algorithm, similarity

Best algorithm for finding similar text


I'm trying to unify the records in a database. I'm using the Levenshtein algorithm, and it works for some cases.

Working sample (distance <= 2):

* --------- * ---------- * -------- *
|  Looking  |    Finds   | Distance |
* --------- * ---------- * -------- *
| No existe | No Existe  |     1    |
| desempleo | Desempleo  |     1    |    
* --------- * ---------- * -------- *

That's great, but it misses cases with larger distances, such as:

  • Femenino and FEMENINO, with a distance of 7

Note: I'm looking for a PHP solution


Solution

  • Compare

        echo levenshtein("Femenino", "FEMENINO");    // 7

    with

        echo levenshtein(strtolower("Femenino"), strtolower("FEMENINO"));    // 0

    If letter case doesn't matter for your application, convert both strings to the same case before comparing them. Case differences then no longer count toward the distance, so "Femenino" and "FEMENINO" become an exact match.
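
    To apply this to a list of records, the normalization can be wrapped in a small helper. The following is only a sketch: the function names normalizedDistance and closestMatch, and the default threshold of 2 (taken from the question's working sample), are assumptions for illustration and are not part of the original answer. Note that strtolower() and levenshtein() operate on bytes, so accented characters may need extra handling (for example mb_strtolower() from the mbstring extension).

```php
<?php
// Hypothetical helper: Levenshtein distance that ignores letter case
// and surrounding whitespace.
function normalizedDistance(string $a, string $b): int
{
    // Lowercase and trim both inputs so that only real spelling
    // differences contribute to the distance.
    return levenshtein(strtolower(trim($a)), strtolower(trim($b)));
}

// Hypothetical helper: return the candidate closest to $needle,
// or null when no candidate is within $maxDistance edits.
function closestMatch(string $needle, array $candidates, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;

    foreach ($candidates as $candidate) {
        $distance = normalizedDistance($needle, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }

    return $best;
}

echo normalizedDistance("Femenino", "FEMENINO"), PHP_EOL;                        // 0
echo closestMatch("desempleo", ["No Existe", "Desempleo", "FEMENINO"]), PHP_EOL; // Desempleo
```

    Returning null when nothing falls within the threshold keeps unrelated records from being merged by accident; the right threshold depends on your data.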