I'm trying to unify the records in a database. I'm using the Levenshtein algorithm, and it works for some cases.
Working examples (distance <= 2):
+-------------+-------------+----------+
| Search term | Match found | Distance |
+-------------+-------------+----------+
| No existe   | No Existe   | 1        |
| desempleo   | Desempleo   | 1        |
+-------------+-------------+----------+
That's great, but it misses cases with larger distances, such as Femenino and FEMENINO, which have a distance of 7.

Note: I'm looking for a PHP solution.
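A minimal sketch of the matching described in the question. It assumes each record is compared against a list of known values and that a match is accepted when `levenshtein()` returns at most 2. The function name `findMatch` and the `$known` list are hypothetical. The comparison is case-sensitive, which is why `Femenino` finds nothing.

```php
<?php
// Hypothetical helper: return the closest known value within $maxDistance,
// or null if no candidate is close enough. Comparison is case-sensitive.
function findMatch(string $needle, array $candidates, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        $distance = levenshtein($needle, $candidate);
        // Keep the candidate only if it is within the threshold and closer than any seen so far.
        if ($distance <= $maxDistance && $distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Hypothetical list of canonical values already stored in the database.
$known = ['No Existe', 'Desempleo', 'FEMENINO'];

var_dump(findMatch('No existe', $known)); // string(9) "No Existe"
var_dump(findMatch('Femenino', $known));  // NULL: distance to "FEMENINO" is 7
```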
Compare:

echo levenshtein("Femenino", "FEMENINO"); // 7

with:

echo levenshtein(strtolower("Femenino"), strtolower("FEMENINO")); // 0
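If letter case doesn't matter for your application, convert both strings to the same case before comparing them. Differences that are only in capitalization then cost nothing, so pairs like Femenino and FEMENINO fall well within your threshold.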
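The same idea can be wrapped in a small helper. This sketch assumes the data is UTF-8 and that the mbstring extension is available. It uses `mb_strtolower()` rather than `strtolower()` because `strtolower()` works byte by byte, so multibyte capitals such as `Ñ` would not be lowercased. The name `levenshteinCi` is hypothetical.

```php
<?php
// Hypothetical helper: case-insensitive Levenshtein distance for UTF-8 strings.
// Both strings are lowercased with mb_strtolower() before comparing, so
// differences that are only in letter case contribute nothing to the distance.
function levenshteinCi(string $a, string $b): int
{
    return levenshtein(mb_strtolower($a, 'UTF-8'), mb_strtolower($b, 'UTF-8'));
}

echo levenshteinCi('Femenino', 'FEMENINO'), PHP_EOL;   // 0
echo levenshteinCi('No existe', 'No Existe'), PHP_EOL; // 0
echo levenshteinCi('Señal', 'SEÑAL'), PHP_EOL;         // 0
```

Keep in mind that `levenshtein()` itself still counts bytes, not characters, so a differing accented character can count as more than one edit.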