
Creating queryable strings


I'd like to also store strings in the database in a more queryable, slug-like format: forced to lowercase, with accented letters replaced by their Latin counterparts (ä -> a, ö -> o, ç -> c, etc.) and other special characters replaced with, e.g., dashes. Is there a standard for this kind of format? What would be the preferred way to achieve it in Java?
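For background, the usual Java building block for the accent part is `java.text.Normalizer`: NFD decomposition splits a precomposed letter such as ä into a base letter plus a combining mark, which a regex can then delete. A minimal sketch (the class `AccentDemo` and method `stripAccents` are hypothetical names used only for illustration):

```java
import java.text.Normalizer;

public class AccentDemo {
    // Decompose to NFD, then delete combining marks from the U+0300..U+036F block
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // 'ä' (U+00E4) decomposes into 'a' (U+0061) + U+0308 COMBINING DIAERESIS
        String d = Normalizer.normalize("ä", Normalizer.Form.NFD);
        System.out.printf("U+%04X U+%04X%n", (int) d.charAt(0), (int) d.charAt(1));
        System.out.println(stripAccents("äöç")); // aoc
        // 'æ' has no canonical decomposition, so it passes through unchanged
        System.out.println(stripAccents("æ"));   // æ
    }
}
```

Note the last case: NFD only helps with letters that Unicode defines as "base letter + mark"; letters like æ or ø need separate handling.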


Solution

  • This is the solution I've found to work best so far:

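    import java.text.Normalizer;
    import java.util.Locale;

    static String toSlug(String src) {
        return Normalizer
            .normalize(src.trim().toLowerCase(Locale.ENGLISH), Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop accent marks
            .replaceAll("[^\\p{ASCII}]+", "-")                   // other non-ASCII -> dash
            .replaceAll("[^a-z0-9]+", "-")                        // remaining specials -> dash
            .replaceAll("(^-|-$)+", "");                         // trim leading/trailing dash
    }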
    

    This converts ¿Qué? to que, Cool!!!!1 to cool-1, and åæø to a (æ and ø have no Unicode decomposition, so they become a dash that is then trimmed).
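    If losing æ and ø entirely is a problem, one option is to transliterate the letters NFD cannot decompose before normalizing. The sketch below assumes a small hand-written map (`EXTRA`, illustrative and deliberately incomplete; the class name `SlugWithExtras` is also hypothetical). It drops the separate `\p{ASCII}` pass because `[^a-z0-9]+` already turns any remaining non-ASCII character into a dash.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public class SlugWithExtras {
    // Hypothetical, incomplete map for letters with no NFD decomposition.
    // Lowercase keys only, because the input is lowercased first.
    private static final Map<String, String> EXTRA = Map.of(
            "æ", "ae", "ø", "o", "œ", "oe", "ß", "ss", "đ", "d", "ł", "l");

    static String toSlug(String src) {
        String s = src.trim().toLowerCase(Locale.ENGLISH);
        // Transliterate undecomposable letters before the cleanup would turn them into dashes
        for (Map.Entry<String, String> e : EXTRA.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop accent marks
                .replaceAll("[^a-z0-9]+", "-")                       // everything else -> dash
                .replaceAll("(^-|-$)+", "");                         // trim edge dashes
    }

    public static void main(String[] args) {
        for (String in : new String[] {"¿Qué?", "Cool!!!!1", "åæø", "Straße"}) {
            System.out.println(in + " -> " + toSlug(in));
        }
    }
}
```

    With this map, åæø becomes aaeo and Straße becomes strasse, while the other examples are unchanged. A hand-written map only covers what you list; for broad coverage, ICU4J's `Transliterator` (e.g. the Latin-ASCII transform) is a more complete alternative.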