regex function coldfusion slug

ColdFusion regex to generate a slug


I have this function to generate slugs in ColdFusion:

<cffunction name="generateSlug" output="false" returnType="string">
    <cfargument name="str">
    <cfargument name="spacer" default="-">

    <cfset var ret = "" />

    <cfset str = lCase(trim(str)) />
    <cfset str = reReplace(str, "[àáâãäå]", "a", "all") />
    <cfset str = reReplace(str, "[èéêë]", "e", "all") />
    <cfset str = reReplace(str, "[ìíîï]", "i", "all") />
    <cfset str = reReplace(str, "[òóôö]", "o", "all") />
    <cfset str = reReplace(str, "[ùúûü]", "u", "all") />
    <cfset str = reReplace(str, "[ñ]", "n", "all") />
    <cfset str = reReplace(str, "[^a-z0-9-]", "#spacer#", "all") />
    <cfset ret = reReplace(str, "#spacer#+", "#spacer#", "all") />

    <cfif left(ret, 1) eq "#spacer#">
        <cfset ret = right(ret, len(ret)-1) />
    </cfif>
    <cfif right(ret, 1) eq "#spacer#">
        <cfset ret = left(ret, len(ret)-1) />
    </cfif>

    <cfreturn ret />
</cffunction>

I then call it like this:

<cfset stringToBeSlugged = "This is a string abcde àáâãäå èéêë ìíîï òóôö ùúûü ñ año ñññññññññññññ" />
<cfset slug = generateSlug(stringToBeSlugged) />
<cfoutput>#slug#</cfoutput>

But it outputs this slug:

this-is-a-string-abcde-a-a-a-a-a-a-e-e-e-e-i-i-i-i-o-o-o-o-u-u-u-u-n-a-no-n-n-n-n-n-n-n-n-n-n-n-n-n

It seems that all the accented characters are replaced correctly, but the function inserts a '-' after each replacement. Why?

Where is the error?

P.S.: I am expecting this output:

this-is-a-string-abcde-aaaaaa-eeee-iiii-oooo-uuuu-n-ano-nnnnnnnnnnnnn 

Thanks.


Solution

  • Does this work for you? (I've adapted a similar script that we use internally.) I believe we used this with ColdFusion 8, and we still use it with CF9.

    <cffunction name="generateSlug" output="false" returnType="string">
        <cfargument name="str" default="">
        <cfargument name="spacer" default="-">
        <cfset var ret = replace(arguments.str,"'", "", "all")>
        <cfset ret = trim(ReReplaceNoCase(ret, "<[^>]*>", "", "ALL"))>
        <cfset ret = ReplaceList(ret, "À,Á,Â,Ã,Ä,Å,Æ,È,É,Ê,Ë,Ì,Í,Î,Ï,Ð,Ñ,Ò,Ó,Ô,Õ,Ö,Ø,Ù,Ú,Û,Ü,Ý,à,á,â,ã,ä,å,æ,è,é,ê,ë,ì,í,î,ï,ñ,ò,ó,ô,õ,ö,ø,ù,ú,û,ü,ý,&nbsp;,&amp;", "A,A,A,A,A,A,AE,E,E,E,E,I,I,I,I,D,N,O,O,O,O,O,0,U,U,U,U,Y,a,a,a,a,a,a,ae,e,e,e,e,i,i,i,i,n,o,o,o,o,o,0,u,u,u,u,y, , ")>
        <cfset ret = trim(rereplace(ret, "[[:punct:]]"," ","all"))>
        <cfset ret = rereplace(ret, "[[:space:]]+","!","all")>
        <cfset ret = ReReplace(ret, "[^a-zA-Z0-9!]", "", "ALL")>
        <cfset ret = trim(rereplace(ret, "!+", arguments.Spacer, "all"))>
        <cfreturn ret>
    </cffunction>
    
    <cfset stringToBeSlugged = "This is a string abcde àáâãäå èéêë ìíîï òóôö ùúûü ñ año ñññññññññññññ" />
    <cfoutput>"#stringToBeSlugged#" = #generateSlug(stringToBeSlugged)#</cfoutput>
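
    As far as I can tell, the function above keeps the case of the input (ReplaceList is case-sensitive and the final filter allows both a-z and A-Z), while the expected output in the question is all lowercase. A minimal usage sketch, assuming generateSlug() from this answer is defined on the same page; the variable names slug and slugUnderscore are only illustrative:

    ```cfml
    <!--- Assumes generateSlug() from the answer above is already defined on this page --->
    <cfset stringToBeSlugged = "This is a string abcde àáâãäå èéêë ìíîï òóôö ùúûü ñ año ñññññññññññññ">

    <!--- lCase() lowercases the finished slug; generateSlug() itself leaves the case as-is --->
    <cfset slug = lCase(generateSlug(stringToBeSlugged))>

    <!--- The optional second argument overrides the default "-" spacer --->
    <cfset slugUnderscore = lCase(generateSlug(stringToBeSlugged, "_"))>

    <cfoutput>#slug#<br>#slugUnderscore#</cfoutput>
    ```

    Lowercasing at the call site leaves the function itself unchanged; moving the lCase() call inside the function before the cfreturn would work just as well.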
    

    Support for more international characters

    If you want to widen your support for international characters, you could use ICU4J (Java) and Paul Hastings' Transliterator.CFC to transliterate all of the characters, and then replace any remaining spaces, dashes, slashes, etc. with dashes.

    https://gist.github.com/JamoCA/ec4617b066fc4bb601f620bc93bacb57

    http://site.icu-project.org/download

    After installing both, you can convert non-Latin characters by specifying the language ID (the target to convert to) and passing the string to be converted:

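    <!--- Sample input (illustrative only); TestStrings is not defined in the original snippet --->
    <cfset TestStrings = ["This is a string abcde àáâãäå èéêë ìíîï òóôö ùúûü ñ año ñññññññññññññ"]>
    <cfset Transliterator = CreateObject("component","transliterator")>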
    
    <cfoutput>
    <cfloop array="#TestStrings#" index="TestString">
    <h3>TestString = "#TestString#"</h3>
    <blockquote>
        <div>CFC-1 = #Transliterator.transliterate('Latin-ASCII', TestString)#</div>
        <div>CFC-2 = #Transliterator.transliterate('any-NFD; [:nonspacing mark:] any-remove; any-NFC', TestString)#</div>       
    </blockquote>
    <hr>
    </cfloop>
    </cfoutput>
    
    <h2>Available Language IDs</h2>
    <cfdump var="#Transliterator.getAvailableIDs()#" label="Language IDs">
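
    To tie the two steps together (transliterate first, then turn whatever is left into dashes), here is a rough sketch. It assumes transliterator.cfc and the ICU4J jar are installed as described above and that transliterate() takes the ID and the string in the order shown in the loop. The function name generateSlugIcu is hypothetical and not part of either library:

    ```cfml
    <cffunction name="generateSlugIcu" output="false" returnType="string">
        <cfargument name="str" default="">
        <cfargument name="spacer" default="-">
        <!--- Assumption: transliterator.cfc is resolvable from this page and ICU4J is on the class path --->
        <cfset var t = CreateObject("component", "transliterator")>
        <!--- Step 1: let ICU map accented Latin characters to plain ASCII --->
        <cfset var ret = t.transliterate("Latin-ASCII", arguments.str)>
        <cfset ret = lCase(ret)>
        <!--- Step 2: collapse every run of non-alphanumeric characters into a single space,
              then trim so the slug cannot start or end with the spacer --->
        <cfset ret = trim(reReplace(ret, "[^a-z0-9]+", " ", "all"))>
        <!--- Step 3: plain (non-regex) replace, so the spacer is never interpreted as a pattern --->
        <cfset ret = replace(ret, " ", arguments.spacer, "all")>
        <cfreturn ret>
    </cffunction>

    <cfoutput>#generateSlugIcu("This is a string abcde àáâãäå èéêë ìíîï òóôö ùúûü ñ año ñññññññññññññ")#</cfoutput>
    ```

    Going through a single space before applying the spacer avoids the separate left()/right() trimming used in the question. Creating the component on every call is only for brevity; in real code you would probably create it once and reuse it.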