Search code examples
sqlinternationalizationslugpermalinks

Permalink/slug best practices


I'm about to setup permalinks on a website and I'm wondering what sort of conventions I should follow in regards to generating them from the existing table of news articles.

So far here are the rules/steps I've come up with:

  1. Create a dump file of my live database, set it up locally
  2. Add a new 'permalink' column to my news_articles table with ALTER table ADD column slug VARCHAR(100) ( Not sure on the data type or length, suggestions? )
  3. Update every row in my news_articles table, such that

the permalink column by default is replaced by the title, after the title goes through some formatting:

Disclaimer: the titles are Russian Cyrillic and not English so I don't think I can get away with an easy regex replacement, unless perhaps I rely on unicode code ranges except I'm not familiar with the range of the Cyrillic characters so I'll do manual replacement and the rules I've defined so far are:

  • replace all spaces with an underscore
  • strip off leading/trailing commas and periods
  • remove all single/double quotes
  • remove all question marks
  • & becomes the russian word for 'and'

I'd appreciate any advice on my current strategy before I actually execute it, as there are thousands of articles.


Solution

  • Some comments/suggestions

    • replace all spaces with an underscore (how about hypen instead of underscore, like tags in SO)
    • strip off leading/trailing commas and periods (leading/trailing spaces too)
    • remove all single/double quotes (starting and trailing ones, if not replace with hyphens)
    • remove all question marks (same as above and all other punctuation characters like “ ” (left/right double quotes) and symbols like % and = etc etc)
    • & becomes the russian word for 'and' (Do you mean и? has to be readable without need for url encoding)

    Update:
    * convert all characters to lower case (kinda subjective but it's a common practice and more readable then all upper case)

    One more suggestion, this is a trick which I used to check if my url path or slug do not contain any urlencoded string. After formulated the slug string, I will perform an urlencode and compare with the pre-encoded one. They should be the same.

    Example for this post, if the slug happened to be "permalink-slug:best-practices", the urlencoded one will be "permalink-slug%3Abest-practices", not really readable and defeats the purpose of slug which you will know/agree. So it's really easy to know the orginal string don't work because it is different after urlencoding.