java javascript url heuristics

Library/Algorithm to minify URL


I want to display URLs within a limited area: 2 lines and width of ~120px. Obviously most URLs don't fit.

So I'm looking for an approach to 'minify' a URL in order to make it shorter, yet still recognizable and distinguishable from others.

For example:

https://stackoverflow.com/questions/ask

http://www.cnn.com/2011/US/03/04/obama.miami.school/index.html

http://techcrunch.com/2011/03/04/founder-stories-foursquare-crowley-invent-future/

http://cran.r-project.org/web/packages/bcp/index.html

become

stackoverflow | ask

cnn | obama.miami.school

techcrunch | founder-stories-foursquare

cran.r-project.org | packages/bcp

As you can see, this is a somewhat creative question. The computation could be done either on the server (Java) or on the client (JavaScript).

Any feedback very welcome!


Solution

  • You can:

    • strip common parts ("http://", "www", ".com", ".html" ...)
    • strip numbers
    • strip runs of multiple consecutive special characters (non-letters)
    • define abbreviations for common long parts (foursquare -> 4sq)

    • check the remaining pieces against a database of how common each one is. Keep the uncommon pieces and drop the common ones until the result is short enough.
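
The steps above could be sketched in JavaScript roughly as follows. This is only a minimal illustration under assumptions, not a finished library: the names minifyUrl, cleanPiece, PIECE_FREQUENCY and ABBREVIATIONS are made up for this example, the frequency values are invented placeholders standing in for the database lookup, and only "www." and ".com" are stripped from the host. It relies on the standard URL class available in browsers and Node.js.

```javascript
// Hypothetical data: how common each path piece is across many URLs (0..1).
// A real system would look this up in a database; unknown pieces count as rare.
const PIECE_FREQUENCY = new Map([
  ["index", 0.9],
  ["web", 0.7],
  ["questions", 0.5],
  ["packages", 0.3],
]);

// Hypothetical abbreviations for common long parts.
const ABBREVIATIONS = new Map([["foursquare", "4sq"]]);

const frequency = (piece) => PIECE_FREQUENCY.get(piece) || 0;

function cleanPiece(piece) {
  let cleaned = piece
    .toLowerCase()
    .replace(/\.html?$/, "")             // strip common file extensions
    .replace(/\d+/g, "")                 // strip numbers
    .replace(/([^a-z])[^a-z]+/g, "$1")   // collapse runs of special characters
    .replace(/^[^a-z]+|[^a-z]+$/g, "");  // trim special characters at the ends
  // Abbreviate after number stripping so that "4sq" keeps its digit.
  for (const [long, short] of ABBREVIATIONS) {
    cleaned = cleaned.split(long).join(short);
  }
  return cleaned;
}

function minifyUrl(rawUrl, maxLength = 30) {
  const url = new URL(rawUrl);

  // Strip common host parts: a leading "www." and a trailing ".com".
  const host = url.hostname.replace(/^www\./, "").replace(/\.com$/, "");

  const pieces = url.pathname
    .split("/")
    .map(cleanPiece)
    .filter((piece) => piece !== "");

  const label = () =>
    pieces.length > 0 ? `${host} | ${pieces.join("/")}` : host;

  // Drop the most common remaining piece until the label fits, always
  // keeping at least one piece. On ties the leftmost piece is dropped.
  while (label().length > maxLength && pieces.length > 1) {
    let mostCommon = 0;
    pieces.forEach((piece, i) => {
      if (frequency(piece) > frequency(pieces[mostCommon])) mostCommon = i;
    });
    pieces.splice(mostCommon, 1);
  }

  // Last resort: hard-cut a label that is still too long.
  return label().slice(0, maxLength);
}

console.log(minifyUrl("https://stackoverflow.com/questions/ask", 20));
// stackoverflow | ask
console.log(minifyUrl("http://cran.r-project.org/web/packages/bcp/index.html", 35));
// cran.r-project.org | packages/bcp
```

With these placeholder frequencies the sketch reproduces two of the examples from the question. A long single piece (such as the techcrunch slug) is simply cut at maxLength here; cutting at a word boundary would probably read better.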