I am working with the following URL: http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2
I am trying to extract the name of the blog (stephania-bell).
I have implemented the following function to extract that value from the URL:
def getBlogName( def decodeUrl )
{
    def urlParams = this.paramsParser.parseURIToMap( URI.create( decodeUrl ) )
    def temp = decodeUrl.replace( "http://www.espn.com", "" )
        .replaceAll( "(/_/|\\?).*", "" )
        .replace( "/index", "" )
        .replace( "/insider", "" )
        .replace( "/post", "" )
        .replace( "/tag", "" )
        .replace( "/category", "" )
        .replace( "/", "" )
        .replace( "/blog/", "" )
    def blogName = temp.replace( "/", "" )
    return blogName
}
However, I am missing something: the value it returns is blogstephania-bell. Could you please help me understand what I am missing in the function implementation? Or maybe there is a better way of doing the same thing?
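A regular expression handles this kind of job easily. As for what goes wrong in your function: .replace( "/", "" ) runs before .replace( "/blog/", "" ), so by the time the last replacement is applied there are no slashes left, "/blog/" never matches, and the "blog" prefix survives. If you want to extract the part of the URL between http://www.espn.com/blog/ and the next /, the following code will do the trick: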
import java.util.regex.Pattern
def url = 'http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2'
// Group 1 captures everything between "/blog/" and the next "/"
def pattern = Pattern.compile('^https?://www\\.espn\\.com/blog/([^/]+)/.*$')
// (url =~ pattern)[0] is a list: the full match followed by each capture group
def (_, blog) = (url =~ pattern)[0]
assert blog == 'stephania-bell'
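If you would rather not hard-code the host in a regex, another option is to let java.net.URI parse the URL and then pick the path segment that follows "blog". This is only a minimal sketch: blogNameFromPath is a hypothetical helper name (it is not part of your code), and it assumes the blog name is always the segment immediately after the first "blog" segment.

```groovy
// Hypothetical helper: returns the path segment that follows "blog",
// or null when the URL has no such segment.
String blogNameFromPath(String url) {
    // getPath() drops the scheme, host and query string;
    // tokenize('/') splits on slashes and skips empty tokens.
    def segments = URI.create(url).path.tokenize('/')
    def idx = segments.indexOf('blog')
    // Guard against URLs without "blog" or with nothing after it.
    (idx >= 0 && idx + 1 < segments.size()) ? segments[idx + 1] : null
}

def url = 'http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2'
assert blogNameFromPath(url) == 'stephania-bell'
// A URL without a "blog" segment yields null instead of throwing
assert blogNameFromPath('http://www.espn.com/nfl/story') == null
```

Unlike the regex above, this version returns null rather than throwing when the URL does not match, and it does not require a trailing slash after the blog name.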