python url xbmc

Cut and resubmit a URL in Python


I'm new to Python and trying to figure this out, so sorry if this has been asked before. I couldn't find it, and I don't know what this might be called.

The short of it: I want to take a link like:

http://www.somedomainhere.com/embed-somekeyhere-650x370.html

and turn it into this:

http://www.somedomainhere.com/somekeyhere

The long of it: I have been working on an addon for XBMC that goes to a website, grabs a URL, then goes to that URL to find another URL. Basically, a URL resolver.

The program searches the site and comes up with embed-somekeyhere-650x370.html. But that page is in Java and is unusable to me, whereas the code at .com/somekeyhere is usable. So I need to grab the first URL, change it to the URL of the usable page, and then scrape that page.

So far, the code I have is:

if 'somename' in name:
    try:
        # Grab the embed URL from the iframe's src attribute.
        n = re.compile('<iframe title="somename" type="text/html" frameborder="0" scrolling="no" width=".+?" height=".+?" src="(.+?)">" frameborder="0"', re.DOTALL).findall(net().http_GET(url).content)[0]
        # CONVERT URL n to .com/somekeyhere (stored in na) SO THE LINE BELOW CAN READ IT.
        na = re.compile("'file=(.+?)&.+?'", re.DOTALL).findall(net().http_GET(na).content)[0]
    except IndexError:
        pass  # no match on the page

Any suggestions on how I can convert the URL?


Solution

  • I didn't really follow the long version of your question, but here is an answer to the short one.

    Assumption: somekey is alphanumeric.

    import re

    a = 'http://www.domain.com/embed-somekey-650x370.html'
    # Dots are escaped so they match a literal "." rather than any character.
    p = re.match(r'^http://www\.domain\.com/embed-(?P<key>[0-9A-Za-z]+)-650x370\.html$', a)
    somekey = p.group('key')
    requiredString = "http://www.domain.com/" + somekey  #comment1
    

    This answer is deliberately specific: the regex only matches this one domain name and this one size suffix. You should modify it as required. Since the code in your question already uses regexes, I assume you can write one that fits your requirement better.

    EDIT 1: Also see urlparse: https://docs.python.org/2/library/urlparse.html?highlight=urlparse#module-urlparse

    It provides an easy way to parse your URL.

    Also, on the line marked "#comment1", you can save the domain name to a variable and reuse it there instead of hard-coding it.
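
Putting the two suggestions together, here is a minimal sketch that uses urlparse to keep whatever scheme and domain the embed URL has, and a regex on the path only. The function name embed_to_page_url and the pattern EMBED_PATH are hypothetical, and the pattern assumes the path always looks like /embed-<key>-<width>x<height>.html with an alphanumeric key. The import fallback is there because the module is called urlparse in Python 2 and urllib.parse in Python 3.

```python
import re

try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse  # Python 2

# Assumed path layout: "/embed-<key>-<width>x<height>.html", key alphanumeric.
EMBED_PATH = re.compile(r'^/embed-(?P<key>[0-9A-Za-z]+)-\d+x\d+\.html$')


def embed_to_page_url(url):
    """Return scheme://domain/<key> for an embed URL, or None if the path does not match."""
    parts = urlparse(url)
    match = EMBED_PATH.match(parts.path)
    if match is None:
        return None
    # Reuse the original scheme and domain; replace only the path.
    return urlunparse((parts.scheme, parts.netloc, '/' + match.group('key'), '', '', ''))


print(embed_to_page_url('http://www.somedomainhere.com/embed-somekeyhere-650x370.html'))
```

In the code from the question, the placeholder line could then become something like na = embed_to_page_url(n), followed by a check for None before fetching the page.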