javascript python web-scraping xbmc kodi

Simple login function for XBMC (Python) issue


I'm trying to scrape sections of a JavaScript-driven calendar page with Python (XBMC/Kodi). So far I've been able to scrape the static HTML, but not the sections generated by JavaScript.

The values I'm trying to retrieve are <strong class="item-title">**this**</strong>, <span class="item-daterange">**this**</span> and <div class="item-location">**this**</div>. Note that they are in separate sections of the HTML source and are rendered through JavaScript. All of the scraped values should be appended into one string and displayed.

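    import re

    # `net` is the HTTP helper object already created elsewhere in the add-on
    response = net.http_GET('my URL')
    link = response.content
    # Capture the text inside each matching <strong> element
    match = re.compile('<strong class="gcf-item-title">(.+?)</strong>').findall(link)
    for name in match:
        print(name)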

With the regex above I can scrape just one of those values. Since I need all of the values displayed together as a list of strings, how can that be done?

I understand that the page has to be pre-rendered before the JavaScript-generated values can be scraped. But since I'm using XBMC, I'm not sure how to import additional Python libraries such as dryscrape to get this done. Downloading dryscrape gives me a setup.py and an __init__.py file along with some others, but how can I use all of them together?

Thanks.


Solution

  • Is your question about the steps to scrape the JavaScript, how to use Python on XBMC/Kodi, or how to install packages that come with a setup.py file?

    Just based on your regex above: if your entries always look like <strong class="item-title">**this**</strong>, you won't get a match, since your re pattern is for elements with class="gcf-item-title".

    Are you using, or able to use, BeautifulSoup? If you're not using it but can, you should; it's life-changing when it comes to scraping websites.
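
Until BeautifulSoup is available in your add-on, here is a rough sketch of how the three fields could be collected with the standard `re` module alone and joined into one string per event. It uses the class names from your question (`item-title`, not `gcf-item-title`). The sample HTML, the `collect_events` helper and the `" | "` separator are made up for illustration, and it assumes you already have the rendered HTML; the raw response may not contain the JavaScript-generated parts at all.

```python
import re

# Hypothetical sample of the *rendered* HTML; the real page will differ.
SAMPLE_HTML = """
<div class="event">
  <strong class="item-title">Open Day</strong>
  <span class="item-daterange">1 May - 2 May</span>
  <div class="item-location">Main Hall</div>
</div>
<div class="event">
  <strong class="item-title">Concert</strong>
  <span class="item-daterange">5 June</span>
  <div class="item-location">City Park</div>
</div>
"""

# One pattern per field, using the class names from the question
# (note: "item-title", not "gcf-item-title").
TITLE_RE = re.compile(r'<strong class="item-title">(.+?)</strong>')
DATES_RE = re.compile(r'<span class="item-daterange">(.+?)</span>')
PLACE_RE = re.compile(r'<div class="item-location">(.+?)</div>')


def collect_events(html):
    """Return one 'title | dates | location' string per event.

    Assumes the three fields occur the same number of times and in the
    same order, so that zip() pairs them up correctly.
    """
    titles = TITLE_RE.findall(html)
    dates = DATES_RE.findall(html)
    places = PLACE_RE.findall(html)
    return [" | ".join(fields) for fields in zip(titles, dates, places)]


events = collect_events(SAMPLE_HTML)
summary = "\n".join(events)  # a single string, one event per line
print(summary)
```

In your add-on you would pass `response.content` to `collect_events` instead of `SAMPLE_HTML`. If the fields can be missing for some events, pairing them with `zip()` will silently misalign them, which is one more reason to prefer a real HTML parser.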