I am having trouble scraping the names of games off a web page. It returns a blank array. Once a name is scraped, I want it written to a newly created text file. My code is below. It's nowhere near complete, but I'm sure I will need a while condition.
    import requests
    from lxml import html

    def ScrapeK10():
        siteToScrape = 'http://www.kiz10.com/new-games'
        print('\n[!] Requesting Kiz10..')
        kizReq = requests.get(siteToScrape)
        print('\n[!] Scraping Newest Games...')
        # Parse the response body into an lxml element tree
        kizTree = html.fromstring(kizReq.content)
        # This is the XPath query that comes back as an empty list
        kizElement = kizTree.xpath('//strong[@class="bx-caption"]/text()')
        print('Latest Games : ', kizElement, '\n')
        return
The problem I'm running into is that I'm getting a blank array, so I'm not sure whether I'm scraping the site correctly or even using the correct XPath.
I'm still a little new to this. I don't want to use Beautiful Soup, nor do I want to use Scrapy.
My goal is to scrape all the game names on the page I gave and write them to a new file.
Can you use regex? Notice that all the game names are contained in a JavaScript variable named 'itemsGame' rather than in the HTML markup.
Use one regex to pull that array out of the page source, then a second regex to split it into one entry per game.
This should do it:
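    import re
    import requests

    def main():
        url = "http://kiz10.com/index.php?page=newgames"
        # .text returns a str, so the str patterns below also work on Python 3
        raw = requests.get(url).text
        # Capture everything between "var itemsGame = [" and the "];" that ends the line
        match = re.search(r"var itemsGame = \[(.*?)\];$", raw, re.M)
        # Each inner [...] is one game; the name is the fourth comma-separated field
        for line in re.findall(r'\[(.*?)\]', match.group(1)):
            print(line.replace("'", "").split(",")[3].strip())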
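To get the names into a new text file, which is the other half of the question, the same two regexes can be wrapped in a function that takes the page source as a string. This is only a sketch: it assumes the page still embeds `var itemsGame = [[...], ...];` on a single line with the name as the fourth field. The names `extract_names`, `save_names`, and the sample string are made up for illustration and are not taken from the site.

```python
import re

def extract_names(page_source):
    """Return the fourth field of every inner [...] entry in itemsGame."""
    match = re.search(r"var itemsGame = \[(.*?)\];$", page_source, re.M)
    if match is None:
        return []  # variable not found; the page layout may have changed
    names = []
    for entry in re.findall(r"\[(.*?)\]", match.group(1)):
        fields = entry.replace("'", "").split(",")
        if len(fields) > 3:
            names.append(fields[3].strip())
    return names

def save_names(names, path):
    """Write one game name per line to a text file, creating it if needed."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")

# Invented sample standing in for the real page source (an assumption)
sample = "var itemsGame = [['1','a.jpg','/g/1','Space Race'],['2','b.jpg','/g/2','Tank Duel']];"
names = extract_names(sample)
```

Against the live page you would pass `requests.get(url).text` to `extract_names` and then call `save_names` with whatever file name you like. No while loop is needed, since `re.findall` already returns every entry.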
Alternatively, you could call eval() on the string that runs from 'var itemsGame = ' to the next \n character.
That said, calling eval on content fetched from a website is dangerous and not recommended.
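If the array happens to contain only quoted strings, numbers, and nested brackets, `ast.literal_eval` from the standard library can parse it without executing arbitrary code. A hedged sketch follows; the sample line is invented, `parse_items` is a hypothetical helper, and the real page may use JavaScript syntax that is not a valid Python literal, in which case this raises an error instead of returning data.

```python
import ast

def parse_items(line):
    """Parse the array that follows 'var itemsGame = ' as a Python literal."""
    prefix = "var itemsGame = "
    start = line.index(prefix) + len(prefix)
    literal = line[start:].strip().rstrip(";")
    # literal_eval accepts only literals (lists, strings, numbers, ...),
    # so it cannot run arbitrary code the way eval() can
    return ast.literal_eval(literal)

# Invented sample line (an assumption about the page's format)
sample_line = "var itemsGame = [['1','a.jpg','/g/1','Space Race'],['2','b.jpg','/g/2','Tank Duel']];"
items = parse_items(sample_line)
game_names = [item[3] for item in items]
```

Unlike the split-on-comma approach, this keeps a name intact even if it contains a comma, because the quoting is parsed properly.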