
How can I use Python to get elements that do not appear in the page's HTML source but do appear in Chrome's "Inspect Element" tool?


Dear Python experts! I am new to Python and am writing a small program to fetch information from a web page. If the page returned all of its information in the HTML source (which is easy to view in Chrome), there would be nothing to ask. The problem is that the elements I want, which appear after submitting an IP address at https://www.maxmind.com/en/geoip-demo, are not in the HTML body; they only show up when I use Chrome's "Inspect Element" tool. I used the following code to post to the page and print the response, but the elements I want are not there:

import requests

url = 'https://www.maxmind.com/en/geoip-demo'
data = {'addresses': '162.237.72.200'}

# POST the form data and print the HTML the server sends back
response = requests.post(url, data=data)
content = response.text

print(content)

With this code, I expected the HTML body to contain information about the IP address, such as:

162.237.72.200
US
Pittsburg, California, United States, North America
94565
38.0051, -121.8387
AT&T U-verse
AT&T U-verse
sbcglobal.net
807

But that information is not in the HTML body, so I would be very grateful for even a hint on how to solve this. Thank you so much!
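A quick way to confirm the diagnosis before reaching for a browser: Chrome's "Inspect Element" shows the live DOM after the page's JavaScript has run, while requests only receives the HTML the server sent. The sketch below uses only the standard library; the helper name `value_in_static_html` is hypothetical. It collects the visible text of an HTML string, ignoring `<script>` and `<style>` contents, and checks whether an expected value such as "Pittsburg" appears in it. A `False` result for the requests response suggests the data is inserted client-side.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def value_in_static_html(html, value):
    """Return True if `value` occurs in the visible text of `html`."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return any(value in chunk for chunk in parser.chunks)
```

Usage, with `url` and `data` from the question: `value_in_static_html(requests.post(url, data=data).text, "Pittsburg")`.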


Solution

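  • A working solution: simulate the browser navigation and the form interaction with Scrapy and Selenium WebDriver. The results table is filled in by JavaScript after the form is submitted, so it exists only in the live DOM (which is what "Inspect Element" shows) and not in the HTML that requests receives. Driving a real browser lets that script run.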

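    import scrapy
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # A plain Spider is enough here: there are no crawl rules, and the Scrapy
    # docs warn against overriding parse() in a CrawlSpider.
    class MaxSpider(scrapy.Spider):
        name = "max"
        allowed_domains = ["maxmind.com"]
        start_urls = ["https://www.maxmind.com/en/geoip-demo"]

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # A real browser runs the page's JavaScript, unlike requests/Scrapy.
            self.driver = webdriver.Firefox()

        def parse(self, response):
            self.driver.get(response.url)
            # Element ids/XPath as on the 2015 page; they may have changed since.
            address_box = self.driver.find_element(By.ID, "addresses")
            address_box.clear()
            address_box.send_keys("162.237.72.200")
            self.driver.find_element(
                By.XPATH, '//*[@id="geoip-demo-form"]/button').click()
            # Wait up to 10 s for the script to insert result rows,
            # instead of sleeping for a fixed time.
            WebDriverWait(self.driver, 10).until(EC.presence_of_element_located(
                (By.CSS_SELECTOR, "#geoip-demo-results-tbody tr")))
            for element in self.driver.find_elements(By.ID, "geoip-demo-results-tbody"):
                print(element.text)
            self.driver.quit()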
    

    Excerpt from the output (this run used 62.237.72.200, i.e. the question's address without its leading "1"):

    2015-01-13 13:27:18+0100 [max] DEBUG: Crawled (200) <GET https://www.maxmind.com/en/geoip-demo> (referer: http://www.bing.com)

    62.237.72.200 FI Finland, Europe 60.1708, 24.9375 Tele Danmark Tele Danmark
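Printing `element.text` gives each result row as a single line. To get the individual fields, one option is to parse `self.driver.page_source`, which in WebDriver reflects the browser's current DOM, including what the script inserted, with the standard library. This sketch assumes the results are rendered as `<tr>`/`<td>` rows inside the element whose id is `geoip-demo-results-tbody` (the id the spider above uses; the live page may differ). `extract_rows` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collects the cell texts of each <tr> inside the element with target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.rows = []
        self._target_tag = None  # tag name of the matched element, e.g. "tbody"
        self._nesting = 0        # > 0 while inside the matched element
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if self._nesting == 0:
            if dict(attrs).get("id") == self.target_id:
                self._target_tag = tag
                self._nesting = 1
            return
        if tag == self._target_tag:
            self._nesting += 1
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._nesting == 0:
            return
        if tag == self._target_tag:
            self._nesting -= 1
        elif tag in ("td", "th") and self._cell is not None:
            # Join text pieces split by inline tags such as <br>
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None and data.strip():
            self._cell.append(data.strip())


def extract_rows(html, target_id="geoip-demo-results-tbody"):
    """Return a list of rows, each a list of cell strings."""
    parser = ResultsTableParser(target_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside `parse`, after the wait, `for row in extract_rows(self.driver.page_source): print(row)` would print one list of fields per result. Run on the HTML from requests instead, the same helper would return an empty list, because that HTML has no result rows.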