I am trying to scrape some data from a website that requires login. (I tried with requests, but it did not work.) So I used splinter, and I succeeded in logging in via XPath. The problem is that I want to scrape some data using BeautifulSoup, so after logging in to the website via splinter, how can I use that session with BeautifulSoup to scrape data from the user account? Here is my code:
from selenium import webdriver
from splinter import Browser
import requests
from bs4 import BeautifulSoup
web_driver = webdriver.Chrome('/Users/paul/Downloads/chromedriver/chromedriver')
url = "https://www.example.com"
browser = Browser("chrome")
visit_browser = browser.visit(url)
email_box = '//*[@id="email"]'
find_1 = browser.find_by_xpath(email_box)
find_1.fill("example@gmail.com")
password_box = '//*[@id="pass"]'
find_2 = browser.find_by_xpath(password_box)
find_2.fill("example12345")
button_sub = '//*[@id="u_0_5"]'
find_3 = browser.find_by_xpath(button_sub)
find_3.click()
"""i tried like this with Beautifulsoup but not working , its giving login page instead of "after login page" """
get_url = requests.get(url)
soup = BeautifulSoup(get_url.text, "html.parser")
print(soup.text)
I managed to log in, and suppose I am now at the front page that appears after login. How do I save this session, work within it, and use BeautifulSoup to scrape and print data?
You have imported splinter. You can use it to open and manipulate a page and then parse the result with BeautifulSoup, without needing requests at all, like this.
The critical point is that the line page = browser.html gives you the HTML contents of the current page for BeautifulSoup to parse.
>>> from splinter import Browser
>>> browser = Browser('firefox')
>>> browser.visit('https://ville.montreal.qc.ca/')
>>> page = browser.html
>>> import bs4
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> soup.find_all(string='Liens rapides')
['Liens rapides']
(Incidentally, I used the Firefox browser because Chrome is not supported on the computer I'm using.)
You have also imported selenium. Therefore, you could alternatively open and manipulate a page using that, or even provide a copy of the page to BeautifulSoup for parsing.
In this case, it's important to notice that the page contents downloaded using selenium are made available to BeautifulSoup as driver.page_source.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://ville.montreal.qc.ca/')
>>> links = driver.find_elements_by_xpath('.//a[contains(text(),"English")]')
>>> links
[<selenium.webdriver.remote.webelement.WebElement (session="abd3226550e94776152c619b509dd158", element="0.34892531934022464-1")>]
>>> import bs4
>>> soup = bs4.BeautifulSoup(driver.page_source, 'lxml')
>>> soup.find_all(string='Liens rapides')
['Liens rapides']
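Since the question specifically asked how to reuse the logged-in session, it is worth noting the other common route: copying the browser's cookies into a requests.Session. This is only a sketch of an alternative, not the approach above; it assumes the requests library and a live, logged-in driver from the earlier example.

```python
def copy_cookies(selenium_cookies):
    """Convert Selenium's list-of-dicts cookie format (as returned by
    driver.get_cookies()) into a plain name -> value mapping."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# With a live, logged-in driver you would then do (requires requests):
# import requests
# session = requests.Session()
# session.cookies.update(copy_cookies(driver.get_cookies()))
# response = session.get(url)  # this request now carries the login cookies
```

Note that this only works if the site keeps the login state in cookies; sites that bind sessions to browser fingerprints or JavaScript tokens may still reject the requests-based session.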