
How to create a session in splinter or selenium?


I am trying to scrape some data from a website that requires login (I tried with requests, but it did not work). So I used splinter and succeeded in logging in via XPath. The problem is that I want to scrape some data with BeautifulSoup: after logging in to the website via splinter, how can I reuse that session so BeautifulSoup can scrape data from the user account? Here is my code:

    from selenium import webdriver
    from splinter import Browser


    web_driver=webdriver.Chrome('/Users/paul/Downloads/chromedriver/chromedriver')
    url = "https://www.example.com"
    browser = Browser("chrome")
    visit_browser = browser.visit(url)

    email_box = '//*[@id="email"]'
    find_1 = browser.find_by_xpath(email_box)
    find_1.fill("example@gmail.com")
    password_box = '//*[@id="pass"]'
    find_2 = browser.find_by_xpath(password_box)
    find_2.fill("example12345")

    button_sub = '//*[@id="u_0_5"]'
    find_3 = browser.find_by_xpath(button_sub)
    find_3.click()

"""i tried like this with Beautifulsoup but not working , its giving login page instead of "after login page" """

    import requests
    from bs4 import BeautifulSoup

    get_url = requests.get(url)
    soup = BeautifulSoup(get_url.text, "html.parser")
    print(soup.text)

I managed to log in, and suppose I am now at the front page that appears after login. How do I save this session, keep working within it, and use BeautifulSoup to scrape and print data?


Solution

  • You have imported splinter. You can use it to open and manipulate a page and then parse that page with BeautifulSoup, without needing requests at all, like this.

    The critical fact to note is that the line page = browser.html gives you the HTML contents of the page for BeautifulSoup to parse.

    >>> from splinter import Browser
    >>> browser = Browser('firefox')
    >>> browser.visit('https://ville.montreal.qc.ca/')
    >>> page = browser.html
    >>> import bs4
    >>> soup = bs4.BeautifulSoup(page, 'lxml')
    >>> soup.find_all(string='Liens rapides')
    ['Liens rapides']
    

    (Incidentally, I used the Firefox browser because Chrome is not supported on the computer I'm using.)
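    Applied to the login flow from the question, the same idea looks roughly like the sketch below. The URL, XPaths, and credentials are the placeholders from the question, and `html.parser` is used instead of `lxml` to avoid an extra dependency:

    ```python
    from bs4 import BeautifulSoup

    def page_to_soup(html):
        # Parse the HTML string that splinter hands back via browser.html.
        return BeautifulSoup(html, "html.parser")

    if __name__ == "__main__":
        # Imported here so the parsing helper above can be used on its own.
        from splinter import Browser

        browser = Browser("chrome")
        browser.visit("https://www.example.com")

        # Same placeholder XPaths as in the question.
        browser.find_by_xpath('//*[@id="email"]').fill("example@gmail.com")
        browser.find_by_xpath('//*[@id="pass"]').fill("example12345")
        browser.find_by_xpath('//*[@id="u_0_5"]').click()

        # browser keeps the logged-in session, so browser.html is now the
        # post-login page, not the login form.
        soup = page_to_soup(browser.html)
        print(soup.text)
    ```

    The point is that you never leave the splinter browser: the same object that performed the login serves the post-login HTML.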


    You have also imported selenium. Therefore, you could alternatively open and manipulate a page using that, or even provide a copy of the page to BeautifulSoup for parsing.

    In this case, it's important to notice that the page contents downloaded using selenium are made available to BeautifulSoup as driver.page_source.

    >>> from selenium import webdriver
    >>> from selenium.webdriver.common.by import By
    >>> driver = webdriver.Chrome()
    >>> driver.get('https://ville.montreal.qc.ca/')
    >>> links = driver.find_elements(By.XPATH, './/a[contains(text(),"English")]')
    >>> links
    [<selenium.webdriver.remote.webelement.WebElement (session="abd3226550e94776152c619b509dd158", element="0.34892531934022464-1")>]
    >>> import bs4
    >>> soup = bs4.BeautifulSoup(driver.page_source, 'lxml')
    >>> soup.find_all(string='Liens rapides')
    ['Liens rapides']
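
  • A third option, if you specifically want to keep using requests with BeautifulSoup after logging in: copy the authenticated cookies out of the Selenium driver into a requests.Session. This is only a sketch, not tested against any particular site; the /account URL is a placeholder:

    ```python
    import requests

    def cookies_to_session(driver_cookies):
        # Build a requests.Session carrying the browser's cookies.
        # driver_cookies is the list of dicts returned by driver.get_cookies().
        session = requests.Session()
        for cookie in driver_cookies:
            session.cookies.set(cookie["name"], cookie["value"])
        return session

    if __name__ == "__main__":
        from selenium import webdriver

        driver = webdriver.Chrome()
        driver.get("https://www.example.com")
        # ... perform the login steps here, as in the question ...

        session = cookies_to_session(driver.get_cookies())
        # Requests made through this session now include the login cookies.
        response = session.get("https://www.example.com/account")
        print(response.text)
    ```

    Note that this only works if the site keeps its session state in cookies; sites that bind sessions to other browser state will still reject plain requests.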