Search code examples
pythonfunctionmechanize

How to use mechanize package inside a function to log into website?


I am using mechanize to log into a website and scrape it with beautifulsoap. While I got it working without using functions, I don't know how to put the login-functionality into a function and then later use it in the main program. Here's my not working code so far:

#!/usr/bin/env python

import http.cookiejar as cookielib
import mechanize
from bs4 import BeautifulSoup

def set_browser():
    br = mechanize.Browser()
    cookiejar = cookielib.LWPCookieJar()
    br.set_cookiejar(cookiejar)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time = 1)
    br.addheaders = [( 'User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1' )]
    return br

def login(br):
    br.open("https://example.com/login/index.php")
    br.select_form(nr=0)
    br.form['username'] = "admin"
    br.form['password'] = "mypassword"
    br.submit()

def scrape():
    url = "https://example.com/content"
    data = br.open(url).get_data()
    soup = BeautifulSoup(data, 'html.parser')
    with open("source.html", "w") as text_file:
        print(soup.prettify(), file=text_file)

if __name__ == "__main__":
    set_browser()
    login(br)
    scrape()

I hope someone can help me how to write proper functions. In my above code, I wrote two functions set_browser() and login() but it isn't important to have two functions; if both are combined into one, it will be okay, I just split it into two to really learn using functions.


Solution

  • I think that when returning a value, you need to store it somewhere and then use it in the next function, so it should like something like this

    def login(br):
        br.open("https://example.com/login/index.php")
        br.select_form(nr=0)
        br.form['username'] = "admin"
        br.form['password'] = "mypassword"
        br.submit()
    
    if __name__ == "__main__":
        br = set_browser()
        login(br)
        scrape()