python html web-scraping gmail

Email scraper using Python with Beautiful Soup or the html module


I am trying to gather data from the listings my realtor sends me. They always arrive as a link to the main site, "http://v3.torontomls.net". I think only realtors can log in to this site and filter houses, but when she sends me a link I can see a list of houses.

I am wondering if it is possible to create a Python script that:

1) Opens Gmail
2) Filters for her emails
3) Opens one of her emails
4) Clicks on the link
5) Scrapes the house data into CSV format

I am not sure whether this is feasible, as I have never used Python to scrape web pages. Step 5 looks doable, but how do I go about steps 1 to 4?


Solution

  • Yes, this is possible, but first work out which steps you can skip. If your realtor sends you the same link each time, you can target that web address directly and leave Gmail out of it. If the link changes but is parameterized (by month, for instance), you can adjust the web address yourself each time you want to process the results.

    To fetch the pages, I would suggest the requests package, along with bs4 (Beautiful Soup 4) to target elements in the returned HTML. For writing CSV files, the standard library's csv module works, but there are many alternatives if you need something more specific to your use case.
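
As a rough sketch of how requests, bs4, and csv could fit together, assuming the link can be fetched without logging in and the page is plain HTML. The CSS selectors (`div.listing`, `.address`, `.price`), the two column names, and the output file name `listings.csv` are all hypothetical placeholders; inspect the real page in your browser and swap in whatever markup it actually uses.

```python
import csv
import sys

import requests
from bs4 import BeautifulSoup

# Hypothetical column names; adjust to the fields you actually want.
FIELDS = ["address", "price"]


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_listings(html: str) -> list[dict]:
    """Return one dict per listing found in the HTML.

    The selectors below are assumptions for illustration only,
    not the real structure of the listing page.
    """
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.select("div.listing"):
        address = card.select_one(".address")
        price = card.select_one(".price")
        listings.append({
            # Missing elements become empty strings instead of crashing.
            "address": address.get_text(strip=True) if address else "",
            "price": price.get_text(strip=True) if price else "",
        })
    return listings


def write_csv(listings: list[dict], out_file) -> None:
    """Write the listings to an open text file as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(listings)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass the full link from the email as the only argument.
    page = fetch_html(sys.argv[1])
    with open("listings.csv", "w", newline="", encoding="utf-8") as f:
        write_csv(parse_listings(page), f)
```

If the file were saved as, say, `scrape.py`, it would be run with the link copied from the email as its single argument. Keeping the parsing in its own function means you can test the selectors against a saved copy of the page before making any live requests.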