I would like to know the best/preferred Python 3.x solution (fast to execute, easy to implement, with the option to set the user agent and send browser and version details to the web server so that my IP is not blacklisted) that can scrape data in all of the cases below (listed in order of increasing complexity, as I understand it).
- Any static webpage with data in tables / divs
- Dynamic webpage which completes loading in one go
- Dynamic webpage which requires sign-in with a username and password and completes loading in one go after we log in.
Sample URL for username/password login: https://dashboard.janrain.com/signin?dest=http://janrain.com
- Dynamic webpage which requires sign-in using OAuth from a popular service like LinkedIn, Google, etc. and completes loading in one go after we log in. I understand this involves some page redirects, token handling, etc.
Sample URL for OAuth-based logins: https://dashboard.janrain.com/signin?dest=http://janrain.com
- All of bullet point 4 above, combined with selecting an option from a drop-down (let's say "sort by date") or ticking some check-boxes, which changes the dynamic data displayed.
I need to scrape the data after the check-box/drop-down action has been performed, just as any user would do to change what the page displays.
Sample URL - https://careers.microsoft.com/us/en/search-results?rk=l-seattlearea
This page has a drop-down as well as some check-boxes.
- Dynamic webpage with Ajax loading, in which data keeps loading as either:
=> 6.1 we keep scrolling down, as on the Facebook, Twitter or LinkedIn main page, to get more data
Sample URL - Facebook, Twitter, LinkedIn, etc.
=> 6.2 or we keep clicking some button/div at the end of the Ajax container to get the next set of data
Sample URL - https://www.linkedin.com/pulse/cost-climate-change-indian-railways-punctuality-more-editors-india-/
Here you have to click "Show Previous Comments" at the bottom of the page if you need to view and scrape all the comments.
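To make bullet point 6 concrete: both 6.1 and 6.2 seem to boil down to one loop, which is to trigger another load, re-read the items, and stop once nothing new appears. Below is a minimal sketch of that loop with the browser kept out of it. `load_more`, `get_items`, `max_rounds` and `FakeFeed` are hypothetical names for illustration only; in practice `load_more` would presumably wrap a scroll-to-bottom action or a click on "Show Previous Comments" in a browser-automation tool.

```python
from typing import Callable, List


def collect_until_exhausted(
    load_more: Callable[[], None],
    get_items: Callable[[], List[str]],
    max_rounds: int = 50,
) -> List[str]:
    """Trigger loading repeatedly until no new items show up.

    load_more and get_items are hypothetical hooks: load_more would scroll
    or click in a real browser, get_items would read what is on the page.
    max_rounds is an assumed safety cap for feeds that never end.
    """
    seen = list(get_items())
    for _ in range(max_rounds):
        load_more()
        current = get_items()
        if len(current) <= len(seen):
            break  # nothing new arrived, assume the feed is exhausted
        seen = list(current)
    return seen


class FakeFeed:
    """Stand-in for a page that reveals 2 more items per load."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.visible = 2

    def load_more(self) -> None:
        self.visible = min(self.visible + 2, self.total)

    def items(self) -> List[str]:
        return [f"comment {i}" for i in range(self.visible)]


feed = FakeFeed(total=5)
comments = collect_until_exhausted(feed.load_more, feed.items)
```

A real page loads asynchronously, so some wait between `load_more()` and `get_items()` would be needed before deciding that nothing new arrived.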
I want to learn and build one exhaustive scraping solution which can be tweaked, as and when required, to cater to all of the options above, from the easy task of bullet point 1 to the complex task of bullet point 6.
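For reference, the simplest case (bullet point 1, plus the user-agent requirement) could look roughly like the sketch below, using only the standard library. This is a baseline under stated assumptions, not a recommended solution: `USER_AGENT`, `build_request`, `TableParser`, the example.com URL and the sample HTML are all made up for illustration, and the request is built but never sent.

```python
from html.parser import HTMLParser
from urllib.request import Request

# Assumed example value; a real browser string could be used instead.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/0.1"


def build_request(url: str) -> Request:
    """Build (but do not send) a request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for a downloaded static page.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Location</th></tr>
  <tr><td>Engineer</td><td>Seattle</td></tr>
</table>
"""

request = build_request("https://example.com/jobs")
parser = TableParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
```

Passing `request` to `urllib.request.urlopen` would actually fetch the page. This only covers static HTML; it does nothing for the JavaScript-driven cases in bullet points 2 to 6.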