python excel google-sheets tampermonkey

Looking for the best way to automate scraping values off of a CMS to build reports


First post, so go easy on me :)

I'm trying to scrape information from a web-based customer management system (CMS) that holds sales information, and get those values into Excel or Google Sheets to ultimately build a report. That would save time and avoid the errors that come from flipping through all the orders manually.

I remember once using a solution (multiple tools) that would go through the pages, take values from defined fields on those pages, and put that information into columns on a sheet that we'd then manipulate manually. I'm pretty sure it was Python-based and (I think) used the Tampermonkey extension to get the information on a dev/debugger version of Chrome.

The process looked something like this:

  • While already logged into the CMS, execute the tool/script, which would automatically open an order in a new window
  • It'd then go through that order, take values from specific fields, and copy those values into a sheet
  • It'd then close the window and proceed to the next order in the specified range
  • Once it completes the specified (date) range, the sheet would have columns like salesperson / order number / sale amount / attachment amount / etc., to then be manipulated manually; no further automation needed (beyond the formulas in the sheet)

Does anyone have ideas on how to get this done, or know of any guides for this specific type of task? I'm trying to automate this as much as possible. Thanks in advance.


Solution

  • Python should be a good choice, as it provides many different scraping tools. Which packages to use depends on how the CMS works, in particular whether its pages are static HTML or are built by scripts in the browser.

    Simple HTML scraping

    For simple scraping of static HTML content, Scrapy or Beautiful Soup should be enough.
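
As a rough illustration of the static case, here is a minimal Beautiful Soup sketch. The sample HTML, the `orders` table id, and the `table_rows` helper are all made up for the example; the real CMS markup will differ, and the HTML would normally come from a downloaded page rather than a string.

```python
from bs4 import BeautifulSoup

# Made-up sample of a static order listing (an assumption, not the real CMS).
SAMPLE_HTML = """
<table id="orders">
  <tr><th>Salesperson</th><th>Order</th><th>Amount</th></tr>
  <tr><td>Alice</td><td>1001</td><td>250.00</td></tr>
  <tr><td>Bob</td><td>1002</td><td>99.50</td></tr>
</table>
"""


def table_rows(html, table_id="orders"):
    """Return the data rows of the table as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows


print(table_rows(SAMPLE_HTML))
```

Each inner list is one order, so the result can be written straight to a CSV file and opened in Excel or imported into Google Sheets.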

    Scraping including executable content

    For these cases you can use Selenium, which you can combine with Beautiful Soup. More details can be found in this related question and this one.
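
A hedged sketch of how the two could be combined for the workflow in the question: Selenium loads each order page so its scripts run, Beautiful Soup reads the rendered HTML, and the values go into a CSV file. The URL pattern, the order range, the CSS selectors in `FIELDS`, and the output file name are all placeholders I invented; they would need to be replaced with whatever the real CMS uses.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical column -> CSS selector mapping; adjust to the real order page.
FIELDS = {
    "salesperson": "#salesperson",
    "order_number": "#order-number",
    "sale_amount": "#sale-amount",
}


def parse_order(html):
    """Pull the configured fields out of one rendered order page."""
    soup = BeautifulSoup(html, "html.parser")
    row = {}
    for column, selector in FIELDS.items():
        node = soup.select_one(selector)
        # Leave the cell empty when a field is missing instead of failing.
        row[column] = node.get_text(strip=True) if node else ""
    return row


def scrape_orders(driver, order_urls, out_path):
    """Visit each order URL and write one CSV row per order."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(FIELDS))
        writer.writeheader()
        for url in order_urls:
            driver.get(url)  # the browser executes the page's scripts
            writer.writerow(parse_order(driver.page_source))


def main():
    # Imported here so the parsing code can be reused without a browser.
    from selenium import webdriver

    # Hypothetical URL pattern and order range.
    urls = [f"https://cms.example.com/orders/{n}" for n in range(1000, 1010)]
    driver = webdriver.Chrome()
    try:
        driver.get("https://cms.example.com/login")
        input("Log in in the browser window, then press Enter here...")
        scrape_orders(driver, urls, "orders.csv")
    finally:
        driver.quit()
```

Calling `main()` opens a Chrome window, waits for a manual login, and then walks the order range. Pages that load their fields late may also need an explicit wait before `page_source` is read; that is not shown here.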