I am trying to scrape the SVGs from the following link:
The portion I am trying to scrape is as follows:
I do not need the words of the chart (just the graphs themselves). However, I have never scraped an SVG image before and I'm not sure if it is possible. I looked around but could not find any useful Python packages to do this directly.
I know that I can take a screenshot of the page with Python using Selenium and then use PIL to crop it and save the result, but I am wondering if there is a more direct way to grab these charts off the page. Any useful packages or implementations would be helpful. Thank you.
Edit: I got some downvotes but I'm not sure why. Here is how I would implement it my way:
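    import sys
    import time
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

    class Screenshot(QWebView):
        def __init__(self):
            # a QApplication must exist before any widget is created
            self.app = QApplication(sys.argv)
            QWebView.__init__(self)
            self._loaded = False
            self.loadFinished.connect(self._loadFinished)

        def capture(self, url, output_file):
            self.load(QUrl(url))
            self.wait_load()
            # set to webpage size
            frame = self.page().mainFrame()
            self.page().setViewportSize(frame.contentsSize())
            # render image
            image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
            painter = QPainter(image)
            frame.render(painter)
            painter.end()
            print('saving ' + output_file)
            image.save(output_file)

        def wait_load(self, delay=0):
            # process app events until page loaded
            while not self._loaded:
                self.app.processEvents()
                time.sleep(delay)
            # reset the flag so the next capture() waits again
            self._loaded = False

        def _loadFinished(self, result):
            self._loaded = True

    s = Screenshot()
    s.capture('https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL', 'yhf.png')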
I would then use the crop function in PIL to cut the charts out of the screenshot.
Using QWebView for web scraping seems weird to me, although I do realize it has the advantage of telling the server "I'm not a web scraper, I'm an embedded browser". Note that this approach is not bulletproof: your scraper can still be detected if its behavior is unusual for a human user.
This is how I would do it:
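If you want to continue using Qt instead, look for methods in the web view that allow inspecting the DOM or extracting the resources the view downloaded. In QtWebKit, QWebFrame.findAllElements() (which takes a CSS selector) and QWebElement.toOuterXml() look like reasonable starting points.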
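Whichever browser produces the rendered HTML, the charts can then be pulled out as markup rather than pixels. Below is a minimal sketch, assuming the page's HTML is already available as a string (for example from Selenium's page_source or Qt's mainFrame().toHtml()) and that the charts are inline <svg> elements. SvgExtractor and extract_svgs are hypothetical names, and only the standard library is used:

```python
import re
from html.parser import HTMLParser


class SvgExtractor(HTMLParser):
    """Collect the markup of every top-level <svg> element in an HTML string."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.svgs = []    # finished "<svg>...</svg>" strings
        self._parts = []  # pieces of the <svg> currently being read
        self._open = []   # names of open tags inside it, in original case

    def handle_starttag(self, tag, attrs):
        if tag == "svg" or self._open:
            raw = self.get_starttag_text()
            self._parts.append(raw)
            # the parser lowercases `tag`; SVG names like linearGradient
            # are case-sensitive, so recover the name from the raw text
            self._open.append(re.match(r"<\s*([^\s/>]+)", raw).group(1))

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <path ... /> do not open anything
        if self._open:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._open:
            return
        self._parts.append("</%s>" % self._open.pop())
        if not self._open:  # the outermost <svg> was just closed
            self.svgs.append("".join(self._parts))
            self._parts = []

    def handle_data(self, data):
        if self._open:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._open:
            self._parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self._open:
            self._parts.append("&#%s;" % name)


def extract_svgs(html):
    """Return a list with the markup of each top-level <svg> in `html`."""
    parser = SvgExtractor()
    parser.feed(html)
    parser.close()
    return parser.svgs


if __name__ == "__main__":
    # made-up sample markup, standing in for the real page source
    sample = '<div><p>Label</p><svg viewBox="0 0 10 10"><path d="M0 0L10 10"/></svg></div>'
    for svg in extract_svgs(sample):
        print(svg)
```

Each returned string can be written to its own .svg file. Inline SVG often omits the xmlns="http://www.w3.org/2000/svg" attribute, which standalone viewers generally need, so it may have to be added to the root element before saving. The sketch also assumes well-nested markup inside each chart.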