python - Scraping SVG charts
I am trying to scrape the SVGs from the following link:
https://finance.yahoo.com/quote/aapl/analysts?p=aapl
The portion I am trying to scrape is as follows:
I do not need the words of the chart (just the graphs themselves). However, I have never scraped an SVG image before, and I'm not sure if it is possible. I looked around but could not find any useful Python packages that do this directly.
I know I can take a screenshot of the image with Python using Selenium and then use PIL to crop it and save the SVG, but I was wondering if there is a more direct way to grab these charts off the page. Any useful packages or implementations would be helpful. Thank you.
Edit: I got downvotes and am not sure why. Here is one way I would implement it:
```python
import sys
import time

from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *


class Screenshot(QWebView):
    def __init__(self):
        # A QApplication must exist before any widget is created
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set the viewport to the webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render the page into an image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print('saving ' + output_file)
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until the page is loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True


s = Screenshot()
s.capture('https://finance.yahoo.com/quote/aapl/analysts?p=aapl', 'yhf.png')
```
I would then use the crop function in PIL to cut the images of the charts out of the screenshot.
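The cropping step could look roughly like the sketch below. It assumes the screenshot was saved as `yhf.png` by the code above; the helper names and the pixel boxes in `EXAMPLE_BOXES` are hypothetical, since the real chart positions depend on the page layout and would have to be measured.

```python
from PIL import Image


def crop_charts(screenshot, boxes):
    """Return one cropped image per (left, upper, right, lower) box."""
    return [screenshot.crop(box) for box in boxes]


def save_charts(screenshot_path, boxes, prefix="chart"):
    """Crop each box out of the screenshot file and save it as a PNG."""
    with Image.open(screenshot_path) as screenshot:
        for index, chart in enumerate(crop_charts(screenshot, boxes)):
            chart.save("%s_%d.png" % (prefix, index))


# Hypothetical pixel coordinates; the real ones depend on the page layout.
EXAMPLE_BOXES = [(0, 0, 400, 300), (400, 0, 800, 300)]
```

Calling `save_charts('yhf.png', EXAMPLE_BOXES)` would write `chart_0.png` and `chart_1.png`. Note that the result is a raster image, not the original vector SVG.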
Using QWebView for web scraping seems weird to me, although I realize it has an advantage: it tells the server "I'm not a web scraper, I'm an embedded browser". Note that this approach is not bulletproof: a scraper can still be detected if it shows behavior that is unusual for a human user.
This is how I would do it:
- I'd use requests to download the page (maybe through a proxy that hides the real IP address, to combat IP bans).
- Then I'd parse the page using BeautifulSoup to get the URL of the SVG file you are trying to get.
- Then I'd download the SVG file and convert it to an image using something like this.
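The first two steps, plus the download, might be sketched as follows. This is only an outline under assumptions: the function names are made up for illustration, the `User-Agent` header is a guess at what the server accepts, and it is not confirmed that the charts appear in the raw HTML at all. If the page builds them with JavaScript, this finds nothing. The conversion to an image is left out.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://finance.yahoo.com/quote/aapl/analysts?p=aapl"


def find_svgs(html, base_url):
    """Return (inline_svgs, svg_urls) found in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Charts embedded directly in the page as <svg> elements.
    # Note: html.parser lowercases tag and attribute names (e.g. viewBox),
    # so the saved markup may need fixing before a renderer accepts it.
    inline_svgs = [str(tag) for tag in soup.find_all("svg")]
    # Charts referenced as separate files, e.g. <img src="chart.svg">.
    svg_urls = []
    for tag in soup.find_all(["img", "object", "embed"]):
        src = tag.get("src") or tag.get("data")
        if src and src.split("?")[0].lower().endswith(".svg"):
            svg_urls.append(urljoin(base_url, src))
    return inline_svgs, svg_urls


def download_svgs(page_url, proxies=None):
    """Fetch the page, then save every SVG found as chart_<n>.svg."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed, not verified
    response = requests.get(page_url, headers=headers,
                            proxies=proxies, timeout=30)
    response.raise_for_status()
    inline_svgs, svg_urls = find_svgs(response.text, page_url)
    documents = list(inline_svgs)
    for url in svg_urls:
        svg_response = requests.get(url, headers=headers,
                                    proxies=proxies, timeout=30)
        svg_response.raise_for_status()
        documents.append(svg_response.text)
    for index, svg in enumerate(documents):
        with open("chart_%d.svg" % index, "w", encoding="utf-8") as handle:
            handle.write(svg)
    return len(documents)
```

Calling `download_svgs(PAGE_URL)` returns the number of files written. A proxy can be passed in the usual requests form, for example `proxies={"https": "http://host:port"}` with a placeholder host.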
If you want to continue using Qt instead, there are methods in the web view that allow inspecting the DOM or extracting the resources the view has downloaded.