python - Scraping SVG charts


I am trying to scrape the SVGs from the following link:

https://finance.yahoo.com/quote/aapl/analysts?p=aapl

The portion I am trying to scrape is as follows:

[images of the charts]

I do not need the words of the chart (just the graphs themselves). However, I have never scraped an SVG image before, and I'm not sure if it is possible. I looked around but could not find any useful Python packages that do this directly.

I know I can take a screenshot of the image in Python using Selenium, then use PIL to crop it and save the result, but I was wondering if there is a more direct way to grab these charts off the page. Any useful packages or implementations would be helpful. Thank you.

Edit: I got some downvotes and am not sure why, so here is how I would implement it in this way:

    import sys
    import time
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *


    class Screenshot(QWebView):
        def __init__(self):
            # A QApplication must exist before any widget is created
            self.app = QApplication(sys.argv)
            QWebView.__init__(self)
            self._loaded = False
            self.loadFinished.connect(self._loadFinished)

        def capture(self, url, output_file):
            self.load(QUrl(url))
            self.wait_load()
            # set the viewport to the full webpage size
            frame = self.page().mainFrame()
            self.page().setViewportSize(frame.contentsSize())
            # render the page into an image
            image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
            painter = QPainter(image)
            frame.render(painter)
            painter.end()
            print('saving %s' % output_file)
            image.save(output_file)

        def wait_load(self, delay=0):
            # process app events until the page has loaded
            while not self._loaded:
                self.app.processEvents()
                time.sleep(delay)
            self._loaded = False

        def _loadFinished(self, result):
            self._loaded = True


    s = Screenshot()
    s.capture('https://finance.yahoo.com/quote/aapl/analysts?p=aapl', 'yhf.png')

I would then use the crop function in PIL to cut the images of the charts out of the screenshot.

Using QWebView for web scraping seems weird to me, although I realize there is an advantage: it tells the server "I'm not a web scraper, I'm an embedded browser". Note that this approach is not bulletproof: a scraper can still be detected if it shows behavior that is unusual for a human user.

This is how I would do it:

  1. I'd use requests to download the page (maybe through a proxy that hides my real IP address, to combat IP bans).
  2. Then I'd parse the page using BeautifulSoup to get the URL of the SVG file you are trying to get.
  3. Then I'd download the SVG file and convert it to an image with an SVG-to-raster converter.
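
A minimal sketch of steps 1 and 2, written for Python 3. To keep it runnable without extra installs, it swaps in the standard library's `urllib` and `html.parser` for requests and BeautifulSoup. It assumes the charts are referenced as separate `.svg` files through `img`, `embed`, or `object` tags, which may not be true for this page. The names `SvgLinkFinder`, `find_svg_urls`, `fetch`, and `download_svgs` are made up for the illustration, and step 3 (conversion) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class SvgLinkFinder(HTMLParser):
    """Collect URLs of .svg files referenced by img, embed, or object tags."""

    # Attribute that carries the resource URL for each tag we inspect.
    URL_ATTRS = {"img": "src", "embed": "src", "object": "data"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.svg_urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                absolute = urljoin(self.base_url, value)
                # Check only the path, so a query string cannot hide the extension.
                if urlparse(absolute).path.lower().endswith(".svg"):
                    self.svg_urls.append(absolute)


def find_svg_urls(html, base_url):
    """Step 2: return absolute URLs of the SVG files referenced in html."""
    finder = SvgLinkFinder(base_url)
    finder.feed(html)
    return finder.svg_urls


def fetch(url, timeout=10):
    """Step 1: download a URL and return the raw bytes."""
    # Assumption: a browser-like User-Agent header is enough for the server.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def download_svgs(page_url):
    """Fetch the page, then fetch every SVG file it references."""
    html = fetch(page_url).decode("utf-8", errors="replace")
    return {url: fetch(url) for url in find_svg_urls(html, page_url)}
```

Calling `download_svgs(page_url)` would return a dict that maps each SVG URL to the bytes of that file. If the dict comes back empty, the charts are probably drawn inline or by JavaScript rather than linked as files.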

If you want to continue using Qt instead, there are methods in the web view that allow inspecting the DOM or extracting the resources that the view has downloaded.
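
As a rough illustration of working from the rendered DOM, the sketch below (Python 3, standard library only) pulls inline `<svg>` elements out of an HTML string and writes each one to its own file. It assumes the HTML has already been obtained from the rendered page (for example from the frame's `toHtml()` in the Qt code above), that the charts are inline `<svg>` elements, and that they are not nested inside each other. The names `extract_inline_svgs` and `save_svgs` are made up for the illustration.

```python
import re
from pathlib import Path

# Non-greedy match from an opening <svg ...> tag to the first closing </svg>.
# Assumption: the charts are not nested inside one another.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg\s*>", re.DOTALL | re.IGNORECASE)

XMLNS = 'xmlns="http://www.w3.org/2000/svg"'


def extract_inline_svgs(html):
    """Return the markup of every inline <svg> element found in html."""
    svgs = []
    for markup in SVG_PATTERN.findall(html):
        opening_tag = markup[:markup.index(">")]
        # A standalone SVG file needs the namespace declaration, which
        # inline SVG inside an HTML page is allowed to omit.
        if "xmlns=" not in opening_tag:
            markup = markup[:4] + " " + XMLNS + markup[4:]
        svgs.append(markup)
    return svgs


def save_svgs(svgs, directory, prefix="chart"):
    """Write each SVG string to <directory>/<prefix>_<n>.svg and return the paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, markup in enumerate(svgs):
        path = out_dir / "{}_{}.svg".format(prefix, index)
        path.write_text(markup, encoding="utf-8")
        paths.append(path)
    return paths
```

A regular expression is a shortcut here and not a real HTML parser, so treat the output as a starting point and check that the saved files open correctly.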

