authentication - How to crawl Factiva data with Python Scrapy?
I'm working on Factiva data in Python 3.5.2, and I have to use my school login to see the data.
I have followed this post to try to create a login spider.
This is my code:
    # test login spider
    import scrapy
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request

    login_url = "https://login.proxy.lib.sfu.ca/login?qurl=https%3a%2f%2fglobal.factiva.com%2fen%2fsess%2flogin.asp%3fxsid%3ds002sbj1svr2svo5des5depotavndaoodzymhn0yqyvmq382rbrqufbqufbqufbqufbqufbqufbqufbqufbqufbqufbqufbqqaa"
    user_name = b"[my_user_name]"
    pswd = b"[my_password]"
    response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=save&hc=all"

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            return [scrapy.FormRequest(login_url,
                                       formdata={'user': user_name, 'pass': pswd},
                                       callback=self.logged_in)]

        def logged_in(self, response):
            # login failed
            if "authentication failed" in response.body:
                print("login failed")
            # login succeeded
            else:
                print("login succeeded")
                # return Request(url=response_page,
                #                callback=self.parse_responsepage)

        def parse_responsepage(self, response):
            hxs = HtmlXPathSelector(response)
            yum = hxs.select('//span/@enheadline')

    def main():
        test_spider = MySpider(scrapy.Spider)
        test_spider.start_requests()

    if __name__ == "__main__":
        main()
To run the code, I use this command line in the terminal, from the top directory of the project:

    scrapy runspider [my_file_path]/auth_spider.py
Do you know how to deal with the errors here?
As you're using Python 3.x, "authentication failed" is a str, while response.body is of type bytes.
To resolve the issue, either perform the test in str:

    if "authentication failed" in response.body_as_unicode():

or in bytes:

    if b"authentication failed" in response.body:
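For completeness, here is a minimal sketch of how the whole spider might look with that comparison fixed, keeping the structure, the form field names ('user', 'pass'), the "authentication failed" marker and the XPath from the question. The spider name, the credential placeholders and the truncated login URL are illustrative only, and whether the field names match the real proxy login form is not verified here. On newer Scrapy versions, response.text can be used in place of the deprecated body_as_unicode().

    import scrapy

    # the full proxied Factiva login URL from the question goes here
    login_url = "https://login.proxy.lib.sfu.ca/login?qurl=..."
    response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=save&hc=all"

    class FactivaLoginSpider(scrapy.Spider):
        name = "factiva_login"

        def start_requests(self):
            # submit the login form; credentials are placeholders
            yield scrapy.FormRequest(login_url,
                                     formdata={"user": "my_user_name", "pass": "my_password"},
                                     callback=self.logged_in)

        def logged_in(self, response):
            # response.body is bytes under Python 3, so compare it with a bytes literal
            if b"authentication failed" in response.body:
                self.logger.error("login failed")
            else:
                self.logger.info("login succeeded")
                # Scrapy keeps session cookies by default, so this request stays logged in
                yield scrapy.Request(response_page, callback=self.parse_responsepage)

        def parse_responsepage(self, response):
            # same XPath as in the question, yielded as items instead of stored in a variable
            for headline in response.xpath("//span/@enheadline").extract():
                yield {"headline": headline}

It can be run the same way as in the question (scrapy runspider auth_spider.py); with runspider the main() block from the original script is not needed, since Scrapy drives the spider itself.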