authentication - How to crawl Factiva data with Python Scrapy?
I'm working on Factiva data in Python 3.5.2, and I have to use my school login to see the data.
I have followed this post to try to create a login spider.
This is my code:
    # test login spider
    import scrapy
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request

    login_url = "https://login.proxy.lib.sfu.ca/login?qurl=https%3a%2f%2fglobal.factiva.com%2fen%2fsess%2flogin.asp%3fxsid%3ds002sbj1svr2svo5des5depotavndaoodzymhn0yqyvmq382rbrqufbqufbqufbqufbqufbqufbqufbqufbqufbqufbqufbqqaa"
    user_name = b"[my_user_name]"
    pswd = b"[my_password]"
    response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=save&hc=all"

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            return [scrapy.FormRequest(login_url,
                                       formdata={'user': user_name, 'pass': pswd},
                                       callback=self.logged_in)]

        def logged_in(self, response):
            # login failed
            if "authentication failed" in response.body:
                print("login failed")
            # login succeeded
            else:
                print("login succeeded")
                # return Request(url=response_page,
                #                callback=self.parse_responsepage)

        def parse_responsepage(self, response):
            hxs = HtmlXPathSelector(response)
            yum = hxs.select('//span/@enheadline')

    def main():
        test_spider = MySpider(scrapy.Spider)
        test_spider.start_requests()

    if __name__ == "__main__":
        main()
To run the code, I use this command line in the terminal, from the top directory of the project:

    scrapy runspider [my_file_path]/auth_spider.py
Do you know how to deal with the errors here?
As you're using Python 3.x, "authentication failed" is a str, while response.body is of type bytes.
To resolve the issue, either perform the test in str:

    if "authentication failed" in response.body_as_unicode():

or in bytes:

    if b"authentication failed" in response.body:
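For completeness, here is a minimal sketch of how the whole spider might look with that comparison fixed, keeping the structure, the form field names ('user', 'pass'), the "authentication failed" marker and the XPath from the question. The spider name, the credential placeholders and the truncated login URL are illustrative only, and whether the field names match the real proxy login form is not verified here. On newer Scrapy versions, response.text can be used in place of the deprecated body_as_unicode().

    import scrapy

    # the full proxied Factiva login URL from the question goes here
    login_url = "https://login.proxy.lib.sfu.ca/login?qurl=..."
    response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=save&hc=all"

    class FactivaLoginSpider(scrapy.Spider):
        name = "factiva_login"

        def start_requests(self):
            # submit the login form; credentials are placeholders
            yield scrapy.FormRequest(login_url,
                                     formdata={"user": "my_user_name", "pass": "my_password"},
                                     callback=self.logged_in)

        def logged_in(self, response):
            # response.body is bytes under Python 3, so compare it with a bytes literal
            if b"authentication failed" in response.body:
                self.logger.error("login failed")
            else:
                self.logger.info("login succeeded")
                # Scrapy keeps session cookies by default, so this request stays logged in
                yield scrapy.Request(response_page, callback=self.parse_responsepage)

        def parse_responsepage(self, response):
            # same XPath as in the question, yielded as items instead of stored in a variable
            for headline in response.xpath("//span/@enheadline").extract():
                yield {"headline": headline}

It can be run the same way as in the question (scrapy runspider auth_spider.py); with runspider the main() block from the original script is not needed, since Scrapy drives the spider itself.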