ajax - Python requests module gets the same results despite incrementing page number


The only thing that changes in the URL is the page number, which is incremented after each request.

Other than Selenium or related tools, I'm not sure what approach is used to traverse the pages. My instinct is that there may be a header/query combination that returns the data directly, but I don't know how to find it.

import requests
from bs4 import BeautifulSoup

url = 'http://therunningbug.co.uk/events/find-races.aspx?eventname=&addressregion=&addresscounty=&date=&surface=#sort=date&page='
page = 1
while True:
    pagedata = BeautifulSoup(requests.get(url + str(page)).content)
    articles = pagedata.find('div', {'class': "items-content"})
    for a in articles.find_all('article'):
        name = a.find('span', {'itemprop': "name"}).text
        d, t = a.find('time').get('datetime').split('T')
        timedata = t[:-3]
        datedata = d.split('-')
        date = (datedata[1] + '/' + datedata[2] + '/' + datedata[0][2:]).strip()
        description = a.find('p', {'itemprop': "description"}).text.strip()
        weblink = 'http://therunningbug.co.uk' + a.find('a', {'itemprop': "url"}).get('href')
        category = a.find('span', {'class': "surface"}).text
        location = a.find('span', {'class': "region"}).text + ', ' + a.find('span', {'class': "county"}).text

        print name, ' -- name'
        print date, ', ', timedata, ' -- date, time'
        print description, ' -- description'
        print weblink, ' -- website link'
        print category, ' -- category'
        print location, ' -- location\n'
    page += 1

The problem is the URL encoding. You can let requests urlencode the parameters for you:

url = 'http://therunningbug.co.uk/events/find-races.aspx'
payload = {'page': page}
pagedata = BeautifulSoup(requests.get(url, params=payload).content)
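As a sketch of what `params` produces (mirrored here with the standard library's `urlencode`; the exact internals of requests are an assumption), the payload dict is encoded into a proper query string and appended after a `?`:

```python
# Mirror what requests does with params: URL-encode the dict into a
# query string. This is an illustration, not requests' actual code path.
from urllib.parse import urlencode

payload = {'page': 2}          # arbitrary example page number
query = urlencode(payload)
print(query)                   # page=2

# requests appends this query string to the base URL after a '?'
url = 'http://therunningbug.co.uk/events/find-races.aspx?' + query
print(url)
```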

This also works, as long as there are no complex characters in the URI that would need URL encoding:

url = 'http://therunningbug.co.uk/events/find-races.aspx'
pagedata = BeautifulSoup(requests.get(url + '?page=' + str(page)).content)

See the requests documentation on URL encoding: http://docs.python-requests.org/en/master/user/quickstart/
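To see why appending the page number to the original URL likely never changed what the server returned: everything after the `#` is a fragment, which clients do not send to the server. A small standard-library sketch (the page number here is an arbitrary example):

```python
# Illustrate why concatenating '&page=N' after the '#' fails: the page
# number ends up in the URL fragment, which is never sent to the server.
from urllib.parse import urlsplit

bad = ('http://therunningbug.co.uk/events/find-races.aspx'
       '?eventname=&addressregion=&addresscounty=&date=&surface='
       '#sort=date&page=' + str(2))
parts = urlsplit(bad)
print(parts.query)     # the server only ever sees this query string...
print(parts.fragment)  # ...while 'page=2' is stuck in the fragment

# Passing page as a real query parameter keeps it in the query string.
good = 'http://therunningbug.co.uk/events/find-races.aspx?page=2'
print(urlsplit(good).query)
```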

Complete code:

#!/usr/bin/env python

import requests
from bs4 import BeautifulSoup

page = 1
while True:
    url = 'http://therunningbug.co.uk/events/find-races.aspx'
    payload = {'page': page}
    pagedata = BeautifulSoup(requests.get(url, params=payload).content)
    articles = pagedata.find('div', {'class': "items-content"})
    for a in articles.find_all('article'):
        name = a.find('span', {'itemprop': "name"}).text
        d, t = a.find('time').get('datetime').split('T')
        timedata = t[:-3]
        datedata = d.split('-')
        date = (datedata[1] + '/' + datedata[2] + '/' + datedata[0][2:]).strip()
        description = a.find('p', {'itemprop': "description"}).text.strip()
        weblink = 'http://therunningbug.co.uk' + a.find('a', {'itemprop': "url"}).get('href')
        category = a.find('span', {'class': "surface"}).text
        location = a.find('span', {'class': "region"}).text + ', ' + a.find('span', {'class': "county"}).text

        print name, ' -- name'
        print date, ', ', timedata, ' -- date, time'
        print description, ' -- description'
        print weblink, ' -- website link'
        print category, ' -- category'
        print location, ' -- location\n'
    page += 1
