Printing accents and foreign characters using Beautiful Soup and Python
I'm scraping artists from discogs.com, but I can't get the artist names to appear as they do on the page. For example, the artist Andrés appears as Andr\xe9s when I run my code.
Can anyone explain what I'm doing wrong? Here is my code:
    from bs4 import BeautifulSoup
    import requests
    import urllib2
    from itertools import chain
    import codecs

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'
    }

    all_artists = []
    result_pages = 1  # 446

    def load_artists():
        for page in xrange(1, result_pages + 1):
            url = 'https://www.discogs.com/search/?sort=have%2cdesc&style_exact=house&genre_exact=electronic&decade=2010&page=' + str(page)
            r = requests.get(url, headers=headers)
            soup = BeautifulSoup(r.content.decode('utf-8'), 'html.parser')
            # Collect the title attribute of every artist <span> in the search results
            all_artists.extend(tag["title"] for tag in soup.select('div#search_results h5 span'))

    load_artists()
    all_artists
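For comparison, the extraction step of the code above can be sketched in Python 3 using only the standard library. This is a minimal sketch under stated assumptions: it swaps BeautifulSoup's CSS selector for the stdlib html.parser.HTMLParser so it runs without third-party packages, it skips the network request, and SAMPLE_HTML and ArtistTitleParser are hypothetical names, with markup loosely modelled on the selector 'div#search_results h5 span' rather than Discogs's real page source.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on 'div#search_results h5 span';
# this is NOT Discogs's actual page source.
SAMPLE_HTML = """
<div id="search_results">
  <h5><span title="Andrés">Andrés</span></h5>
  <h5><span title="Björk">Björk</span></h5>
</div>
"""


class ArtistTitleParser(HTMLParser):
    """Collects the title attribute of <span> tags nested in <h5> tags
    inside <div id="search_results"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_results = False  # currently inside div#search_results?
        self.div_depth = 0       # div nesting depth within the results div
        self.in_h5 = False       # currently inside an <h5>?
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.in_results:
                self.div_depth += 1
            elif attrs.get("id") == "search_results":
                self.in_results = True
                self.div_depth = 1
        elif tag == "h5" and self.in_results:
            self.in_h5 = True
        elif tag == "span" and self.in_h5 and "title" in attrs:
            self.titles.append(attrs["title"])

    def handle_endtag(self, tag):
        if tag == "h5":
            self.in_h5 = False
        elif tag == "div" and self.in_results:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.in_results = False


parser = ArtistTitleParser()
parser.feed(SAMPLE_HTML)
parser.close()
all_artists = parser.titles

print(all_artists)  # Python 3 keeps printable non-ASCII in repr: ['Andrés', 'Björk']
for name in all_artists:
    print(name)
```

The strings come out of the parser already decoded, so the accented names are intact; only the way a list is displayed differs between Python 2 and Python 3.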
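You need to use Python 3; it no longer suffers from this. In Python 2, evaluating all_artists at the interactive prompt displays the list's repr, and the repr of a unicode string escapes non-ASCII characters, so é is shown as \xe9 even though the string itself is correct. Python 3's repr leaves printable non-ASCII characters as they are.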
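A small Python 3 illustration of the point above: "Andr\xe9s" and "Andrés" are the same string, and only the display differs. The built-in ascii() reproduces the escaped style that Python 2 showed for the list (without Python 2's u prefix). The variable names here are just for illustration.

```python
name = "Andr\xe9s"   # \xe9 is the escape for U+00E9, 'é'
assert name == "Andrés"  # same string, two spellings in source code

artists = [name]
print(artists)         # Python 3 repr keeps 'é': ['Andrés']
print(name)            # str form: Andrés
print(ascii(artists))  # escaped repr, like Python 2's display: ['Andr\xe9s']
```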