Beautifulsoup Select Method Not Selecting Results As Expected
I'm following Chapter 11 of Automate the Boring Stuff, the "I’m Feeling Lucky" Google Search project. The HTML seems to download correctly, but when I use bea
Solution 1:
This fails because of what Google returns to your GET request. In Chrome's developer tools, the div with class r does exist on the Google results page. When I download the same query with requests.get, however, that div is no longer there; instead there is a div with class 'jfp3ef'. I was able to get the a tags associated with the search results with the following:
# Keep the parsed document in soup; store the matching divs under a new name.
results = soup.find_all("div", {"class": "jfp3ef"})
for div in results:
    print(div.select("a"))
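To try the find_all / select combination without hitting Google, here is a minimal sketch that runs on a small hand-written HTML string. The sample markup and the helper name result_links are assumptions made for illustration; only the class name 'jfp3ef' comes from the answer above.

```python
import bs4

# Hypothetical stand-in for the downloaded page (not real Google markup).
SAMPLE_HTML = """
<div class="jfp3ef"><a href="/url?q=https://example.com/one">One</a></div>
<div class="other"><a href="/ignored">Ignored</a></div>
<div class="jfp3ef"><a href="/url?q=https://example.com/two">Two</a></div>
"""


def result_links(html, css_class):
    """Return the href of every <a> inside a <div> that has the given class."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", {"class": css_class}):
        for a in div.select("a"):
            hrefs.append(a.get("href"))
    return hrefs


links = result_links(SAMPLE_HTML, "jfp3ef")
print(links)
```

Passing a class that is not in the page (for example "r") returns an empty list, which is the same symptom the question describes.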
If you want, you can download the entire page, including the divs with class r, by using urllib.request. Google blocks this kind of request by default, so you have to change the header information (the User-Agent).
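import sys
import urllib.parse
import urllib.request

import bs4

# Search terms come from the command line.
SEARCHVAR = sys.argv[1:]
# URL-encode the terms; a raw space in the URL is rejected by recent Python versions.
query = 'http://google.com/search?q=' + urllib.parse.quote_plus(' '.join(SEARCHVAR))
# Send a browser-like User-Agent instead of the default urllib one.
headers = {}
headers['User-Agent'] = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
                         "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17")
req = urllib.request.Request(query, headers=headers)
html = urllib.request.urlopen(req).read()
print('Searching ' + ' '.join(SEARCHVAR) + ' on Google')
soup = bs4.BeautifulSoup(html, 'html.parser')
print('Parsing')
linkElems = soup.select('.r a')
print(str(linkElems))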
The example in the book is out of date. I assume the class name "jfp3ef" in my first example is generated by Google, so it will probably break soon and may not work for you at all. The second example works well.
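The request setup from the urllib.request example can also be wrapped in a small function, which makes it easy to inspect the URL and headers before anything is sent. This is only a sketch: the function name build_search_request and the shortened User-Agent string are assumptions for illustration, and no network call is made.

```python
import urllib.parse
import urllib.request


def build_search_request(terms, user_agent):
    """Build (but do not send) a search request with a custom User-Agent."""
    # urlencode escapes spaces and reserved characters in the search terms.
    query = urllib.parse.urlencode({"q": " ".join(terms)})
    url = "http://google.com/search?" + query
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_search_request(["tom", "&", "jerry"], "Mozilla/5.0 (X11; Linux i686)")
print(req.full_url)
# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Passing the resulting object to urllib.request.urlopen would perform the actual download.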
Solution 2:
Replace this:
linkElems = soup.select('.r a')
With:
linkElems = soup.select('div#main > div > div > div > a')
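To see what the child-combinator selector matches, here is a small sketch on hand-written markup. The HTML below is a made-up stand-in, not Google's real page structure; it only shows that '>' requires each element to be a direct child, while '.r a' finds nothing when no element has class r.

```python
import bs4

# Hypothetical markup: one link at the expected depth, one nested deeper,
# and one that is too shallow.
SAMPLE_HTML = """
<div id="main">
  <div><div><div>
    <a href="/url?q=https://example.com/result">Result</a>
    <span><a href="/nested">Nested deeper</a></span>
  </div></div></div>
  <div><a href="/shallow">Too shallow</a></div>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# No element carries class "r", so the book's selector matches nothing.
old_matches = soup.select(".r a")

# Only <a> tags that are direct children of the third nested div match.
new_matches = soup.select("div#main > div > div > div > a")
new_hrefs = [a.get("href") for a in new_matches]

print(len(old_matches))
print(new_hrefs)
```

Because this selector depends on the exact nesting depth, it is as fragile as a generated class name if the page layout changes.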