
Beautifulsoup Select Method Not Selecting Results As Expected

I'm following chapter 11 of the Automate the Boring Stuff tutorial, the "I'm Feeling Lucky" Google Search project. The HTML seems to download correctly, but when I use bea

Solution 1:

This doesn't work because of what Google returns for your GET request. If I use the developer tools in Chrome on the Google results page, the div with class r does exist. When I download the same query with requests.get, however, it is no longer there. Instead, there is a div with the class 'jfp3ef'. I was able to get the a tags associated with the search results with the following:

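# 'soup' is the BeautifulSoup object built from the requests.get response.
# Use a separate name for the matches so 'soup' is not overwritten.
result_divs = soup.find_all("div", {"class": "jfp3ef"})
for div in result_divs:
    print(div.select("a"))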
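One way to find out which class names the downloaded HTML actually contains, rather than relying on what the browser shows, is to count them. The following is only a minimal sketch: it deliberately uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra packages, and the names DivClassCounter, count_div_classes, and sample_html are made up for illustration. The sample markup is a stand-in, not real Google output.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Counts every class name that appears on a <div> tag."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names
                self.counts.update(value.split())


def count_div_classes(html):
    """Return a Counter mapping div class names to how often they occur."""
    parser = DivClassCounter()
    parser.feed(html)
    return parser.counts


# Made-up stand-in for the HTML returned by the request (assumption)
sample_html = """
<div id="main">
  <div class="jfp3ef"><a href="/url?q=https://example.com">Example</a></div>
  <div class="jfp3ef other"><a href="/url?q=https://example.org">Another</a></div>
</div>
"""

counts = count_div_classes(sample_html)
print(counts.most_common())  # most frequent class names first
print("r" in counts)         # is the class the book relies on present at all?
```

Running the same function on the real response text would show whether the class you are selecting on is present before you spend time debugging the selector itself.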

If you want, you can download the entire page, including the divs with class r, by using urllib.request. Google blocks this behavior by default, so you have to change the header information and send a browser User-Agent.

import sys
import urllib.parse
import urllib.request

import bs4

SEARCHVAR = sys.argv[1:]
# URL-encode the search terms so spaces and special characters are safe
query = 'http://google.com/search?q=' + urllib.parse.quote_plus(' '.join(SEARCHVAR))
# Send a browser User-Agent instead of urllib's default one
headers = {}
headers['User-Agent'] = ("Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
                         "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17")
req = urllib.request.Request(query, headers=headers)
html = urllib.request.urlopen(req).read()
print('Searching ' + ' '.join(SEARCHVAR) + ' on Google')
soup = bs4.BeautifulSoup(html, 'html.parser')
print('Parsing')
linkElems = soup.select('.r a')
print(str(linkElems))

The example in the book is out of date. I assume the class name "jfp3ef" in my first example is generated by Google, so it may break soon or may not work for you at all. The second example, which uses urllib.request with a browser User-Agent, does work well.

Solution 2:

Replace this:

linkElems = soup.select('.r a')

With:

linkElems = soup.select('div#main > div > div > div > a')
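To see why a structural selector can match where a class-based one returns nothing, here is a small sketch run against hand-written markup. The HTML below is an assumption that only mimics the nesting the selector expects; it is not real Google output, and the real page structure may differ or change.

```python
import bs4

# Made-up markup that mimics the nesting the selector expects (assumption)
sample_html = """
<div id="main">
  <div>
    <div>
      <div><a href="/url?q=https://example.com">Result one</a></div>
      <div><a href="/url?q=https://example.org">Result two</a></div>
    </div>
  </div>
  <div><a href="/settings">Too shallow, not matched</a></div>
</div>
"""

soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Class-based selector from the book: nothing here has class "r"
old_links = soup.select(".r a")

# Structural selector: each ">" requires a direct parent/child relationship
new_links = soup.select("div#main > div > div > div > a")

print(len(old_links))
print([a["href"] for a in new_links])
```

Because each `>` matches only direct children, the link that sits one level below div#main is skipped. The same strictness means the selector stops matching as soon as the page adds or removes a wrapper div.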
