Beautifulsoup Select Method Not Selecting Results As Expected

January 05, 2024

I'm following the Automate the Boring Stuff tutorial, chapter 11, the "I'm Feeling Lucky" Google Search project. It seems to download the HTML data correctly, but when I use bea

Solution 1:

This doesn't work because of what Google returns to your GET request. If I use the developer tools in Chrome on Google, the div with class r does exist. However, when I download the query with requests.get, it's no longer there. Instead, there's a div with the class 'jfp3ef'. I was able to get the a tags associated with the search results with the following:

# Keep the parsed page in `soup`; store the matching divs under a separate name
result_divs = soup.find_all("div", {"class": "jfp3ef"})
for div in result_divs:
    print(div.select("a"))
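To check which classes the downloaded HTML actually contains, rather than what the browser shows, it can help to count the class names on the div tags. The following is a minimal sketch using only the standard library; `DivClassCounter`, `count_div_classes`, and the sample HTML string are hypothetical names and data made up for illustration, not part of the original answer.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Counts the class names that appear on <div> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            # A class attribute may hold several space-separated names
            if name == "class" and value:
                self.counts.update(value.split())


def count_div_classes(html):
    """Return a Counter mapping div class names to how often they occur."""
    parser = DivClassCounter()
    parser.feed(html)
    return parser.counts


# Made-up stand-in for the text returned by the download
sample = '<div class="jfp3ef"><a href="/a">A</a></div><div class="jfp3ef other"></div>'
counts = count_div_classes(sample)
print(counts.most_common())
```

If the class you are selecting on (for example r) has a count of zero here, the selector cannot match, whatever the browser's developer tools show.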

If you want, you can download the entire page, with the divs in the r class, by using urllib.request. Google blocks urllib's default request, though, so you have to change the header information (the User-Agent).

import sys
import urllib.parse
import urllib.request

import bs4

SEARCHVAR = sys.argv[1:]
# URL-encode the search terms so spaces and punctuation are safe in the query string
query = 'http://google.com/search?q=' + urllib.parse.quote_plus(' '.join(SEARCHVAR))
# Send a browser-like User-Agent instead of urllib's default one
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 '
                         '(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}
req = urllib.request.Request(query, headers=headers)
html = urllib.request.urlopen(req).read()
print('Searching ' + ' '.join(SEARCHVAR) + ' on Google')
soup = bs4.BeautifulSoup(html, 'html.parser')
print('Parsing')
linkElems = soup.select('.r a')
print(str(linkElems))
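Since the search terms become part of the URL and the header is what decides how the request is treated, it can be useful to build the request separately and inspect it before anything is sent. This is a rough sketch under that assumption; `build_search_request` and `USER_AGENT` are hypothetical names, and the shortened User-Agent string is only a placeholder.

```python
import urllib.parse
import urllib.request

# Placeholder browser-like User-Agent (an assumption; use a real browser string)
USER_AGENT = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17"


def build_search_request(terms):
    """Build (but do not send) a search request for a list of terms."""
    # urlencode escapes spaces and special characters in the query string
    query_string = urllib.parse.urlencode({"q": " ".join(terms)})
    url = "https://www.google.com/search?" + query_string
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_search_request(["hello", "world"])
print(req.full_url)
# urllib stores header names capitalized as 'User-agent'
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then perform the actual download.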

The example in the book is out of date. I assume the class "jfp3ef" in my first example is randomly generated by Google, so it will probably break soon or may not work for you at all. The second example does work well.

Solution 2:

Replace this:

linkElems = soup.select('.r a')

With:

linkElems = soup.select('div#main > div > div > div > a')
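The selector above matches on the page's structure rather than on a class name: each `>` means "direct child", so it finds a tags exactly three divs below div#main. Here is a small sketch of that behaviour on a made-up HTML fragment; the markup is an assumption for illustration and is not Google's real page structure.

```python
from bs4 import BeautifulSoup

# Made-up fragment: one link nested three divs below #main, one nested only one div deep
SAMPLE_HTML = """
<div id="main">
  <div><div><div><a href="/url?q=https://example.com/">Example</a></div></div></div>
  <div><a href="/settings">Settings</a></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The class-based selector finds nothing because no element has class "r"
old_links = soup.select(".r a")

# The structural selector only matches the link at the expected nesting depth
new_links = soup.select("div#main > div > div > div > a")
hrefs = [a["href"] for a in new_links]

print(len(old_links))
print(hrefs)
```

A structural selector like this does not depend on generated class names, but it will stop matching if the nesting of the page changes.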
