Unable To Get Actual Markup From A Page With Beautifulsoup

I am trying to scrape this URL with a combination of BeautifulSoup and Selenium: http://starwood.ugc.bazaarvoice.com/3523si-en_us/115/reviews.djs?format=embeddedhtml&page=2&sc

Solution 1:

First of all, the link you gave is not the page you are actually trying to scrape. It points to a JavaScript file that takes part in loading the page, and parsing that file would be an unnecessary challenge.

Secondly, you don't need BeautifulSoup in this case. Selenium itself is good at locating elements and extracting their text or attributes, so there is no need for an extra parsing step.

Here's a working example using the actual page with reviews you want to get:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # or webdriver.Firefox()
driver.get('http://www.starwoodhotels.com/sheraton/property/reviews/index.html?propertyID=115&language=en_US')

# wait for the reviews to load
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "span.BVRRReviewText")))

# get reviews (find_elements_by_css_selector was removed in Selenium 4)
for review_div in driver.find_elements(By.CSS_SELECTOR, "span.BVRRReviewText"):
    print(review_div.text)
    print("---")

driver.quit()  # end the session and shut down the browser

Prints:

This is not a low budget hotel. Yet the hotel offers no amenities. Nothing and no WiFi. In fact, you block the wifi that comes with my celluar plan. I am a part of 2 groups that are loyal to the Sheraton, Alabama A&M and the 9th Episcopal District AME Church but the Sheraton is not loyal to us.
---
We are a company that had (5) guest rooms at the hotel. Despite having a credit card on file for room and tax charges, my guest was charged the entire amount to her personal credit card. It has taken me (5) PHONE CALLS and my own time and energy to get this bill reversed. I guess leaving a message with information and a phone number numerous times is IGNORED at this hotel. You can guarantee that we will not return with our business. YOu may thank Kimerlin or Kimberly in your accounting office for her lack of personal service and follow through for the lost business in the future.
---
...
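
If you would still rather parse with BeautifulSoup, the likely reason you are not seeing the actual markup is that the reviews are injected by JavaScript after the initial page load. A rough sketch of one way around that: let Selenium wait for the reviews, then hand driver.page_source to BeautifulSoup. The function names fetch_rendered_html and parse_reviews are made up for illustration, and the selector is simply the one used above, which may no longer match the live site.

```python
from bs4 import BeautifulSoup

# Selector taken from the answer above; treat it as an assumption.
REVIEW_SELECTOR = "span.BVRRReviewText"


def fetch_rendered_html(url, selector=REVIEW_SELECTOR, timeout=10):
    """Open url in a browser, wait for selector, return the rendered markup."""
    # Imported here so parse_reviews() stays usable without a browser setup.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # or webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one review element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        # page_source reflects the DOM after the scripts have run.
        return driver.page_source
    finally:
        driver.quit()


def parse_reviews(html, selector=REVIEW_SELECTOR):
    """Return the stripped text of every element matching selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(" ", strip=True) for node in soup.select(selector)]
```

Calling parse_reviews(fetch_rendered_html(url)) with the reviews page URL from the example above should give roughly the same texts as the Selenium-only version. The only real difference is where the extraction happens.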
