How To Scrape The JavaScript-Based Site https://marketchameleon.com/calendar/earnings Using Selenium And Python?
Solution 1:
I took your code, added a few tweaks, and ran a test to extract the earnings dates from https://marketchameleon.com/Calendar/Earnings as follows:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start Chrome maximized and switch off the automation hints Chrome normally exposes
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Note: executable_path is the Selenium 3 style; newer Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://marketchameleon.com/Calendar/Earnings')

# Wait up to 5 seconds for the date header, located first by CSS selector and then by XPath
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.dateselect_menu_h_table tr > th > span"))).text)
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='dateselect_menu_h_table']//tr/th/span"))).get_attribute("innerHTML"))
Observation
Like you, I hit the same roadblock: when the page is driven by Selenium, the earnings table doesn't load.
Deep Dive
While inspecting the DOM tree of the webpage, I found that some of the <script> and other tags refer to the keyword akam. As an example:
!function(){if(BOOMR=a.BOOMR||{},BOOMR.plugins=BOOMR.plugins||{},!BOOMR.plugins.AK){var e=""=="true"?1:0,t="",n="gertvyrrfrzvsxxfd3ta-f-81b1f5d51-clientnsv4-s.akamaihd.net"
<script type="text/javascript" src="https://marketchameleon.com/akam/11/4e7414cb" defer=""></script>
<noscript><img src="https://marketchameleon.com/akam/11/pixel_4e7414cb?a=dD03OTIxZTlmM2QwMWVhMDkxODhjNzQwN2E3NmFkNzRiMDQ5ODBkOGU0JmpzPW9mZg==" style="visibility: hidden; position: absolute; left: -999px; top: -999px;" /></noscript>
<link id="dnsprefetchlink" href="//gertvyrrfrzvsxxfd3ta-f-81b1f5d51-clientnsv4-s.akamaihd.net">
This is a clear indication that the website is protected by Bot Manager, an advanced bot detection service provided by Akamai, and that the response gets blocked.
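The manual inspection above can be approximated in code. The following is a minimal sketch, not an official detection method: it assumes that the two substrings seen in the snippets (an akamaihd.net host and an /akam/<number>/ path) are reasonable indicators, and the function name find_akamai_markers and the AKAMAI_PATTERNS tuple are hypothetical names chosen for illustration.

```python
import re
from typing import List

# Heuristic patterns taken from the snippets above. Treating them as
# Akamai indicators is an assumption, not a documented detection rule.
AKAMAI_PATTERNS = (
    re.compile(r"akamaihd\.net", re.IGNORECASE),
    re.compile(r"/akam/\d+/", re.IGNORECASE),
)


def find_akamai_markers(page_source: str) -> List[str]:
    """Return the distinct Akamai-looking substrings found in page_source."""
    hits: List[str] = []
    for pattern in AKAMAI_PATTERNS:
        for match in pattern.finditer(page_source):
            text = match.group(0).lower()
            if text not in hits:  # keep each marker only once
                hits.append(text)
    return hits


# Sample built from two of the tags quoted above
sample = (
    '<script type="text/javascript" '
    'src="https://marketchameleon.com/akam/11/4e7414cb" defer=""></script>'
    '<link id="dnsprefetchlink" '
    'href="//gertvyrrfrzvsxxfd3ta-f-81b1f5d51-clientnsv4-s.akamaihd.net">'
)
print(find_akamai_markers(sample))  # ['akamaihd.net', '/akam/11/']
```

With a live session, the same function could be fed driver.page_source. An empty list would only mean these particular markers were not found, not that the site is unprotected.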
Bot Manager
As per the article Bot Manager - Foundations:
Conclusion
So it can be concluded that the request for the data is detected as coming from a Selenium-driven WebDriver instance, and the response is blocked.