While Web Scraping For A Table In Python, An Empty Table Is Returned
Solution 1:
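Very simple: there's an extra space in the class string you're searching for. When you pass a string containing several class names, BeautifulSoup compares it against the exact value of the tag's class attribute, so a stray space means nothing matches.
If you change the class to "g-summary-table svelte-2wimac", the tags should be returned correctly.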
The following code should work:
import requests
from bs4 import BeautifulSoup
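
# Download the page and parse the HTML
response = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
soup = BeautifulSoup(response.content, 'html.parser')
# The class string must match the tag's class attribute exactly (no extra spaces)
table = soup.find_all('table', class_="g-summary-table svelte-2wimac")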
print(table)
I've also done similar scraping on the NYTimes interactive website, and spaces can be very tricky. If you add an extra space or miss one, an empty result is returned.
If you cannot find the tags, I would recommend printing the entire document first using print(soup.prettify()) and finding the tags you plan to scrape. Make sure you copy the exact text of the class name from the contents printed by BeautifulSoup.
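To illustrate why a stray space matters, here is a minimal sketch that uses a small inline HTML snippet as a stand-in for the real page (an assumption, since the live page may have changed). It compares the exact class string with two less fragile options: searching by a single class name, and using a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the class names come from the question.
html = '<table class="g-summary-table svelte-2wimac"><tr><td>1</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Exact class string, as in the corrected code: matches the table.
exact = soup.find_all("table", class_="g-summary-table svelte-2wimac")

# Same string with a trailing space: expected to match nothing.
with_space = soup.find_all("table", class_="g-summary-table svelte-2wimac ")

# Searching by one class name, or with a CSS selector, sidesteps the whitespace issue.
single = soup.find_all("table", class_="g-summary-table")
css = soup.select("table.g-summary-table.svelte-2wimac")

print(len(exact), len(with_space), len(single), len(css))
```

The CSS selector form also does not depend on the order in which the classes appear in the attribute.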
Solution 2:
As an alternative, you can download the data in JSON format and read it into pandas. This uses the same starting code as above and works off the soup object.
Several API endpoints are available (three are shown below); their URLs can be pulled out of the HTML like this:
import re
import pandas as pd
# Each URL is taken from the third line of the matching text node, by position among the quoted strings
latest_dataset = soup.find(string=re.compile('latest')).splitlines()[2].split('"')[1]
requests.get(latest_dataset).json()
latest_timeseries = soup.find(string=re.compile('timeseries')).splitlines()[2].split('"')[3]
requests.get(latest_timeseries).json()
allwithrate = soup.find(string=re.compile('all_with_rate')).splitlines()[2].split('"')[1]
# Load the last dataset into a DataFrame and display it
df = pd.DataFrame(requests.get(allwithrate).json())
print(df)
Output of the last one:
     geoid    location  last_updated  total_vaccinations  people_vaccinated     display_name  ...                      Region          IncomeGroup                    country  gdp_per_cap  vaccinations_rate  people_fully_vaccinated
  0    MUS   Mauritius    2021-02-17              3843.0             3843.0        Mauritius  ...          Sub-Saharan Africa          High income                  Mauritius  11099.24028             0.3037                      NaN
  1    DZA     Algeria    2021-02-19             75000.0                NaN          Algeria  ...  Middle East & North Africa  Lower middle income                    Algeria  3973.964072             0.1776                      NaN
  2    LAO        Laos    2021-03-17             40732.0            40732.0             Laos  ...         East Asia & Pacific  Lower middle income                    Lao PDR   2534.89828             0.5768                      NaN
  3    MOZ  Mozambique    2021-03-23             57305.0            57305.0       Mozambique  ...          Sub-Saharan Africa           Low income                 Mozambique  503.5707727             0.1943                      NaN
  4    CPV  Cape Verde    2021-03-24              2184.0             2184.0       Cape Verde  ...          Sub-Saharan Africa  Lower middle income                 Cabo Verde  3603.781793             0.4016                      NaN
 ..    ...         ...           ...                 ...                ...              ...  ...                         ...                  ...                        ...          ...                ...                      ...
243    GUF         NaN           NaN                 NaN                NaN    French Guiana  ...                         NaN                  NaN                        NaN          NaN                NaN                      NaN
244    KOS         NaN           NaN                 NaN                NaN           Kosovo  ...                         NaN                  NaN                        NaN          NaN                NaN                      NaN
245    CUW         NaN           NaN                 NaN                NaN          Curaçao  ...   Latin America & Caribbean          High income                    Curacao  19689.13982                NaN                      NaN
246    CHI         NaN           NaN                 NaN                NaN  Channel Islands  ...       Europe & Central Asia          High income            Channel Islands  74462.64675                NaN                      NaN
247    SXM         NaN           NaN                 NaN                NaN     Sint Maarten  ...   Latin America & Caribbean          High income  Sint Maarten (Dutch part)  29160.10381                NaN                      NaN
[248 rows x 17 columns]
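Picking URLs by position (third line, nth quoted string) is likely to break if the page layout shifts. A slightly more defensive sketch is to collect every quoted URL ending in .json with a regular expression. The script text, the example.com URLs, and the helper name find_json_urls below are all made up for illustration; the real page embeds its own URLs in its own format.

```python
import re

# Hypothetical script text; the real page's embedded text will differ.
script_text = '''
var config = {
  latest: "https://example.com/data/latest.json",
  timeseries: "https://example.com/data/timeseries.json"
};
'''

def find_json_urls(text):
    """Hypothetical helper: return all double-quoted URLs ending in .json, in order of appearance."""
    return re.findall(r'"(https?://[^"\s]+\.json)"', text)

urls = find_json_urls(script_text)
print(urls)
```

Each URL found this way could then be passed to requests.get(...).json() and pd.DataFrame(...) in the same way as in the code above.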