
Using BeautifulSoup To Parse String Efficiently

I am trying to parse this HTML to get the item title (e.g. Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW)

Solution 1:

You get None because of this documented caveat of the .string attribute:

If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None

Since the h1 header element contains multiple children, .string is ambiguous for it and evaluates to None.
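A minimal sketch of that caveat. The two fragments below are made up for illustration and are not the page from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one h1 with a single child, one with two children.
single = BeautifulSoup("<h1>Only text</h1>", "html.parser").h1
mixed = BeautifulSoup("<h1><span>Details about</span>Item title</h1>", "html.parser").h1

single_string = single.string  # the lone text node
mixed_string = mixed.string    # None: a span and a text node both qualify
mixed_text = mixed.get_text()  # joins every descendant string, span text included

print(repr(single_string), repr(mixed_string), repr(mixed_text))
```

Note that get_text() does return something for the second header, but it includes the span's text as well, which is why a prefix would then have to be stripped by hand.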

To avoid having to cut off the "Details about" part, you can get the first text node that is a direct child of the header by searching non-recursively:

soup.find('h1', {'class':'it-ttl'}).find(text=True, recursive=False)

Demo:

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: print(soup.find('h1', {'class':'it-ttl'}).find(text=True, recursive=False))
Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW
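For reference, here is a self-contained sketch of the same lookup. The demo does not show the contents of data, so the markup below is an assumption that only mimics the likely shape (a nested span holding "Details about", followed by the title text). The item_title helper is a hypothetical name, and string=True is used as the newer spelling of the text=True argument from the answer (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# Assumed markup: the real page is not shown, so this only imitates its structure.
data = """
<h1 class="it-ttl"><span class="g-hdn">Details about &nbsp;</span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1>
"""


def item_title(html):
    """Return the first text node directly under h1.it-ttl, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("h1", {"class": "it-ttl"})
    if header is None:
        return None
    # recursive=False restricts the search to direct children of the header,
    # so the text inside the nested span is skipped.
    node = header.find(string=True, recursive=False)
    return node.strip() if node is not None else None


title = item_title(data)
print(title)
```

Returning None when the header is missing avoids an AttributeError on pages that do not match the expected structure.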
