
Most Pythonic Way To Find The Sibling Of An Element In Xml

Problem: I have the following XML snippet: ...snip...

DEFINITION

This

Solution 1:

There are a couple of ways to do this. If you let XPath do most of the work, the expression

//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]

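should work. It matches any element with class p_cat_heading whose text contains 'DEFINITION', then returns the first element that follows it at the same level.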

Using lxml:

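from lxml import html

# Placeholder: assign the HTML snippet from the question as a string.
data = [your snippet above]
# First element following the p_cat_heading element whose text contains 'DEFINITION'.
exp = "//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]"

tree = html.fromstring(data)
target = tree.xpath(exp)  # xpath() returns a list of matching elements

for i in target:
    print(i.text_content())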

Output:

This, these.
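
If lxml is not available, the same idea can be sketched with the standard library alone. ElementTree's limited XPath support has no following-sibling axis, so this sketch walks each parent's children in pairs instead. The sample data, the wrapping div (ElementTree needs a single root element) and the helper name next_sibling_after_heading are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the question's snippet, wrapped in a <div>
# because ElementTree requires a single root element.
data = (
    '<div>'
    '<p class="p_cat_heading">DEFINITION</p>'
    '<p class="p_numberedbullet"><span class="calibre10">This</span>, '
    '<span class="calibre10">these</span>. </p>'
    '<p class="p_cat_heading">PRONUNCIATION </p>'
    '</div>'
)


def next_sibling_after_heading(root, heading_text):
    """Return the element right after a matching heading, or None."""
    for parent in root.iter():
        children = list(parent)
        # Pair every child with the element that directly follows it.
        for child, following in zip(children, children[1:]):
            if (child.get("class") == "p_cat_heading"
                    and heading_text in (child.text or "")):
                return following
    return None


root = ET.fromstring(data)
sibling = next_sibling_after_heading(root, "DEFINITION")
# itertext() joins the element's own text with that of its descendants.
text = "".join(sibling.itertext()) if sibling is not None else ""
print(text)
```

This only works on well-formed markup; for real-world HTML the lxml version above is the more forgiving choice.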

Solution 2:

You can use BeautifulSoup with CSS selectors for this task. The selector .p_cat_heading:contains("DEFINITION") ~ .p_cat_heading selects all elements with class p_cat_heading that are preceded by a sibling element with class p_cat_heading containing the string "DEFINITION". (Newer releases of Soup Sieve, the selector engine behind select(), deprecate :contains in favor of :-soup-contains.)

data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">DEFINITION</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">PRONUNCIATION </p>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

for heading in soup.select('.p_cat_heading:contains("DEFINITION") ~ .p_cat_heading'):
    print(heading.text)

Prints:

PRONUNCIATION 

Further reading

CSS Selector guide

EDIT:

To select the direct sibling after DEFINITION:

data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">DEFINITION</p><p class="p_numberedbullet"><span class="calibre10">This is after DEFINITION</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">PRONUNCIATION </p><p class="p_numberedbullet"><span class="calibre10">This is after PRONUNCIATION</span>, <span class="calibre10">these</span>. </p>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

s = soup.select_one('.p_cat_heading:contains("DEFINITION") + :not(.p_cat_heading)')
print(s.text)

Prints:

This is after DEFINITION, these. 
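
As a possible alternative to the selector, the same lookup can be sketched with BeautifulSoup's own navigation methods, find() and find_next_sibling(). The shortened sample data and the use of the built-in html.parser (instead of lxml) are assumptions made to keep the example small; treat it as an illustration rather than the answerer's method.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modelled on the data above.
data = (
    '<p class="p_cat_heading">DEFINITION</p>'
    '<p class="p_numberedbullet"><span class="calibre10">This is after DEFINITION</span>, '
    '<span class="calibre10">these</span>. </p>'
    '<p class="p_cat_heading">PRONUNCIATION </p>'
)

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(data, 'html.parser')

# Locate the heading by class and exact text, then step to the next <p> sibling.
heading = soup.find('p', class_='p_cat_heading', string='DEFINITION')
sibling = heading.find_next_sibling('p') if heading is not None else None
result = sibling.get_text().strip() if sibling is not None else None
print(result)
```

Note that string='DEFINITION' is an exact match, unlike the substring test done by :contains, so a heading with trailing whitespace would need a regular expression or a function instead.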
