Most Pythonic Way To Find The Sibling Of An Element In Xml
Problem: I have the following XML snippet: ...snip...
DEFINITION
This
Solution 1:
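There are a couple of ways to do this. If you let XPath do most of the work, the expression
//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]
should work: it matches the element whose class is p_cat_heading and whose text contains DEFINITION, then takes its first following sibling.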
Using lxml:
from lxml import html
data = [your snippet above]
exp = "//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]"
tree = html.fromstring(data)
target = tree.xpath(exp)
for i in target:
    print(i.text_content())
Output:
This, these.
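If lxml is not available, the same lookup can be sketched with the standard library. This is a swapped-in technique, not the XPath approach above: xml.etree.ElementTree has no following-sibling axis, so the sketch walks each parent's children in pairs. The sample markup, the <div> wrapper, and the helper name next_sibling_after_heading are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample markup modelled on this thread; the <div> wrapper is added
# only so the fragment has a single root element and parses as XML.
data = (
    '<div>'
    '<p class="p_cat_heading">DEFINITION</p>'
    '<p class="p_numberedbullet"><span class="calibre10">This</span>, '
    '<span class="calibre10">these</span>. </p>'
    '<p class="p_cat_heading">PRONUNCIATION </p>'
    '</div>'
)


def next_sibling_after_heading(root, heading_text):
    """Return the element right after the first matching heading, or None."""
    for parent in root.iter():
        children = list(parent)
        # Pair every child with the child that follows it.
        for current, following in zip(children, children[1:]):
            if (current.get("class") == "p_cat_heading"
                    and heading_text in (current.text or "")):
                return following
    return None


root = ET.fromstring(data)
target = next_sibling_after_heading(root, "DEFINITION")
# itertext() gathers the text of the element and all of its descendants.
result = "".join(target.itertext()).strip() if target is not None else None
print(result)
```

Note that ElementTree needs well-formed XML, so for real-world HTML the lxml version is likely the more forgiving choice.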
Solution 2:
You can use BeautifulSoup with CSS selectors for this task. The selector .p_cat_heading:contains("DEFINITION") ~ .p_cat_heading
selects every element with class p_cat_heading
that comes after a sibling with class p_cat_heading
containing the string "DEFINITION" (recent Soup Sieve releases deprecate :contains in favour of :-soup-contains):
data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">DEFINITION</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">PRONUNCIATION </p>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
for heading in soup.select('.p_cat_heading:contains("DEFINITION") ~ .p_cat_heading'):
    print(heading.text)
Prints:
PRONUNCIATION
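To make the difference between "any later sibling" (~) and "the very next sibling" concrete, here is a rough standard-library sketch of what the ~ selector does for the first matching heading. It is not how BeautifulSoup implements it; the <div> wrapper, the simplified sample markup, and the helper name later_siblings_with_class are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified sample markup; the <div> root is added so it parses as XML.
data = (
    '<div>'
    '<p class="p_cat_heading">THIS YOU DONT WANT</p>'
    '<p class="p_numberedbullet">This, these. </p>'
    '<p class="p_cat_heading">DEFINITION</p>'
    '<p class="p_numberedbullet">This, these. </p>'
    '<p class="p_cat_heading">PRONUNCIATION </p>'
    '</div>'
)


def later_siblings_with_class(parent, needle, cls="p_cat_heading"):
    """Roughly mimic `.cls:contains(needle) ~ .cls` for the first match."""
    children = list(parent)
    for index, child in enumerate(children):
        if child.get("class") == cls and needle in (child.text or ""):
            # Only siblings after the match count; earlier ones are ignored.
            return [s for s in children[index + 1:] if s.get("class") == cls]
    return []


root = ET.fromstring(data)
headings = [(h.text or "").strip()
            for h in later_siblings_with_class(root, "DEFINITION")]
print(headings)
```

The heading before DEFINITION is skipped because the sibling combinator only looks forward, which is why only PRONUNCIATION is selected.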
EDIT:
To select only the element directly after the DEFINITION heading:
data = '''
<p class="p_cat_heading">THIS YOU DONT WANT</p><p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">DEFINITION</p><p class="p_numberedbullet"><span class="calibre10">This is after DEFINITION</span>, <span class="calibre10">these</span>. </p><p class="p_cat_heading">PRONUNCIATION </p><p class="p_numberedbullet"><span class="calibre10">This is after PRONUNCIATION</span>, <span class="calibre10">these</span>. </p>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
s = soup.select_one('.p_cat_heading:contains("DEFINITION") + :not(.p_cat_heading)')
print(s.text)
Prints:
This is after DEFINITION, these.