Simple Regex For Simple Xml String
I have a string consisting of elements. Each element can contain 'pear' or 'apple'. I can get all the elements using: s = 'uTSqUYRR8gapple K9VGTZM3
Solution 1:
Use a parser such as BeautifulSoup instead:
import re
from bs4 import BeautifulSoup
s = '<tag>uTSqUYRR8gapple</tag><tag>K9VGTZM3h8</tag><tag>pearTYysnMXMUc</tag><tag>udv5NZQdpzpearz5a4oS85mD</tag>'
soup = BeautifulSoup(s, "html5lib")
# string= is the current name for the older text= argument
tags = soup.find_all(string=re.compile(r'pear'))
print(tags)
# ['pearTYysnMXMUc', 'udv5NZQdpzpearz5a4oS85mD']
This builds the DOM and finds every text node that matches the regex pear (that is, the literal string "pear").
See a demo on ideone.com.
Solution 2:
Using a proper XML library will let you use XPath to encapsulate your query. For instance:
import lxml.etree
s = '<root><tag>uTSqUYRR8gapple</tag><tag>K9VGTZM3h8</tag><tag>pearTYysnMXMUc</tag><tag>udv5NZQdpzpearz5a4oS85mD</tag></root>'
root = lxml.etree.fromstring(s)
result = root.xpath('//tag[contains(., "pear")][last()]/text()')
...for which result will contain, for the input data given, ['udv5NZQdpzpearz5a4oS85mD']. In this case you don't need to search for the last item in your own code; you can rely on the XPath engine (implemented in C, as part of libxml2) to do that for you.
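If lxml is not available, roughly the same result can be had with the standard library's xml.etree.ElementTree. This is only a sketch, not part of the original answer: ElementTree's XPath subset has no contains(), so the filtering and the "last match" step are done in plain Python here. The names matches and last_pear are made up for illustration.

```python
import xml.etree.ElementTree as ET

s = ('<root><tag>uTSqUYRR8gapple</tag><tag>K9VGTZM3h8</tag>'
     '<tag>pearTYysnMXMUc</tag><tag>udv5NZQdpzpearz5a4oS85mD</tag></root>')

root = ET.fromstring(s)

# ElementTree's limited XPath has no contains(), so filter in Python.
# itertext() joins all text inside the element, similar to XPath's
# string value of "." in contains(., "pear").
matches = []
for el in root.iter('tag'):
    text = ''.join(el.itertext())
    if 'pear' in text:
        matches.append(text)

# Text of the last matching element, or None when nothing matches.
last_pear = matches[-1] if matches else None
print(last_pear)
```

The trade-off is that the selection logic lives in your own code rather than in a single XPath expression, but there is no third-party dependency.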