
Python - Extracting Sentences From Paragraphs

I am new to Python and could use some help. This is just a sample: I have dictionaries with the same keys repeating inside a list: list_dummy = [{'a': 1, 'b':'The house is grea

Solution 1:

I couldn't get the nltk module working to test this, but as long as sent_tokenize() returns a list of sentence strings, something like this should do what you're hoping for (if I understood correctly):

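from nltk.tokenize import sent_tokenize

ans = []
for d in list_dummy:
    # Split the paragraph into sentences
    tmp = sent_tokenize(d['b'])
    # Keep sentences containing any keyword from 'e' (case-insensitive)
    s = [x for x in tmp if any(w.upper() in x.upper() for w in d['e'].split(","))]
    ans += s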

This assumes that 'e' will always be a comma-separated list and that you want a case-insensitive search. The ans variable will be a flat list of the sentences that contain a word from the 'e' value of each dictionary.
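To try the idea without NLTK installed, here is a minimal sketch. The split_sentences helper is a hypothetical stand-in for sent_tokenize() (a naive split after sentence-ending punctuation, not what NLTK actually does), and the sample data is made up for illustration; it only mirrors the 'b' and 'e' keys used above.

```python
import re

def split_sentences(text):
    """Naive stand-in for sent_tokenize: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def matching_sentences(records):
    """Return a flat list of sentences containing any keyword from each record's 'e' value."""
    found = []
    for d in records:
        # Strip stray spaces so "house, tea" behaves like "house,tea"
        keywords = [w.strip().upper() for w in d["e"].split(",") if w.strip()]
        for sentence in split_sentences(d["b"]):
            if any(k in sentence.upper() for k in keywords):
                found.append(sentence)
    return found

# Made-up sample data; only the 'b' and 'e' keys matter here.
sample = [
    {"a": 1, "b": "The house is big. The garden is small. I like tea.", "e": "house,tea"},
    {"a": 2, "b": "Cats sleep a lot. Dogs bark.", "e": "DOGS"},
]

print(matching_sentences(sample))
```

Note that this is substring matching, so a keyword like "tea" would also match a sentence containing "steak". If whole-word matching is needed, a regular expression with word boundaries is probably the better fit.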

EDIT

If you prefer using regular expressions you could use the re module:

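import re
ans = []
for d in list_dummy:
    b = sent_tokenize(d['b'])
    e = d['e'].split(",")
    # Escape each keyword and search anywhere in the sentence, ignoring case
    r = re.compile("|".join(re.escape(w) for w in e), re.IGNORECASE)
    # Appends one list of matching sentences per dictionary
    ans.append([x for x in b if r.search(x)])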
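One thing to keep in mind when building a pattern from a word list: | has the lowest precedence in a regular expression, so a prefix or suffix such as .* attaches only to the first or last alternative unless the alternation is grouped, and re.match() only matches at the start of the string. A minimal sketch of the difference, using made-up keywords and a made-up sentence:

```python
import re

words = ["house", "tea"]   # hypothetical keywords
sentence = "I like tea."   # hypothetical sentence

# Ungrouped: parsed as (.*house) | (tea.*), and match() anchors at the start,
# so "tea" is only found when the sentence begins with it.
loose = re.compile(".*" + "|".join(words) + ".*")

# Escaped alternation with search(): finds any keyword anywhere, ignoring case.
strict = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)

print(bool(loose.match(sentence)))
print(bool(strict.search(sentence)))
```

The first check misses the sentence and the second finds it. re.escape() also keeps keywords containing characters such as . or + from being read as regex syntax.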
