How To Search For Multiple Search Terms Across Multiple Rows In A Pandas Dataframe?
My previous, simpler question is here: How to search for text across multiple rows in a pandas dataframe? What I want to do is to feed a text docum
Solution 1:
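The approach is to strip each subtitle word, join the words with single spaces into one string, and record where each word starts and ends in that string. In the code, `start` and `end` are these character offsets. They are not the DataFrame's own 'start' and 'end' timestamp columns. Each term is then searched for in the joined text with a regex, and every match's offsets are mapped back to the first and last rows it spans.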
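A minimal illustration of the offset arithmetic, using only the first three words from the example data. The hard-coded list is for illustration and is not read from the asker's file:

```python
import numpy as np
import pandas as pd

# Illustrative words, taken from the example output further down
word = pd.Series(["new", "jersey", "grew"])
text = " ".join(word)  # "new jersey grew"

# Exclusive end offset: characters in all words so far plus one space per earlier word
end = word.str.len().cumsum() + np.arange(len(word))
# Each word starts one character (the separating space) after the previous word ends
start = end.shift(fill_value=-1) + 1

for s, e in zip(start, end):
    print(s, e, repr(text[s:e]))
```

Each `end` is exclusive, which matches what `match.end()` from `re.finditer` returns, so the two can be compared directly.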
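import re
import numpy as np
import pandas as pd

# Read one search term per line, skipping blank lines
with open("terms.txt") as f:
    terms = [line.strip() for line in f if line.strip()]

word = df["subtitle"].str.strip()
# Character offsets of each word in the space-joined text (end is exclusive)
end = word.str.len().cumsum() + np.arange(len(word))
start = end.shift(fill_value=-1) + 1
text = " ".join(word)

df["match"] = False
for term in terms:
    # re.escape so terms containing regex metacharacters are matched literally
    for match in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
        # Map the match's character span back to the first and last row it covers
        idx1 = start[start == match.start()].index[0]
        idx2 = end[end == match.end()].index[0]
        # Label slicing with .loc includes idx2; set only the "match" column
        df.loc[idx1:idx2, "match"] = True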
Output:
$ cat terms.txt
new jersey
hello
>>> df
   id   subtitle   start     end  duration  match
0  14        new  71.986  72.096      0.11   True
1  15     jersey  72.106  72.616      0.51   True
2  16       grew  72.696  73.006      0.31  False
3  17         up  73.007  73.147      0.14  False
4  18  believing  73.156  73.716      0.56  False
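If some subtitle words carry punctuation (say `jersey,`), a regex match can end before the word does. The exact-equality lookups `start[start == match.start()]` and `end[end == match.end()]` then find nothing, and `.index[0]` raises `IndexError`. Below is a sketch of a more forgiving variant. It is wrapped in a hypothetical helper, `flag_terms`, that takes the terms as a list instead of reading `terms.txt`, and it maps offsets to rows with `numpy.searchsorted` instead of exact equality:

```python
import re

import numpy as np
import pandas as pd


def flag_terms(df, terms, column="subtitle"):
    """Return a boolean Series marking rows whose words are part of any term match.

    Hypothetical helper: same offset idea as the answer above, but offsets are mapped
    to rows with searchsorted, so a match need not start or end exactly on a word edge.
    """
    words = df[column].astype(str).str.strip()
    lengths = words.str.len().to_numpy()
    # Exclusive end offset of each word in the space-joined text
    ends = np.cumsum(lengths) + np.arange(len(words))
    starts = ends - lengths
    text = " ".join(words)

    flags = np.zeros(len(df), dtype=bool)
    for term in terms:
        term = term.strip()
        if not term:
            continue  # an empty pattern would match at every word boundary
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            # Last word that starts at or before the match start
            first = np.searchsorted(starts, m.start(), side="right") - 1
            # First word that ends at or after the match end
            last = np.searchsorted(ends, m.end(), side="left")
            flags[first : last + 1] = True
    return pd.Series(flags, index=df.index)


df = pd.DataFrame({
    "id": [14, 15, 16, 17, 18],
    "subtitle": ["new", "jersey", "grew", "up", "believing"],
})
df["match"] = flag_terms(df, ["new jersey", "hello"])
print(df)
```

The sample frame reuses the `id` and `subtitle` values from the output above. The timestamp columns are left out because the matching never reads them. Returning a Series aligned on `df.index` lets the caller decide where to store the flags.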