
How To Search For Multiple Search Terms Across Multiple Rows In A Pandas Dataframe?

My previous, simpler question is here: How to search for text across multiple rows in a pandas dataframe? What I want to do is basically to be able to feed a text docum

Solution 1:

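The dataframe already has columns named 'start' and 'end' (the subtitle timings), so the character offsets computed below are kept in separate Series, also called start and end, instead of being written back into df. The idea is to join all subtitles into one space-separated string, record where each row begins and ends in that string, search the string with a regex for each term, and flag every row a match spans.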
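A quick check of the offset arithmetic, using the first three subtitles from the output as sample data: each row's end offset is its cumulative length plus one joining space per earlier row, and each row starts one character after the previous row ends.

```python
import pandas as pd

# Sample data: the first three subtitles from the output
word = pd.Series(["new", "jersey", "grew"])
text = " ".join(word)  # "new jersey grew"

# Exclusive end offset of each row: characters so far plus one space per earlier row
end = word.str.len().cumsum() + pd.RangeIndex(len(word))
# Each row starts one character (the joining space) after the previous row ends
start = end.shift(fill_value=-1) + 1

print(start.tolist())         # [0, 4, 11]
print(end.tolist())           # [3, 10, 15]
print(text[start[1]:end[1]])  # jersey
```

A match of "new jersey" in text runs from offset 0 to 10, which lines up exactly with the start of row 0 and the end of row 1.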

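import re

import pandas as pd

# df is the subtitle dataframe from the question (columns id, subtitle, start, end, duration)
# Read one search term per line, skipping blank lines
with open("terms.txt") as f:
    terms = [line.strip() for line in f if line.strip()]

# Character offsets of each subtitle inside the space-joined text
word = df["subtitle"].str.strip()
end = word.str.len().cumsum() + pd.RangeIndex(len(df))  # exclusive end offset of each row
start = end.shift(fill_value=-1) + 1                     # start offset of each row
text = " ".join(word)

df["match"] = False
for term in terms:
    # re.escape makes regex metacharacters in a term match literally
    for match in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
        idx1 = start[start == match.start()].index[0]  # row where the match begins
        idx2 = end[end == match.end()].index[0]        # row where the match ends
        # .loc slicing is label-based and inclusive, so row idx2 is flagged too
        df.loc[idx1:idx2, "match"] = True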

Output:

$ cat terms.txt
new jersey
hello

>>> df
   id   subtitle   start     end  duration  match
0  14        new  71.986  72.096      0.11   True
1  15     jersey  72.106  72.616      0.51   True
2  16       grew  72.696  73.006      0.31  False
3  17         up  73.007  73.147      0.14  False
4  18  believing  73.156  73.716      0.56  False
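The exact-equality lookups above raise IndexError when a match starts or ends inside a row (for example at a hyphen within one subtitle). A possible alternative, not the answer's own method, maps match offsets to rows with numpy.searchsorted instead. In this sketch, flag_matches is a hypothetical helper name, the terms are passed as a list rather than read from terms.txt, and the sample dataframe is copied from the output.

```python
import re

import numpy as np
import pandas as pd


def flag_matches(df, terms, column="subtitle"):
    """Hypothetical helper: True for every row covered by a match of any term."""
    word = df[column].fillna("").astype(str).str.strip()
    # Exclusive end offset of each row inside the space-joined text
    ends = (word.str.len().cumsum() + np.arange(len(word))).to_numpy()
    text = " ".join(word)

    flags = np.zeros(len(df), dtype=bool)
    for term in terms:
        term = term.strip()
        if not term:  # an empty pattern would match at every word boundary
            continue
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            # The row holding character p is the number of row ends <= p
            first = np.searchsorted(ends, m.start(), side="right")
            last = np.searchsorted(ends, m.end() - 1, side="right")
            flags[first:last + 1] = True
    return pd.Series(flags, index=df.index)


# Sample data copied from the output
df = pd.DataFrame({
    "id": [14, 15, 16, 17, 18],
    "subtitle": ["new", "jersey", "grew", "up", "believing"],
    "start": [71.986, 72.106, 72.696, 73.007, 73.156],
    "end": [72.096, 72.616, 73.006, 73.147, 73.716],
    "duration": [0.11, 0.51, 0.31, 0.14, 0.56],
})
df["match"] = flag_matches(df, ["new jersey", "hello"])
print(df["match"].tolist())  # [True, True, False, False, False]
```

Because every character offset maps to some row, the search never needs a row boundary to coincide exactly with a match boundary; the \b anchors still prevent partial-word hits such as "believ" inside "believing".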
