How To Pick The Rows Which Contain All The Keywords?

February 09, 2024

I have 2 CSV files as below:

File-1

procedure          code
anand database     321-87
shiva network      321-123
jana audit         321-56
kalai recruitment  321-10

In file-1, each word in a row is

Solution 1:

If df is relatively small, you could use str.contains. First, build a pattern from df.

df

           procedure     code
0     anand database   321-87
1      shiva network  321-123
2         jana audit   321-56
3  kalai recruitment   321-10

p = df.procedure.str.split().str.join('.*?').str.cat(sep='|')

p
'anand.*?database|shiva.*?network|jana.*?audit|kalai.*?recruitment'

Now, pass it to str.contains on df2.procedure.

df2[df2.procedure.str.contains(p)]

   s.no                                 procedure
0     1             kalai has a recruitment group
1     2  shiva is the network person in my office
3     4                anand is the database here
5     6         jana is working in the audit team
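Note that a pattern such as 'anand.*?database' only matches when the keywords appear in that order, and it does not escape regex metacharacters. If order should not matter, one possible variant is to require each word with a lookahead. The following is a minimal sketch using only the standard `re` module; the helper name `build_pattern` and the two extra test sentences are made up for illustration and are not part of the original data.

```python
import re

# Keyword phrases taken from df.procedure in the example above
phrases = ["anand database", "shiva network", "jana audit", "kalai recruitment"]

def build_pattern(phrases):
    """Build one alternative per phrase; each alternative requires
    every word of that phrase, in any order, via lookaheads."""
    alternatives = []
    for phrase in phrases:
        lookaheads = "".join(
            rf"(?=.*\b{re.escape(word)}\b)" for word in phrase.split()
        )
        alternatives.append(lookaheads)
    return "|".join(alternatives)

pattern = re.compile(build_pattern(phrases))

sentences = [
    "kalai has a recruitment group",
    "the database here is run by anand",  # hypothetical: keywords in reverse order
    "anand works in networking",          # hypothetical: only one keyword of a pair
]

# Keep only the sentences that contain all words of at least one phrase
matches = [s for s in sentences if pattern.search(s)]
print(matches)
```

Since the pattern contains no capturing groups, the string returned by `build_pattern` should be usable with `df2.procedure.str.contains(...)` in the same way as `p`.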

Solution 2:

An alternative to regex is flashtext, which will be faster if you have a large number of keywords. For example:

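from flashtext import KeywordProcessor

# Register every individual word from df.procedure as a keyword
keyword_processor = KeywordProcessor()
keyword_processor.add_keywords_from_list(df['procedure'].str.split().sum())

# Keep the rows of df2 in which more than one keyword is found
df2[df2['procedure'].apply(keyword_processor.extract_keywords).str.len()>1]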

   s.no                                 procedure
0     1             kalai has a recruitment group
1     2  shiva is the network person in my office
3     4                anand is the database here
5     6         jana is working in the audit team

To learn more about this library and its speed, see the flashtext documentation.
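One caveat: the `> 1` filter counts extracted keywords regardless of which row of df they came from, so a sentence mentioning "anand" and "network" would presumably pass as well. If every keyword must come from the same row, a set-based check is one option. This is a dependency-free sketch, not the flashtext approach itself; the function name `contains_full_phrase` is hypothetical, and it assumes lowercase keywords and whitespace-separated text without punctuation attached to the words.

```python
# Keyword phrases taken from df.procedure in the example above
phrases = ["anand database", "shiva network", "jana audit", "kalai recruitment"]

# One set of required words per row of df
keyword_sets = [set(phrase.split()) for phrase in phrases]

def contains_full_phrase(text):
    """Return True if the text contains every word of at least one phrase."""
    words = set(text.lower().split())
    # `<=` on sets tests whether all required keywords are present
    return any(keywords <= words for keywords in keyword_sets)

print(contains_full_phrase("jana is working in the audit team"))
print(contains_full_phrase("anand handles the network"))  # hypothetical mixed-row sentence
```

With pandas, the same function could be applied row by row, e.g. `df2[df2['procedure'].apply(contains_full_phrase)]`.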
