Identify Value Across Multiple Columns In A Dataframe That Contain String From A List In Python

I have a dataframe with multiple columns containing phrases. What I would like to do is identify the column (per row observation) that contains a string that exists within a pre-

Solution 1:

Let's take a list of substrings and a data frame like the ones you described:

import pandas as pd

lst = ['a', 'm', 'n', 'o', 'p']

df = pd.DataFrame({'Observation': [1], 'col1': ['ab'], 'col2': ['dc'], 'col3': ['ef'], 'col4': ['yz']})
df
   Observation col1 col2 col3 col4
0            1   ab   dc   ef   yz

Keep the values in the row that contain any string from the list, and store them in a new column:

# Works for this one-row example: df.values[0] is the first row, and exactly one of its values matches
df['New_var'] = [x for x in df.values[0] if any(b in str(x) for b in lst)]
df
   Observation col1 col2 col3 col4 New_var
0            1   ab   dc   ef   yz      ab
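
The one-liner above reads only the first row, so it will not carry over to a frame with several observations. A minimal sketch of a row-wise version follows, assuming you want the first matching column per row. The second row of data, the `text_cols` list, the `first_match` helper and the `New_col` column are illustrative names that are not part of the original question.

```python
import pandas as pd

lst = ['a', 'm', 'n', 'o', 'p']

# Hypothetical two-row frame: the first row is the original example,
# the second row is made up for illustration.
df = pd.DataFrame({
    'Observation': [1, 2],
    'col1': ['ab', 'xy'],
    'col2': ['dc', 'dc'],
    'col3': ['ef', 'no'],
    'col4': ['yz', 'yz'],
})

# Columns to search (assumption: 'Observation' is an id and is skipped).
text_cols = ['col1', 'col2', 'col3', 'col4']

def first_match(row):
    """Return the first column whose value contains any string in lst, plus that value."""
    for col in text_cols:
        if any(s in str(row[col]) for s in lst):
            return pd.Series({'New_col': col, 'New_var': row[col]})
    # No cell in this row matched.
    return pd.Series({'New_col': None, 'New_var': None})

# Apply the helper to every row and attach the two new columns.
result = df.join(df.apply(first_match, axis=1))
print(result)
```

Here `New_col` holds the name of the matching column and `New_var` holds its value, with `None` in both when nothing in the row matches. If a row can have several matches and you want all of them, the helper could collect them into a list instead of returning at the first hit.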
