Pandas Filter Rows By Two Column Values, Case-Insensitively
Solution 1:
Option 1: Convert to lowercase or uppercase and compare
The simplest approach is to convert both columns to lowercase (or uppercase) before comparing them:
df = df[df['ConfigredValue'].str.lower() != df['ReferenceValue'].str.lower()]
or
df = df[df['ConfigredValue'].str.upper() != df['ReferenceValue'].str.upper()]
Output:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
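Here is a self-contained version of Option 1 that you can run as-is. Only the column names come from the question. The rows are invented for illustration, and `mismatch` is a hypothetical variable name. It also shows `str.casefold()`, a stricter normalization meant for non-ASCII text. For plain TRUE/FALSE strings it gives the same result as `str.lower()`.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the question.
df = pd.DataFrame({
    "Last Known Date": ["20-Jun-17", "21-Jun-17", "26-Jun-17"],
    "ConfigredValue": ["True", "false", "TRUE"],
    "ReferenceValue": ["TRUE", "FALSE", "FALSE"],
})

# Normalize both columns to one case, then keep rows whose values still differ.
mismatch = df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()
print(df[mismatch])

# casefold() is the more aggressive Unicode normalization; for TRUE/FALSE
# spellings it produces the same boolean mask as lower().
mismatch_cf = df["ConfigredValue"].str.casefold() != df["ReferenceValue"].str.casefold()
print(mismatch.equals(mismatch_cf))
```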
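Option 2: Compare the lengths
In this particular case you can simply compare the string lengths. 'TRUE' and 'True' both have four characters and 'FALSE' and 'false' both have five, so the length is the same whether the string is upper or lower case, and it differs only when the values themselves differ: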
df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
Output:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
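The length trick depends on the columns holding nothing but true/false spellings. The sketch below compares it with the lowercase approach. The third row uses an invented value, "NONE", to show where the two disagree: "NONE" and "TRUE" are both four characters long. The frame and the names `by_length` and `by_case` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the first two use only TRUE/FALSE spellings,
# the third uses an invented value ("NONE") with the same length as "TRUE".
df = pd.DataFrame({
    "ConfigredValue": ["True", "TRUE", "NONE"],
    "ReferenceValue": ["TRUE", "false", "TRUE"],
})

# Option 2: rows are flagged only when the string lengths differ.
by_length = df["ConfigredValue"].str.len() != df["ReferenceValue"].str.len()
# Option 1: rows are flagged whenever the case-folded values differ.
by_case = df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()

print(pd.DataFrame({"by_length": by_length, "by_case": by_case}))
```

The two masks agree on the first two rows. On the third row the length check misses a real mismatch.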
Option 3: Vectorized title
str.title() was also suggested in @0p3n5ourcE's answer; here is a vectorized version of it:
df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
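Here "vectorized" means that the `.str` accessor applies `title()` to the whole column in one call. This replaces calling it row by row through `apply`. The short check below, on invented mixed-case spellings, shows that both forms return the same normalized strings. Any casing of true/false maps to "True"/"False", which is why the comparison above works. The series and the variable names are hypothetical.

```python
import pandas as pd

# Invented spellings of the two values in mixed case.
s = pd.Series(["TRUE", "true", "tRuE", "FALSE", "fAlSe"])

row_wise = s.apply(lambda v: v.title())  # one Python-level call per element
vectorized = s.str.title()               # same method via the .str accessor

print(vectorized.tolist())
print(row_wise.equals(vectorized))
```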
Execution time
Benchmarking shows that str.len() is slightly faster: 479 µs per loop, against 495 to 496 µs for the other three approaches:
In [35]: timeit df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
1000 loops, best of 3: 496 µs per loop
In [36]: timeit df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
1000 loops, best of 3: 496 µs per loop
In [37]: timeit df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
1000 loops, best of 3: 495 µs per loop
In [38]: timeit df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
1000 loops, best of 3: 479 µs per loop
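To rerun the comparison outside IPython, you can use the standard-library `timeit` module. The sketch below assumes a hypothetical 20,000-row frame of random TRUE/FALSE spellings. The names `candidates` and `timings` are illustrative only. Absolute numbers will depend on your machine, data size and pandas version, so they will not match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: random true/false spellings in mixed case.
rng = np.random.default_rng(0)
spellings = np.array(["TRUE", "True", "true", "FALSE", "False", "false"])
n = 20_000
df = pd.DataFrame({
    "ConfigredValue": rng.choice(spellings, n),
    "ReferenceValue": rng.choice(spellings, n),
})
a, b = df["ConfigredValue"], df["ReferenceValue"]

# The four filters from the answer, wrapped as zero-argument callables.
candidates = {
    "lower": lambda: df[a.str.lower() != b.str.lower()],
    "upper": lambda: df[a.str.upper() != b.str.upper()],
    "title": lambda: df[a.str.title() != b.str.title()],
    "len": lambda: df[a.str.len() != b.str.len()],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to milliseconds per call.
    per_call = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    timings[name] = per_call * 1e3
    print(f"{name:>5}: {timings[name]:.2f} ms per call")
```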
Solution 2:
A better approach is to replace any existing false with 'FALSE' using the case=False parameter, i.e.:
df['ConfigredValue'] = df['ConfigredValue'].str.replace('false','FALSE',case=False)
df=df[df['ConfigredValue']!=df['ReferenceValue']]
Output:
Last Known_Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
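The replacement in Solution 2 rewrites only "false" (in any casing) to "FALSE". A lowercase "true" is therefore left alone and still counts as different from "TRUE". The sketch below shows this on invented rows, with the hypothetical names `only_false` and `both`. It then normalizes the whole column with `str.upper()`, which is essentially Option 1 applied to one side. That version assumes the reference column is already uppercase.

```python
import pandas as pd

# Hypothetical rows; ReferenceValue is assumed to be upper-case already.
df = pd.DataFrame({
    "ConfigredValue": ["False", "true", "TRUE"],
    "ReferenceValue": ["FALSE", "TRUE", "FALSE"],
})

# As in the answer: only "false" in any casing becomes "FALSE".
only_false = df["ConfigredValue"].str.replace("false", "FALSE", case=False)
print(df[only_false != df["ReferenceValue"]])  # row 1 ("true" vs "TRUE") is still flagged

# Upper-casing the whole column covers "true" as well.
both = df["ConfigredValue"].str.upper()
print(df[both != df["ReferenceValue"]])  # only the genuine mismatch remains
```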
Solution 3:
It looks like the columns hold boolean values. If converting the columns to a boolean dtype is not a problem, the following works too. Here .title() uppercases the first character of each string and lowercases the rest (e.g. FALSE becomes False, and true becomes True), and eval then turns the result into the corresponding boolean value:
df['ConfigredValue'] = df['ConfigredValue'].apply(lambda row: eval(row.title()))
df['ReferenceValue'] = df['ReferenceValue'].apply(lambda row: eval(row.title()))
Then use the same comparison as above:
df[df['ConfigredValue'] != df['ReferenceValue']]
Output:
   Last Known Date ConfigredValue ReferenceValue
2        26-Jun-17           True          False
Or simply compare the title-cased strings directly, just as with uppercase or lowercase:
df[df['ConfigredValue'].str.title() !=df['ReferenceValue'].str.title()]
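The conversion in Solution 3 uses eval, which executes whatever string it is given as Python code. A safer way to get the same booleans is an explicit lookup table. The sketch below swaps in `Series.map` with a hypothetical dict named `to_bool` on invented rows. Any spelling not in the dict becomes NaN instead of being executed. This is a substitute technique, not the original answer's method.

```python
import pandas as pd

# Hypothetical rows using the question's column names.
df = pd.DataFrame({
    "ConfigredValue": ["True", "false", "TRUE"],
    "ReferenceValue": ["TRUE", "FALSE", "FALSE"],
})

# Hypothetical lookup table: lower-cased spelling -> boolean.
# Unlisted spellings become NaN rather than being evaluated as code.
to_bool = {"true": True, "false": False}
for col in ["ConfigredValue", "ReferenceValue"]:
    df[col] = df[col].str.lower().map(to_bool)

print(df[df["ConfigredValue"] != df["ReferenceValue"]])
```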
Solution 4:
Outside the box: pandas.read_csv reads all of these in as booleans, because True/TRUE/true and False/FALSE/false are recognized by default. You can dump the frame to CSV and read it back in, then use pd.DataFrame.query:
from io import StringIO
import pandas as pd
pd.read_csv(StringIO(df.to_csv(index=False))).query(
    'ConfigredValue != ReferenceValue')
   Last Known Date ConfigredValue ReferenceValue
2        26-Jun-17           True          False
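Here is a runnable version of the round trip in Solution 4, on invented rows that use the question's column names. It uses the public `io.StringIO`. It also prints the dtypes after re-reading, so you can confirm that both value columns come back as `bool` before `query` compares them. The date strings are not parsed, because read_csv does not parse dates by default. The sample data and the name `round_tripped` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical rows using the question's column names.
df = pd.DataFrame({
    "Last Known Date": ["20-Jun-17", "21-Jun-17", "26-Jun-17"],
    "ConfigredValue": ["True", "false", "TRUE"],
    "ReferenceValue": ["TRUE", "FALSE", "FALSE"],
})

# Write to an in-memory CSV and parse it back; read_csv's default
# true/false spellings turn both value columns into bool dtype.
round_tripped = pd.read_csv(StringIO(df.to_csv(index=False)))
print(round_tripped.dtypes)

result = round_tripped.query("ConfigredValue != ReferenceValue")
print(result)
```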