
How To Flag Last Duplicate Element In A Pandas DataFrame

As you know, the .duplicated method finds duplicates in a column, but what I need is to flag the last duplicated element, given that my data is ordered by Date. Here is the expec

Solution 1:

Use Series.duplicated, or DataFrame.duplicated with the column specified in subset, and pass keep='last'. The resulting mask is True for every occurrence except the last one, so invert it and convert it to integer to map True/False to 1/0, or use numpy.where:

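# Option A: invert the mask and cast the booleans to 1/0
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
# Option B: same result with numpy.where (requires: import numpy as np)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)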

Or:

df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)

print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
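For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is reconstructed from the printed output above, and the column names flag_astype and flag_where and the variable is_earlier are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (already ordered by date)
df = pd.DataFrame({
    'Id': [0, 1, 2, 3, 4, 5, 6],
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
    'Start_Date': ['2019/02/24', '2019/03/24', '2019/04/24',
                   '2018/09/01', '2018/10/01', '2017/02/24', '2017/03/24'],
})

# duplicated(keep='last') is True for every occurrence except the last one
is_earlier = df['Policy_id'].duplicated(keep='last')

# Two equivalent ways to turn the inverted mask into a 1/0 flag
df['flag_astype'] = (~is_earlier).astype(int)
df['flag_where'] = np.where(is_earlier, 0, 1)

print(df)
```

Note that a Policy_id occurring only once would also get 1, because duplicated(keep='last') is False for it. If only ids that really repeat should be flagged, the mask can be combined with df['Policy_id'].duplicated(keep=False).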

Solution 2:

This can also be done without Series.duplicated, by building a dictionary keyed on Policy_id:

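# Build a Policy_id -> Id mapping; for repeated keys the last row wins
dictionary = df[['Id','Policy_id']].set_index('Policy_id').to_dict()['Id']
# The dictionary values therefore hold the most recent Id of each policy (assumes Id is unique)
last_ids = set(dictionary.values())
df['Last_dup'] = df['Id'].apply(lambda x: 1 if x in last_ids else 0)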
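To see why the dictionary trick works, here is a small self-contained sketch on the same sample data. It assumes Id is unique per row and that the rows are already sorted by date. The name last_id_by_policy is illustrative, and isin is used here in place of apply as a vectorized variation, not as the original author's code.

```python
import pandas as pd

# Same sample rows as in the question (only the columns this approach needs)
df = pd.DataFrame({
    'Id': [0, 1, 2, 3, 4, 5, 6],
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
})

# When the index has repeated keys, later rows overwrite earlier ones,
# so each Policy_id ends up mapped to the Id of its last row:
# b123 -> 2, c123 -> 4, d123 -> 6
last_id_by_policy = df.set_index('Policy_id')['Id'].to_dict()

# Flag the rows whose Id is one of those "last" Ids
df['Last_dup'] = df['Id'].isin(list(last_id_by_policy.values())).astype(int)

print(df)
```

If Id were not unique across policies, a row could be flagged because its Id matches the last Id of a different policy, so the duplicated(keep='last') approach is the safer default.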
