
Count Occurrences For Each Year In Pandas Dataframe Based On Subgroup

Imagine a pandas DataFrame that is given by df = pd.DataFrame({ 'id': [1, 1, 1, 2, 2], 'location': [1, 2, 3, 1, 2], 'date': [pd.to_datetime('01-01-{}'.format(year)) for

Solution 1:

get_dummies

df.join(pd.get_dummies(df.date.dt.year).sum(level=0))

         date  location  2015  2016  2017  2018
id                                             
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
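The one-liner above relies on sum(level=0), which was deprecated in pandas 1.3 and removed in pandas 2.0. A minimal sketch of the same idea using groupby(level=0).sum() instead follows. The frame is an assumed reconstruction: the years are read off the printed output, and 'id' is taken to be the index, since the question's definition is cut off.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (the original definition is truncated):
# years are taken from the printed output, and 'id' is assumed to be the index.
years = [2015, 2016, 2015, 2017, 2018]
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in years],
}).set_index('id')

# One indicator column per year; dtype=int keeps the indicators numeric
# regardless of the pandas version's default dummy dtype.
dummies = pd.get_dummies(df['date'].dt.year, dtype=int)

# Add the indicators up within each id (level 0 of the index).
counts = dummies.groupby(level=0).sum()

# Index-on-index join broadcasts each id's counts to all of its rows.
result = df.join(counts)
print(result)
```

The year columns keep integer labels, so a single year is selected with result[2015] rather than result['2015'].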

factorize

i, r = pd.factorize(df.index)
j, c = pd.factorize(df.date.dt.year)
n, m = shape = len(r), len(c)
b = np.zeros(shape, dtype=np.int64)
np.add.at(b, (i, j), 1)
df.join(pd.DataFrame(b, r, c).rename_axis('id'))

         date  location  2015  2016  2017  2018
id                                             
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
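The factorize variant turns the id and year labels into integer codes and then accumulates a 1 at each (row code, column code) position. A small sketch on bare arrays may make the intermediate values easier to follow. The ids and years here are assumptions read off the printed output, not the question's original definition.

```python
import numpy as np
import pandas as pd

# Assumed inputs, taken from the printed output above.
ids = np.array([1, 1, 1, 2, 2])
years = np.array([2015, 2016, 2015, 2017, 2018])

# factorize returns integer codes plus the unique labels in order of appearance.
i, r = pd.factorize(ids)     # codes for the rows, unique ids
j, c = pd.factorize(years)   # codes for the columns, unique years

b = np.zeros((len(r), len(c)), dtype=np.int64)

# np.add.at is unbuffered, so a repeated (i, j) pair is counted every time.
# A plain b[i, j] += 1 would apply a repeated pair only once.
np.add.at(b, (i, j), 1)

table = pd.DataFrame(b, index=r, columns=c).rename_axis('id')
print(table)
```

The pair (id 1, year 2015) occurs twice in the data, which is why the unbuffered np.add.at matters here.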

Solution 2:

Create a helper DataFrame of per-id counts by grouping on id and the year of date, then calling size and unstack, and join it to the original df:

df1 = df.join(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0), on='id')
print(df1)

    location       date  2015  2016  2017  2018
id                                             
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1

Detail:

print (df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0))

date  2015  2016  2017  2018
id                          
1        2     1     0     0
2        0     0     1     1
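To see how the helper table comes about, the chain can be split into its intermediate steps. This is only a sketch: the frame is an assumed reconstruction, with the years read off the printed output and 'id' assumed to be the index, because the question's definition is truncated.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (years from the printed output,
# 'id' assumed to be the index).
years = [2015, 2016, 2015, 2017, 2018]
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in years],
}).set_index('id')

# Step 1: count rows per (id, year); the result is a Series with a two-level index.
sizes = df.groupby(['id', df['date'].dt.year]).size()

# Step 2: pivot the year level into columns; missing (id, year) pairs become 0.
table = sizes.unstack(fill_value=0)

# Step 3: attach each id's row of counts to every row of df with that id.
df1 = df.join(table, on='id')
print(df1)
```

Without fill_value=0, the combinations that never occur would show up as NaN and the count columns would become floats.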

Another solution with crosstab:

df1 = df.join(pd.crosstab(df.index, df['date'].dt.year), on='id')

print (pd.crosstab(df.index, df['date'].dt.year))
date   2015  2016  2017  2018
row_0                        
1         2     1     0     0
2         0     0     1     1
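As a quick sanity check, the crosstab table can be compared with the groupby/size/unstack table. The sketch below again assumes the frame reconstructed from the printed output, with 'id' as the index. It also passes rownames=['id'], which is not in the original answer, only to label the row axis instead of leaving the default row_0.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (years from the printed output,
# 'id' assumed to be the index).
years = [2015, 2016, 2015, 2017, 2018]
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in years],
}).set_index('id')

year = df['date'].dt.year

# rownames is an addition for readability; the original call omits it.
by_crosstab = pd.crosstab(df.index, year, rownames=['id'])
by_groupby = df.groupby(['id', year]).size().unstack(fill_value=0)

# Both routes should produce the same grid of counts.
same = bool((by_crosstab.to_numpy() == by_groupby.to_numpy()).all())
print(same)

df1 = df.join(by_crosstab, on='id')
print(df1)
```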
