Count Occurrences For Each Year In Pandas Dataframe Based On Subgroup
Imagine a pandas DataFrame that is given by df = pd.DataFrame({ 'id': [1, 1, 1, 2, 2], 'location': [1, 2, 3, 1, 2], 'date': [pd.to_datetime('01-01-{}'.format(year)) for
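The snippet above is cut off, so the exact input cannot be recovered from it. The following is only a sketch of a frame consistent with the outputs printed in the answers. It assumes the years are 2015, 2016, 2015, 2017, 2018 and that id is the index; both are inferred from those outputs, not taken from the question.

```python
import pandas as pd

# Assumed input: the years and the 'id' index are inferred from the
# outputs shown in the answers, not from the truncated question.
years = [2015, 2016, 2015, 2017, 2018]
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in years],
}).set_index('id')

print(df)
```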
Solution 1:
get_dummies
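# one indicator column per year, summed per id, then attached to every row
df.join(pd.get_dummies(df.date.dt.year).groupby(level=0).sum())
         date  location  2015  2016  2017  2018
id
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1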
factorize
i, r = pd.factorize(df.index)
j, c = pd.factorize(df.date.dt.year)
n, m = shape = len(r), len(c)
b = np.zeros(shape, dtype=np.int64)
np.add.at(b, (i, j), 1)
df.join(pd.DataFrame(b, r, c).rename_axis('id'))
         date  location  2015  2016  2017  2018
id
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
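The factorize approach relies on two things: pd.factorize turns each label into an integer code, and np.add.at adds without buffering, so a repeated (row, column) pair is counted every time it occurs. Here is a small sketch of that counting step on plain arrays. The arrays ids and years are illustrative stand-ins mirroring the example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative labels mirroring the example: one id and one year per row.
ids = np.array([1, 1, 1, 2, 2])
years = np.array([2015, 2016, 2015, 2017, 2018])

# factorize returns integer codes plus the unique labels in order of appearance.
i, r = pd.factorize(ids)
j, c = pd.factorize(years)

counts = np.zeros((len(r), len(c)), dtype=np.int64)

# Unbuffered add: the pair (0, 0) occurs twice and is counted twice.
np.add.at(counts, (i, j), 1)

# For contrast, fancy-index "+=" applies a repeated pair only once.
naive = np.zeros_like(counts)
naive[i, j] += 1

print(pd.DataFrame(counts, index=r, columns=c))
```

The contrast with naive is the reason np.add.at is used here instead of b[i, j] += 1.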
Solution 2:
Create a helper DataFrame by grouping on id and the year of date with groupby, counting the rows with size and reshaping with unstack, then join it to the original df:
df1 = df.join(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0), on='id')
print(df1)
    location       date  2015  2016  2017  2018
id
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1
Detail:
print(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0))
date  2015  2016  2017  2018
id
1        2     1     0     0
2        0     0     1     1
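The helper table can also be built one call at a time to see what each step contributes. This is a sketch only: it assumes an input with id as the index and the years seen in the printed output, both inferred rather than taken from the question, and the names sizes and helper are illustrative.

```python
import pandas as pd

# Assumed input, inferred from the printed output (id is the index).
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# Step 1: count rows per (id, year); the result is a Series with a MultiIndex.
sizes = df.groupby(['id', df['date'].dt.year]).size()

# Step 2: pivot the year level into columns, filling absent pairs with 0.
helper = sizes.unstack(fill_value=0)

# Step 3: join on the shared 'id' index so every row gets its id's counts.
df1 = df.join(helper)
print(df1)
```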
Another solution with crosstab:
df1 = df.join(pd.crosstab(df.index, df['date'].dt.year), on='id')
print(pd.crosstab(df.index, df['date'].dt.year))
date   2015  2016  2017  2018
row_0
1         2     1     0     0
2         0     0     1     1
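In the printed crosstab the row axis is labelled row_0 because no name was supplied for it. The sketch below passes rownames and colnames to pd.crosstab to label both axes before joining. It assumes the same inferred input with id as the index, and the label 'year' is an illustrative choice.

```python
import pandas as pd

# Assumed input, inferred from the printed output (id is the index).
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# crosstab counts how often each (id, year) pair occurs; rownames and
# colnames name the resulting axes explicitly.
ct = pd.crosstab(df.index, df['date'].dt.year,
                 rownames=['id'], colnames=['year'])

# Both frames are indexed by id, so a plain index join attaches the counts.
df1 = df.join(ct)
print(df1)
```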