Create Week Label From Multiple Days Columns
I have this pandas data frame:
user  join_date   days_0  days_1  ...  days_n
A     10-08-2019  1       1       ...  0
B     11-08-2019  0       1       ...  1
Solution 1:
Use:
import numpy as np
import pandas as pd

np.random.seed(123)
# sample data: two identifying columns plus 20 random 0/1 day columns
df1 = pd.DataFrame({'user': list('ABZ'),
                    'join_date': ['10-08-2019', '11-08-2019', '30-19-2019']})
df2 = pd.DataFrame(np.random.choice([0, 1], size=(3, 20))).add_prefix('days_')
df = df1.join(df2)
print (df)
user join_date days_0 days_1 days_2 days_3 days_4 days_5 days_6 \
0 A 10-08-2019 0 1 0 0 0 0 0
1 B 11-08-2019 0 0 1 1 1 0 1
2 Z 30-19-2019 0 1 0 1 1 1 0
days_7 ... days_10 days_11 days_12 days_13 days_14 days_15 days_16 \
0 1 ... 1 1 0 1 0 1 0
1 0 ... 0 1 1 1 0 0 1
2 0 ... 1 1 0 0 1 0 1
days_17 days_18 days_19
0 1 1 0
1 0 0 1
2 0 0 1
[3 rows x 22 columns]
You can select the day columns with DataFrame.filter:
print (df.filter(like='days_'))
   days_0  days_1  days_2  days_3  days_4  days_5  days_6  days_7  days_8  \
0       0       1       0       0       0       0       0       1       1
1       0       0       1       1       1       0       1       0       0
2       0       1       0       1       1       1       0       0       0
   days_9  days_10  days_11  days_12  days_13  days_14  days_15  days_16  \
0       0        1        1        0        1        0        1        0
1       0        0        1        1        1        0        0        1
2       0        1        1        0        0        1        0        1
   days_17  days_18  days_19
0        1        1        0
1        0        0        1
2        0        0        1
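Note that like='days_' matches any column whose name contains that substring. If the frame could hold other columns that also contain it, an anchored regex is stricter. A minimal sketch, assuming a hypothetical extra column named holidays_total (invented here for illustration, not part of the question's data):

```python
import pandas as pd

# Hypothetical frame: 'holidays_total' is an invented column whose
# name happens to contain the substring 'days_'
demo = pd.DataFrame({'user': ['A'],
                     'days_0': [1],
                     'days_1': [0],
                     'holidays_total': [5]})

loose = demo.filter(like='days_')           # substring match
strict = demo.filter(regex=r'^days_\d+$')   # only names of the form days_<number>

print(list(loose.columns))
print(list(strict.columns))
```

With the sample data in this answer both calls select the same columns, so like='days_' is sufficient there.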
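Then group the columns with groupby and a lambda function that converts the number after _ to an integer and applies integer division. Note that // 6 places six columns in each group (days_0 to days_5 form week_0, and so on), which is what the output below shows; for seven-day weeks, use // 7 so that days_0 to days_6 form week_0. Aggregate each group with sum: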
df3 = (df.filter(like='days_')
.groupby(lambda x: int(x.split('_')[1]) // 6, axis=1)
.sum()
.add_prefix('week_'))
print (df3)
   week_0  week_1  week_2  week_3
0       1       4       3       1
1       3       2       3       1
2       4       2       2       1
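The code above divides by 6, so each group holds six columns. A minimal sketch of the seven-day variant follows, using a small hypothetical frame with invented values and a helper named week_of that is not part of the original answer. It also groups on the transposed frame, because recent pandas versions deprecate groupby(..., axis=1); treat this as one possible workaround rather than the only one.

```python
import pandas as pd

# Hypothetical frame with 10 day columns (values invented for illustration)
demo = pd.DataFrame(
    [[1, 0, 1, 1, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]],
    columns=[f'days_{i}' for i in range(10)],
)

def week_of(col):
    # 'days_9' -> 9 -> 9 // 7 == 1, so days_0..days_6 fall in week 0
    return int(col.split('_')[1]) // 7

weeks = (demo.T                 # day columns become the index
             .groupby(week_of)  # the function is applied to each index label
             .sum()
             .T                 # back to one row per user
             .add_prefix('week_'))
print(weeks)
```

Here days_0 to days_6 are summed into week_0 and the remaining three columns into week_1; a last partial week simply contains fewer days.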
Finally, join the result back to the original DataFrame:
df = df.join(df3)
print (df)
user join_date days_0 days_1 days_2 days_3 days_4 days_5 days_6 \
0 A 10-08-2019 0 1 0 0 0 0 0
1 B 11-08-2019 0 0 1 1 1 0 1
2 Z 30-19-2019 0 1 0 1 1 1 0
days_7 ... days_14 days_15 days_16 days_17 days_18 days_19 week_0 \
0 1 ... 0 1 0 1 1 0 1
1 0 ... 0 0 1 0 0 1 3
2 0 ... 1 0 1 0 0 1 4
week_1 week_2 week_3
0 4 3 1
1 2 3 1
2 2 2 1
[3 rows x 26 columns]