
Create Week Label From Multiple Days Columns

I have this pandas data frame:

user  join_date   days_0  days_1  ...  days_n
A     10-08-2019  1       1       ...  0
B     11-08-2019  0       1       ...  1

Solution 1:

Use:

import numpy as np
import pandas as pd

np.random.seed(123)

#sample data
df1 = pd.DataFrame({'user': list('ABZ'),
                   'join_date':['10-08-2019','11-08-2019','30-19-2019']})  


df2 = pd.DataFrame(np.random.choice([0,1], size=(3, 20))).add_prefix('days_')  

df = df1.join(df2)
print (df)
  user   join_date  days_0  days_1  days_2  days_3  days_4  days_5  days_6  \
0    A  10-08-2019       0       1       0       0       0       0       0   
1    B  11-08-2019       0       0       1       1       1       0       1   
2    Z  30-19-2019       0       1       0       1       1       1       0   

   days_7  ...  days_10  days_11  days_12  days_13  days_14  days_15  days_16  \
0       1  ...        1        1        0        1        0        1        0   
1       0  ...        0        1        1        1        0        0        1   
2       0  ...        1        1        0        0        1        0        1   

   days_17  days_18  days_19  
0        1        1        0  
1        0        0        1  
2        0        0        1  

[3 rows x 22 columns]

You can filter columns with days by DataFrame.filter:

print (df.filter(like='days_'))
   days_0  days_1  days_2  days_3  days_4  days_5  days_6  days_7  days_8  \
0       0       1       0       0       0       0       0       1       1   
1       0       0       1       1       1       0       1       0       0   
2       0       1       0       1       1       1       0       0       0   

   days_9  days_10  days_11  days_12  days_13  days_14  days_15  days_16  \
0       0        1        1        0        1        0        1        0   
1       0        0        1        1        1        0        0        1   
2       0        1        1        0        0        1        0        1   

   days_17  days_18  days_19  
0        1        1        0  
1        0        0        1  
2        0        0        1  
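
One caveat worth sketching: like='days_' matches any column whose name merely contains that substring. If the frame might hold such columns, an anchored regex is stricter. The frame below, including the holidays_taken column, is a hypothetical example and not part of the original data.

```python
import pandas as pd

# Hypothetical frame with an extra column whose name contains 'days_'
df = pd.DataFrame({
    "user": ["A", "B"],
    "days_0": [1, 0],
    "days_1": [1, 1],
    "holidays_taken": [2, 5],
})

# Substring match: also picks up 'holidays_taken'
loose = df.filter(like="days_")

# Anchored regex: only names of the form days_<digits>
strict = df.filter(regex=r"^days_\d+$")

print(list(loose.columns))
print(list(strict.columns))
```

If every column containing days_ really is a day column, as in the sample data here, like='days_' is enough.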

Then use groupby with a lambda function that converts the number after _ to an integer and applies integer division by 7, so every 7 consecutive day columns fall into one group (days_0 to days_6 give 0, days_7 to days_13 give 1, and so on), and aggregate with sum. Because Python counts from 0, the divisor is 7, not 6. With 20 day columns the last group, week_2, covers only days_14 to days_19:

df3 = (df.filter(like='days_')
         .groupby(lambda x: int(x.split('_')[1]) // 7, axis=1)
         .sum()
         .add_prefix('week_'))
print (df3)
   week_0  week_1  week_2
0       1       5       3
1       4       3       2
2       4       2       3

Finally, join the result to the original:

df = df.join(df3)
print (df)
  user   join_date  days_0  days_1  days_2  days_3  days_4  days_5  days_6  \
0    A  10-08-2019       0       1       0       0       0       0       0   
1    B  11-08-2019       0       0       1       1       1       0       1   
2    Z  30-19-2019       0       1       0       1       1       1       0   

   days_7  ...  days_13  days_14  days_15  days_16  days_17  days_18  days_19  \
0       1  ...        1        0        1        0        1        1        0   
1       0  ...        1        0        0        1        0        0        1   
2       0  ...        0        1        0        1        0        0        1   

   week_0  week_1  week_2  
0       1       5       3  
1       4       3       2  
2       4       2       3  

[3 rows x 25 columns]
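
As far as I know, passing axis=1 to DataFrame.groupby is deprecated in recent pandas releases (2.1 onward). A minimal sketch of the same weekly grouping without it: transpose, group on the row labels, and transpose back. The toy frame and the helper name week_of are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame: 2 users, 9 day columns (days_0 .. days_8)
df = pd.DataFrame(
    [[1, 1, 0, 0, 1, 0, 1, 1, 0],
     [0, 1, 1, 1, 0, 0, 0, 0, 1]],
    columns=[f"days_{i}" for i in range(9)],
)
df.insert(0, "user", ["A", "B"])


def week_of(col):
    """Map 'days_<n>' to its zero-based week index n // 7."""
    return int(col.split("_")[1]) // 7


days = df.filter(like="days_")

# Group the transposed frame by its row labels, then transpose back,
# so no axis=1 argument is needed.
weeks = days.T.groupby(week_of).sum().T.add_prefix("week_")

result = df.join(weeks)
print(result[["user", "week_0", "week_1"]])
```

Here days_0 to days_6 land in week_0 and days_7, days_8 in week_1, so user A gets 4 and 1, and user B gets 3 and 1.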
