Create Id Column In Dataframe Based On Other Column Values / Pandas -python
I have a dataframe like this:
```
L_1  D_1  L_2  D_2   L_3  D_3  C_N
1    Boy                       Boy||
1    Boy  1-1  play            Boy|play|
1    Bo
```
Solution 1:
I have defined a custom function to retrieve the required data:
```python
import pandas as pd

df = pd.DataFrame([
    ['1', 'Boy', '', '', '', ''],
    ['1', 'Boy', '1-1', 'play', '', ''],
    ['1', 'Boy', '1-1', 'play', '1-1-21', 'car'],
    ['1', 'Boy', '1-1', 'play', '1-1-1', 'online'],
    ['2', 'Girl', '', '', '', ''],
    ['2', 'Girl', '2-1', 'dance', '', '']],
    columns=['L_1', 'D_1', 'L_2', 'D_2', 'L_3', 'D_3'])

# C_N: the D_* descriptions joined with '|'
df['C_N'] = df[['D_1', 'D_2', 'D_3']].apply(lambda x: '|'.join(x), axis=1)

def get_data(x, y, z):
    """Return the last non-empty ID among L_1, L_2, L_3."""
    result = []
    if x != '':
        result.append(x)
    if y != '':
        result.append(y)
    if z != '':
        result.append(z)
    return result[-1]

# Apply row by row to build the IDs column
df['IDs'] = df.apply(lambda row: get_data(row['L_1'], row['L_2'], row['L_3']), axis=1)
```
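The same "last non-empty ID" rule can also be expressed without a row-wise function. This is not part of the answer above, just a minimal sketch assuming empty cells are exactly `''` and that `L_1` is always filled: turn `''` into NaN, forward-fill across the `L_*` columns, and read the last column.

```python
import numpy as np
import pandas as pd

# Sample rows in the same layout as the question (assumed data)
df = pd.DataFrame([
    ['1', 'Boy', '', '', '', ''],
    ['1', 'Boy', '1-1', 'play', '', ''],
    ['1', 'Boy', '1-1', 'play', '1-1-21', 'car'],
    ['1', 'Boy', '1-1', 'play', '1-1-1', 'online'],
    ['2', 'Girl', '', '', '', ''],
    ['2', 'Girl', '2-1', 'dance', '', ''],
], columns=['L_1', 'D_1', 'L_2', 'D_2', 'L_3', 'D_3'])

# Assumption: "empty" means exactly '' (whitespace-only cells are not handled here)
levels = df[['L_1', 'L_2', 'L_3']].replace('', np.nan)

# Forward-fill left to right; the last column then holds the
# right-most non-empty L_* value of each row
df['IDs'] = levels.ffill(axis=1).iloc[:, -1]
print(df['IDs'].tolist())  # ['1', '1-1', '1-1-21', '1-1-1', '2', '2-1']
```

Because the fill runs across the whole row, gaps between levels do not matter: the last column always ends up with the right-most value that was present.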
Solution 2:
```python
import numpy as np
import pandas as pd

df = df.replace(r"^\s*$", np.nan, regex=True)  # empty cells -> NaN
id_inds = df.filter(like="L_").agg(pd.Series.last_valid_index, axis=1)
# either this (DataFrame.lookup: deprecated in 1.2, removed in pandas 2.0)
# df["IDs"] = df.lookup(df.index, id_inds)
# or this
df["IDs"] = df.to_numpy()[np.arange(len(df)), df.columns.get_indexer(id_inds)]
```
First we replace empty cells with NaN and then look at the L_* columns: each row's last_valid_index gives the name of the last L_* column that holds a value. Then we can either use lookup (deprecated, and removed in pandas 2.0) or take the NumPy values and do fancy indexing with the column positions from get_indexer, to get:
```
>>> df
   L_1   D_1  L_2    D_2     L_3     D_3              C_N     IDs
0    1   Boy  NaN    NaN     NaN     NaN            Boy||       1
1    1   Boy  1-1   play     NaN     NaN        Boy|play|     1-1
2    1   Boy  1-1   play  1-1-21     car     Boy|play|car  1-1-21
3    1   Boy  1-1   play   1-1-1  online  Boy|play|online   1-1-1
4    2  Girl  NaN    NaN     NaN     NaN           Girl||       2
5    2  Girl  2-1  dance     NaN     NaN      Girl|dance|     2-1
```
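To make each step of Solution 2 visible, here is a small walk-through on a hypothetical three-row frame (made-up values in the question's column layout). It prints the intermediate column names and positions before the final fancy-indexing step.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the same L_*/D_* layout as the question
df = pd.DataFrame({
    "L_1": ["1", "2", "2"],
    "D_1": ["Boy", "Girl", "Girl"],
    "L_2": ["1-1", "", "2-1"],
    "D_2": ["play", "", "dance"],
    "L_3": ["1-1-21", "", ""],
    "D_3": ["car", "", ""],
})

# Step 1: empty or whitespace-only cells become NaN
df = df.replace(r"^\s*$", np.nan, regex=True)

# Step 2: per row, the label of the last non-NaN L_* column
id_inds = df.filter(like="L_").agg(pd.Series.last_valid_index, axis=1)
print(id_inds.tolist())  # ['L_3', 'L_1', 'L_2']

# Step 3: turn those labels into integer column positions
positions = df.columns.get_indexer(id_inds)
print(positions.tolist())  # [4, 0, 2]

# Step 4: pick one cell per row with (row position, column position) pairs
df["IDs"] = df.to_numpy()[np.arange(len(df)), positions]
print(df["IDs"].tolist())  # ['1-1-21', '2', '2-1']
```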
You can now replace the NaNs back with empty strings, if you wish.
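One way to do that is `DataFrame.fillna("")`. A minimal sketch on a hypothetical two-row frame standing in for the NaN-filled result:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NaN-filled frame produced above
df = pd.DataFrame({
    "L_1": ["1", "2"],
    "L_2": [np.nan, "2-1"],
    "IDs": ["1", "2-1"],
})

# fillna("") swaps every NaN for an empty string and leaves other cells unchanged
df = df.fillna("")
print(df["L_2"].tolist())  # ['', '2-1']
```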