Check On Pandas Dataframe
I have a pandas dataframe composed of 3 columns.
index  start  end    value
0      0      37647  0
1      37648  37846  1
2      37847  42874  0
3      42875  43049
Solution 1:
This could probably be cleaned up a bit, but should work.
Code:
# FIRST CHECK: where the next row has start == end, take over that row's end
mask = df['end'].shift(-1) == df['start'].shift(-1)
df.loc[mask, 'end'] = df['end'].shift(-1)
df.drop_duplicates('end', inplace=True)
# SECOND CHECK: where the next row has the same value, take over its end and add its value
mask = df['value'].shift(-1) == df['value']
df.loc[mask, 'end'] = df['end'].shift(-1)
df.loc[mask, 'value'] = (df['value'] + df['value'].shift(-1)).fillna(0).astype(int)
df.drop_duplicates('end', inplace=True)
Output:
   start    end  value
0      0  37647      0
1  37648  37846      1
2  37847  42874      0
3  42875  43049      1
4  43050  51352      0
5  51353  51665     -1
6  51666  55259      0
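To see which rows the `shift(-1)` comparisons in the code above select, here is a small sketch on made-up data. The frame `toy` and its numbers are invented for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical toy frame; the numbers are invented for illustration.
toy = pd.DataFrame({
    "start": [0, 10, 11, 20],
    "end":   [9, 10, 19, 29],
    "value": [0, 1, 1, 0],
})

# shift(-1) lines each row up with the row below it, so a comparison
# is True on a row when the *next* row meets the condition.
next_is_single_point = toy["start"].shift(-1) == toy["end"].shift(-1)
next_has_same_value = toy["value"].shift(-1) == toy["value"]

print(next_is_single_point.tolist())  # [True, False, False, False]
print(next_has_same_value.tolist())   # [False, True, False, False]
```

The last row always compares as False, because `shift(-1)` leaves NaN there and NaN never equals anything.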
Solution 2:
Using numpy.where, you can do it like this:
import numpy as np

# Positions of rows where start == end
inp = np.where(df.start == df.end)[0]
droplist = []
save = 0
j = 0
for i in range(len(inp)):
    if inp[i] > 0:
        if inp[i]-inp[i-1] == 1:
            # Consecutive hit: step back j rows to the row before the run
            j += 1
            save += 1
            df.loc[inp[i]-1-j,"end"] += save
        else:
            # Isolated hit: extend the previous row's end by one
            j = 0
            save = 0
            df.loc[inp[i]-1,"end"] += 1
        droplist.append(inp[i])
df = df.drop(droplist).reset_index(drop=True)

# Merge each row with the next one when both have the same value
droplist = []
jnp = np.where(df.value == df.value.shift(-1))[0]
for jj in jnp:
    df.loc[jj,"end"] = df.loc[jj+1,"end"]
    droplist.append(jj+1)
df = df.drop(droplist).reset_index(drop=True)
There might be a more pythonic way without for-loops using numpy though.
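One possible loop-free sketch for the second step only (merging consecutive rows that share a value) labels each run of equal values with a cumulative sum and aggregates per run. The helper name `merge_equal_runs` and the sample data are hypothetical. Unlike Solution 1, it keeps the shared value instead of summing it, and it does not handle the start == end rows.

```python
import pandas as pd

def merge_equal_runs(frame):
    """Collapse consecutive rows with the same `value` into one row (sketch)."""
    # A new run starts wherever value differs from the previous row.
    run_id = (frame["value"] != frame["value"].shift()).cumsum().rename("run")
    return (
        frame.groupby(run_id)
        .agg(start=("start", "first"), end=("end", "last"), value=("value", "first"))
        .reset_index(drop=True)
    )

# Hypothetical sample data, invented for illustration.
sample = pd.DataFrame({
    "start": [0, 10, 20, 30],
    "end":   [9, 19, 29, 39],
    "value": [0, 1, 1, 0],
})
merged = merge_equal_runs(sample)
print(merged)
```

Here the two middle rows share value 1, so they collapse into a single row spanning 10 to 29.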
EDIT: Fixed for consecutive rows.