How To Count The Number Of State Change In Pandas?
I have the DataFrame below, whose columns hold 0/1 values, and I want to count the number of 0->1 and 1->0 transitions in every column. In the DataFrame below, the number of state changes in column 'a' is 6, 'b' state ch
Solution 1:
Use rolling and compare each value, then count all True values by sum:
df = df[['a','b','c']].rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)
a 6
b 3
c 2
dtype: int64
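The question's DataFrame is not reproduced here, so the following is only a minimal sketch on made-up data (the values in df are hypothetical, not the asker's). It shows what the rolling comparison computes: each window holds two consecutive rows, and the lambda flags windows whose two values differ.

```python
import pandas as pd

# Hypothetical 0/1 data; NOT the DataFrame from the question.
df = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1],
    "c": [1, 1, 1, 1, 1, 1],
})

# raw=True passes each window as a NumPy array, so x[0] and x[-1]
# are positional lookups (previous row, current row).
changes = (
    df.rolling(2)
      .apply(lambda x: x[0] != x[-1], raw=True)
      .sum()          # the first row is NaN (incomplete window) and is skipped
      .astype(int)
)
print(changes)
```

The result is kept in a separate name (changes) rather than assigned back to df, so the original data stays available for the other solutions.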
Solution 2:
Bitwise XOR (^)
Use the NumPy array df.values and compare the shifted elements with ^.
This is meant to be a fast solution.
XOR is true exactly when one, and only one, of its two operands is true, as shown in this truth table:
A B XOR
T T F
T F T
F T T
F F F
And replicated in 0/1 form:
a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
pd.DataFrame(dict(A=a, B=b, XOR=a ^ b))
   A  B  XOR
0  1  1    0
1  1  0    1
2  0  1    1
3  0  0    0
Demo
v = df.values
pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)
a 6
b 3
c 2
dtype: int64
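For readers without the original data, here is a self-contained sketch of the same XOR idea on hypothetical values (the DataFrame below is made up for illustration). It assumes the columns have an integer or boolean dtype, since ^ is not defined for floats.

```python
import pandas as pd

# Hypothetical 0/1 data; NOT the DataFrame from the question.
df = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1],
    "c": [1, 1, 1, 1, 1, 1],
})

v = df.to_numpy()  # integer array of shape (6, 3)

# v[1:] holds rows 1..n-1 and v[:-1] holds rows 0..n-2, so the XOR is 1
# exactly where a row differs from the row before it.
xor_counts = pd.Series((v[1:] ^ v[:-1]).sum(axis=0), index=df.columns)
print(xor_counts)
```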
Time Testing
Functions
import numpy as np
import pandas as pd

def pir_xor(df):
    v = df.values
    return pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)

def pir_diff1(df):
    v = df.values
    return pd.Series(np.abs(np.diff(v, axis=0)).sum(0), df.columns)

def pir_diff2(df):
    v = df.values
    # np.bool was removed from recent NumPy releases; the builtin bool is equivalent
    return pd.Series(np.diff(v.astype(bool), axis=0).sum(0), df.columns)

def cold(df):
    return df.ne(df.shift(-1)).sum(0) - 1

def jez(df):
    # raw=True passes NumPy arrays, so x[0] and x[-1] are positional
    return df.rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)

def naga(df):
    return df.diff().abs().sum().astype(int)
Testing
from timeit import timeit

np.random.seed([3, 1415])
idx = [10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000]
col = 'pir_xor pir_diff1 pir_diff2 cold jez naga'.split()
res = pd.DataFrame(np.nan, idx, col)
for i in idx:
    df = pd.DataFrame(np.random.choice([0, 1], size=(i, 3)), columns=[*'abc'])
    for j in col:
        stmt = f"{j}(df)"
        setp = f"from __main__ import {j}, df"
        res.at[i, j] = timeit(stmt, setp, number=100)
Results
res.div(res.min(1), 0)
        pir_xor  pir_diff1  pir_diff2       cold         jez      naga
10      1.06203   1.119769   1.000000  21.217555   16.768532  6.601518
30      1.00000   1.075406   1.115743  23.229013   18.844025  7.212369
100     1.00000   1.134082   1.174973  22.673289   21.478068  7.519898
300     1.00000   1.119153   1.166782  21.725495   26.293712  7.215490
1000    1.00000   1.106267   1.167786  18.394462   37.925160  6.284253
3000    1.00000   1.118554   1.342192  16.053097   64.953310  5.594610
10000   1.00000   1.163557   1.511631  12.008129  106.466636  4.503359
30000   1.00000   1.249835   1.431120   7.826387  118.380227  3.621455
100000  1.00000   1.275272   1.528840   6.690012  131.912349  3.150155
300000  1.00000   1.279373   1.528238   6.301007  140.667427  3.190868
res.plot(loglog=True, figsize=(15, 8))
Solution 3:
Use shift and compare:
df.ne(df.shift(-1)).sum(0) - 1
a 6
b 3
c 2
dtype: int64
...Assuming "number" is the index; otherwise, precede your solution with df.set_index('number', inplace=True).
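The trailing - 1 is there because shift(-1) leaves NaN in the last row, and NaN compares as unequal to any value, so the last row is always counted once too often. A minimal sketch on a hypothetical single column (values made up for illustration):

```python
import pandas as pd

# Hypothetical column; NOT taken from the question.
s = pd.Series([0, 1, 1, 0], name="a")

shifted = s.shift(-1)   # next row's value: [1.0, 1.0, 0.0, NaN]
mask = s.ne(shifted)    # [True, False, True, True]; the last True comes from NaN

# Subtract the spurious comparison against NaN in the last row.
count = int(mask.sum()) - 1
print(count)
```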
Solution 4:
You can try taking the difference from the previous row and summing the absolute values:
df.diff().abs().sum().astype(int)
Out:
1    6
2    3
3    2
dtype: int32
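This works because every change between 0 and 1 has an absolute difference of exactly 1. The sketch below, on hypothetical data, illustrates that and also shows why the trick should not be assumed to carry over to states other than 0/1; the names multi, naive and generic are made up for this example.

```python
import pandas as pd

# Hypothetical 0/1 data; NOT the DataFrame from the question.
df = pd.DataFrame({"a": [0, 1, 0, 1, 1, 0], "b": [0, 0, 0, 1, 1, 1]})

# Each 0->1 or 1->0 step contributes |diff| == 1, so the sum is the change count.
diff_counts = df.diff().abs().sum().astype(int)

# With states that are not 0/1, the sum measures magnitude, not the number of changes.
multi = pd.Series([0, 2, 0])
naive = int(multi.diff().abs().sum())                   # 2 + 2 = 4, but only 2 changes
generic = int(multi.ne(multi.shift()).iloc[1:].sum())   # compares values directly: 2
print(diff_counts, naive, generic)
```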
Solution 5:
Use:
def agg_columns(x):
    shifted = x.shift()
    return sum(x[1:] != shifted[1:])

df[['a','b','c']].apply(agg_columns)
a 6
b 3
c 2
dtype: int64