
How To Count The Number Of State Change In Pandas?

I have the dataframe below, whose columns contain 0/1 values, and I want to count the number of 0->1 and 1->0 changes in every column. In the dataframe below, the number of state changes in column 'a' is 6, 'b' state ch

Solution 1:

Use rolling with a window of 2 to compare each value with the previous one, then count the True values with sum:

df = df[['a','b','c']].rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)
a    6
b    3
c    2
dtype: int64
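A minimal self-contained sketch of the rolling approach. The question's data is cut off, so the frame below is made up for illustration. With raw=True each length-2 window arrives as a NumPy array, so x[0] and x[-1] are the earlier and the later value by position.

```python
import pandas as pd

# Hypothetical 0/1 frame standing in for the question's data
df = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0, 0],
    "b": [0, 0, 1, 1, 1, 1, 0],
    "c": [1, 1, 1, 1, 1, 1, 1],
})

# Each window of two consecutive rows yields 1.0 if the values differ, else 0.0.
# The first row has no full window, so its result is NaN and sum() skips it.
changes = (
    df.rolling(2)
      .apply(lambda x: x[0] != x[-1], raw=True)
      .sum()
      .astype(int)
)
print(changes)  # a: 4, b: 2, c: 0
```

Rolling apply calls the Python lambda once per window, which is why this variant falls behind the vectorised ones on long frames.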

Solution 2:

Bitwise XOR (^)

Use the NumPy array df.values and compare the shifted elements with ^. This is meant to be a fast solution.

XOR is true when exactly one of the two operands is true, as shown in this truth table:

A  B  XOR
T  T  F
T  F  T
F  T  T
F  F  F

And replicated in 0/1 form

import numpy as np
import pandas as pd

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])

pd.DataFrame(dict(A=a, B=b, XOR=a ^ b))

   A  B  XOR
0  1  1    0
1  1  0    1
2  0  1    1
3  0  0    0

Demo

v = df.values

pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)

a    6
b    3
c    2
dtype: int64
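To make the slicing concrete, here is a sketch on a hypothetical two-column frame. It uses to_numpy(), the accessor pandas recommends over .values; both give the same array here. The values must be integers or booleans, since ^ is not defined for floating-point arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 frame with integer dtype (^ would fail on floats)
df = pd.DataFrame({"a": [0, 1, 1, 0, 1], "b": [1, 1, 1, 0, 0]})

v = df.to_numpy()
# v[1:] drops the first row and v[:-1] drops the last, so row i of the
# result compares original row i + 1 with original row i
flips = v[1:] ^ v[:-1]
print(flips)
# [[1 0]
#  [0 0]
#  [1 1]
#  [1 0]]

# Each 1 marks a state change; summing down the rows counts them per column
counts = pd.Series(flips.sum(axis=0), index=df.columns)
print(counts)  # a: 3, b: 1
```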

Time Testing


Functions

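import numpy as np
import pandas as pd

def pir_xor(df):
  v = df.values
  return pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)

def pir_diff1(df):
  v = df.values
  return pd.Series(np.abs(np.diff(v, axis=0)).sum(0), df.columns)

def pir_diff2(df):
  v = df.values
  # Builtin bool instead of np.bool, which some NumPy versions lack.
  # np.diff on a boolean array returns True where consecutive values differ.
  return pd.Series(np.diff(v.astype(bool), axis=0).sum(0), df.columns)

def cold(df):
  return df.ne(df.shift(-1)).sum(0) - 1

def jez(df):
  # raw=True passes NumPy arrays, so x[0] and x[-1] are positional
  return df.rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)

def naga(df):
  return df.diff().abs().sum().astype(int)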
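Before comparing speeds it is worth checking that the variants agree. The sketch below re-creates three of them under hypothetical names (xor_count, diff_count, shift_count) so it runs on its own, and compares their results on a random 0/1 frame drawn with NumPy's default_rng. check_dtype=False is passed because the integer dtype produced by astype(int) can differ by platform.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pir_xor, naga and cold above
def xor_count(df):
    v = df.to_numpy()
    return pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)

def diff_count(df):
    return df.diff().abs().sum().astype(int)

def shift_count(df):
    return df.ne(df.shift(-1)).sum(0) - 1

# Random 1000 x 3 frame of 0/1 integers
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(1000, 3)), columns=list("abc"))

results = [f(df) for f in (xor_count, diff_count, shift_count)]

# Raises AssertionError if any variant disagrees with the XOR result
for r in results[1:]:
    pd.testing.assert_series_equal(results[0], r, check_dtype=False)
print(results[0])
```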

Testing

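from timeit import timeit

np.random.seed([3, 1415])

idx = [10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000]
col = 'pir_xor pir_diff1 pir_diff2 cold jez naga'.split()
res = pd.DataFrame(np.nan, idx, col)

# Time 100 calls of each function on random 3-column 0/1 frames of growing length.
# The import from __main__ assumes the functions are defined in the running script or notebook.
for i in idx:
  df = pd.DataFrame(np.random.choice([0, 1], size=(i, 3)), columns=[*'abc'])
  for j in col:
    stmt = f"{j}(df)"
    setup = f"from __main__ import {j}, df"
    res.at[i, j] = timeit(stmt, setup, number=100)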

Results

res.div(res.min(1), 0)

        pir_xor  pir_diff1  pir_diff2       cold         jez      naga
10      1.06203   1.119769   1.000000  21.217555   16.768532  6.601518
30      1.00000   1.075406   1.115743  23.229013   18.844025  7.212369
100     1.00000   1.134082   1.174973  22.673289   21.478068  7.519898
300     1.00000   1.119153   1.166782  21.725495   26.293712  7.215490
1000    1.00000   1.106267   1.167786  18.394462   37.925160  6.284253
3000    1.00000   1.118554   1.342192  16.053097   64.953310  5.594610
10000   1.00000   1.163557   1.511631  12.008129  106.466636  4.503359
30000   1.00000   1.249835   1.431120   7.826387  118.380227  3.621455
100000  1.00000   1.275272   1.528840   6.690012  131.912349  3.150155
300000  1.00000   1.279373   1.528238   6.301007  140.667427  3.190868

Each value is that function's time divided by the fastest time for the same frame size, so 1.0 marks the fastest function in each row.

res.plot(loglog=True, figsize=(15, 8))


Solution 3:

shift and compare:

df.ne(df.shift(-1)).sum(0) - 1

a    6
b    3
c    2
dtype: int64

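This assumes "number" is the index; otherwise, run df.set_index('number', inplace=True) first. Subtracting 1 is needed because shift(-1) leaves NaN in the last row, and NaN never compares equal, so the last row is always counted as a change.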
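The subtraction of 1 is easiest to see on a small hypothetical frame: shift(-1) moves each next row up, the last row becomes NaN, and because NaN never equals anything, that row is always flagged.

```python
import pandas as pd

# Hypothetical frame: column "a" changes twice, column "b" never changes
df = pd.DataFrame({"a": [0, 1, 1, 0], "b": [1, 1, 1, 1]})

# Align each row with the row after it; the last row has no successor and becomes NaN
nxt = df.shift(-1)
mask = df.ne(nxt)
print(mask["a"].tolist())  # [True, False, True, True]; the last True comes from NaN

# Drop the one spurious True per column contributed by the NaN row
changes = mask.sum(0) - 1
print(changes)  # a: 2, b: 0
```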

Solution 4:

You can take the difference with the previous row and sum the absolute values:

df.diff().abs().sum().astype(int)

Out:

1    6
2    3
3    2
dtype: int32
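One caveat worth a sketch: summing absolute differences counts changes only because every step in 0/1 data has size 1. The hypothetical column s below has three states, so its diff-based total overstates the number of changes, while counting "differs from the previous row" with ne and shift stays correct for any set of states.

```python
import pandas as pd

# Hypothetical frame: "a" is 0/1, "s" takes the states 0, 1 and 2
df = pd.DataFrame({"a": [0, 1, 1, 0, 1], "s": [0, 2, 2, 1, 0]})

# Sum of |difference|: equals the change count only when every step is 0 <-> 1
by_diff = df.diff().abs().sum().astype(int)

# Count rows that differ from the previous row; iloc[1:] drops the first row,
# whose comparison against the shifted-in NaN is always True
by_ne = df.ne(df.shift()).iloc[1:].sum()

print(by_diff)  # a: 3, s: 4 (the 0 -> 2 jump counts twice)
print(by_ne)    # a: 3, s: 3
```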

Solution 5:

Use:

def agg_columns(x):
    # Compare each value with the previous one, skipping the first row (no predecessor)
    shifted = x.shift()
    return (x.iloc[1:] != shifted.iloc[1:]).sum()

df[['a','b','c']].apply(agg_columns)

a    6
b    3
c    2
dtype: int64
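A sketch of the apply version on hypothetical data that also carries a "number" column, as the note under Solution 3 mentions. Moving "number" into the index first keeps it from being counted as a state column; DataFrame.apply then calls agg_columns once per remaining column, and .iloc keeps the slicing positional whatever the index holds.

```python
import pandas as pd

def agg_columns(x):
    # Compare each value with the previous one, skipping the first row (no predecessor)
    shifted = x.shift()
    return (x.iloc[1:] != shifted.iloc[1:]).sum()

# Hypothetical data: "number" is a row label, "a" and "b" are 0/1 states
df = pd.DataFrame({
    "number": [1, 2, 3, 4, 5],
    "a": [0, 1, 0, 0, 1],
    "b": [1, 1, 0, 0, 0],
})

out = df.set_index("number").apply(agg_columns)
print(out)  # a: 3, b: 1
```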
