
Numpy, Pandas: What Is The Fastest Way To Calculate Dataset Row Value Basing On Previous N Values?

I have a dataset and I want to enrich it. I need to calculate some new dataset column which is some function of previous N rows of another column. As an example, given I want to ca

Solution 1:

Use pandas rolling (moving) window functions.

Sample DF:

In [46]: df = pd.DataFrame({'date': pd.date_range('2000-01-01', freq='D', periods=15), 'temp': np.random.rand(15)*20})

In [47]: df
Out[47]:
         date       temp
0  2000-01-01  17.246616
1  2000-01-02  18.228468
2  2000-01-03   6.245991
3  2000-01-04   8.890069
4  2000-01-05   6.837285
5  2000-01-06   1.555924
6  2000-01-07  18.641918
7  2000-01-08   6.308174
8  2000-01-09  13.601203
9  2000-01-10   6.482098
10 2000-01-11  15.711497
11 2000-01-12  18.690925
12 2000-01-13   2.493110
13 2000-01-14  17.626622
14 2000-01-15   6.982129

Answer:

In [48]: df['higher_3avg'] = df.rolling(3)['temp'].mean().diff().gt(0)

In [49]: df
Out[49]:
         date       temp  higher_3avg
0  2000-01-01  17.246616        False
1  2000-01-02  18.228468        False
2  2000-01-03   6.245991        False
3  2000-01-04   8.890069        False
4  2000-01-05   6.837285        False
5  2000-01-06   1.555924        False
6  2000-01-07  18.641918         True
7  2000-01-08   6.308174        False
8  2000-01-09  13.601203         True
9  2000-01-10   6.482098        False
10 2000-01-11  15.711497         True
11 2000-01-12  18.690925         True
12 2000-01-13   2.493110        False
13 2000-01-14  17.626622         True
14 2000-01-15   6.982129        False

Explanation:

In [50]: df.rolling(3)['temp'].mean()
Out[50]:
0           NaN
1           NaN
2     13.907025
3     11.121509
4      7.324448
5      5.761093
6      9.011709
7      8.835339
8     12.850431
9      8.797158
10    11.931599
11    13.628173
12    12.298511
13    12.936886
14     9.033954
Name: temp, dtype: float64
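The chain in the answer can be split into its three steps to make each one visible. The following is a minimal sketch on a small hand-made series rather than the random data above; the column names `avg3` and `change` are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Small deterministic input (assumed values, chosen so the result is easy to check)
df = pd.DataFrame({'temp': [1.0, 2.0, 3.0, 2.0, 1.0, 5.0]})

# Step 1: mean of the current row and the two rows before it (NaN until 3 values exist)
df['avg3'] = df['temp'].rolling(3).mean()

# Step 2: change of that mean relative to the previous row
df['change'] = df['avg3'].diff()

# Step 3: True where the rolling mean went up; comparisons with NaN yield False
df['higher_3avg'] = df['change'].gt(0)

result = df['higher_3avg'].tolist()
print(df)
```

Because `gt(0)` treats NaN as not greater than zero, the first three rows come out as False, which matches the leading False values in the output above.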

Solution 2:

For huge data, NumPy solutions can be about 30x faster than the pandas rolling window. The cumulative-sum moving average below comes from here:

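def moving_average(a, n=3):
    # Running total of the input (expects a NumPy array)
    ret = a.cumsum()
    # Subtract the total from n positions back, leaving the sum of each n-wide window
    ret[n:] -= ret[:-n]
    # The first n - 1 entries are partial windows, so drop them
    return ret[n - 1:] / n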

In [419]: %timeit moving_average(df.values)
38.2 µs ± 1.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [420]: %timeit df.rolling(3).mean()
1.42 ms ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
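Note that `moving_average` returns only the complete windows, so its result is `n - 1` elements shorter than the input and cannot be assigned to a column directly. The sketch below shows one way to turn it into a full-length boolean flag like the one in Solution 1. The helper `higher_avg_flags` is a hypothetical name, not part of the original answer, and it assumes a 1-D numeric array such as `df['temp'].to_numpy()`.

```python
import numpy as np

def moving_average(a, n=3):
    # Cumulative-sum moving average; float dtype avoids integer overflow or truncation
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

def higher_avg_flags(values, n=3):
    # Hypothetical helper: True where the n-row average rose versus the previous row
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=bool)
    if len(values) > n:
        avg = moving_average(values, n)   # one entry per complete window
        flags[n:] = np.diff(avg) > 0      # the first n rows have no earlier average
    return flags

temps = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 5.0])
flags = higher_avg_flags(temps)
print(flags)
```

The first `n` positions are left as False, mirroring what `rolling(3).mean().diff().gt(0)` produces for rows where the difference is NaN. Results can differ from pandas in the last floating-point digits, so rows whose averages are nearly equal may occasionally flip.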
