
Sum Large Pandas Dataframe Based On Smaller Date Ranges

I have a large pandas dataframe that has hourly data associated with it. I then want to parse that into 'monthly' data that sums the hourly data. However, the months aren't neces

Solution 1:

pd.merge_asof is only available in pandas 0.19 and later.
This uses a combination of pd.merge_asof + query + groupby:

pd.merge_asof(df, month, left_on='date', right_on='start') \
    .query('date <= end').groupby(['start', 'end']).num.sum().reset_index()



explanation
pd.merge_asof
From docs

For each row in the left DataFrame, we select the last row in the right DataFrame whose ‘on’ key is less than or equal to the left’s key. Both DataFrames must be sorted by the key.

But this only takes into account the start date.
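To make that limitation concrete, here is a minimal sketch with made-up data (the frames `left` and `right` and their dates are hypothetical, not from the question). The second row falls after the end of the first range, yet pd.merge_asof still attaches it to that range because only start is compared:

```python
import pandas as pd

# Hypothetical observations and hypothetical (start, end) ranges
left = pd.DataFrame({'date': pd.to_datetime(['2015-01-05', '2015-01-20', '2015-02-10'])})
right = pd.DataFrame({
    'start': pd.to_datetime(['2015-01-01', '2015-02-01']),
    'end': pd.to_datetime(['2015-01-15', '2015-02-15']),
})

# Each date gets the last range whose start is <= date; end is ignored
matched = pd.merge_asof(left, right, left_on='date', right_on='start')

# Flag rows that were matched even though they lie past the range's end
matched['past_end'] = matched['date'] > matched['end']
print(matched)
```

Both inputs are already sorted by their keys here, which pd.merge_asof requires.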

query
I take care of the end date with query: after pd.merge_asof the end column is conveniently in my dataframe, so rows whose date falls past end can be filtered out.

groupby
I trust this part is obvious.
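For anyone who wants to run the whole thing, here is a self-contained sketch. The question's actual `df` and `month` frames are not shown, so the ones below are assumptions: every hour contributes num = 1 (so each sum is just a count of hours), and the three date ranges in `month` are invented for illustration.

```python
import pandas as pd

# Hypothetical hourly data for Jan 1 through Mar 31, 2015; num is 1 for every hour
df = pd.DataFrame({
    'date': pd.date_range('2015-01-01', '2015-03-31 23:00', freq=pd.Timedelta(hours=1)),
})
df['num'] = 1

# Hypothetical custom "months" that do not line up with calendar months
month = pd.DataFrame({
    'start': pd.to_datetime(['2015-01-01', '2015-01-25', '2015-02-20']),
    'end': pd.to_datetime(['2015-01-24 23:00', '2015-02-19 23:00', '2015-03-15 23:00']),
})

result = (
    pd.merge_asof(df, month, left_on='date', right_on='start')  # match on start
      .query('date <= end')                                     # drop rows past end
      .groupby(['start', 'end'])['num'].sum()                   # sum per range
      .reset_index()
)
print(result)
```

Hours after the last end (Mar 15 23:00 in this sketch) are still matched to the last range by pd.merge_asof, and it is the query step that drops them.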


Solution 2:

Maybe you can offset the dates by a number of days and then convert them to a monthly period:

import numpy as np
import pandas as pd

# create data
dates = pd.Series(pd.date_range('1/1/2015 00:00','3/31/2015 23:45',freq='1H'))
nums = np.random.randint(0,100,dates.count())
df = pd.DataFrame({'date':dates, 'num':nums})

# offset days and then create period
df['periods'] = (df.date + pd.tseries.offsets.Day(23)).dt.to_period('M')

# group and sum
df.groupby('periods')['num'].sum()

Output (the exact sums will differ from run to run, since the data is random)

periods
2015-01    10051
2015-02    34229
2015-03    37311
2015-04    26655

You can then shift the dates back by the same number of days and store them in new columns.
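A rough sketch of that last step, under a few assumptions of mine: num is set to 1 for every hour instead of random values so the sums are predictable, and the column names `start` and `end` are just illustrative. Each period's boundaries are shifted back by the same 23 days to recover the real date range it covers:

```python
import pandas as pd

offset = pd.Timedelta(days=23)

# Hourly data as above, but with num = 1 so each sum is a count of hours
dates = pd.Series(pd.date_range('2015-01-01', '2015-03-31 23:00', freq=pd.Timedelta(hours=1)))
df = pd.DataFrame({'date': dates, 'num': 1})

# Offset the dates, then bucket them into monthly periods
df['periods'] = (df['date'] + offset).dt.to_period('M')

summed = df.groupby('periods')['num'].sum().reset_index()

# Undo the offset on each period's boundaries to get the actual date range
summed['start'] = summed['periods'].dt.start_time - offset
summed['end'] = summed['periods'].dt.end_time - offset
print(summed)
```

Note that the first and last ranges extend beyond the data itself (the first nominally starts on 2014-12-09), so those two buckets only hold partial sums.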

