Find Gaps In Pandas Time Series Dataframe Sampled At 1 Minute Intervals And Fill The Gaps With New Rows
Problem I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing. I'm looking for a good (simple and efficie
Solution 1:
Use DataFrame.asfreq, which works with a DatetimeIndex:
prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
                        open     high      low    close
datetime
2019-02-07 16:00:00  124.634  124.624  124.650  124.620
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00  124.624  124.627  124.647  124.617
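To see which timestamps are actually missing before filling them, one option is to compare the index against a complete 1-minute grid. The following is a minimal sketch under assumed sample data (the values and the single `close` column are made up for illustration), not the original poster's frame:

```python
import pandas as pd

# Hypothetical sample data: the 16:01-16:03 rows are missing.
prices = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2019-02-07 16:00:00", "2019-02-07 16:04:00", "2019-02-07 16:05:00"]
        ),
        "close": [124.620, 124.617, 124.630],
    }
).set_index("datetime")

# Timestamps that a complete 1-minute grid would contain but the data lacks.
full_index = pd.date_range(prices.index.min(), prices.index.max(), freq="1min")
missing = full_index.difference(prices.index)

# asfreq inserts those timestamps as all-NaN rows.
filled = prices.asfreq("1min")

print(missing)
print(filled)
```

Here `missing` lists the gap timestamps, and `filled` has one row per minute with NaN in the rows that were inserted.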
Solution 2:
A more manual answer would be:
from datetime import timedelta
from dateutil import parser
import pandas as pd

df = pd.DataFrame({
    'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
    'b': [64.8, 64.8, 50.3]
})

# Parse the first and last timestamps (the strings sort chronologically).
max_dt = parser.parse(max(df['a']))
min_dt = parser.parse(min(df['a']))

# Build every timestamp in the range at 1-second steps, as strings.
dt_range = []
while min_dt <= max_dt:
    dt_range.append(min_dt.strftime("%Y-%m-%d %H:%M:%S"))
    min_dt += timedelta(seconds=1)

# Left-merge onto the complete range so missing timestamps get NaN.
complete_df = pd.DataFrame({'a': dt_range})
final_df = complete_df.merge(df, how='left', on='a')
It converts the following dataframe:
                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:35  50.3
to:
                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:32   NaN
3  2021-02-07 11:00:33   NaN
4  2021-02-07 11:00:34   NaN
5  2021-02-07 11:00:35  50.3
whose null values we can fill later.
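The same result can probably be reached without the explicit loop by letting pandas build the range. This is a sketch of that variant, with forward-fill chosen only as one example of how the NaN values might be filled afterwards; the names `forward_filled` and the choice of fill method are assumptions, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["2021-02-07 11:00:30", "2021-02-07 11:00:31", "2021-02-07 11:00:35"],
    "b": [64.8, 64.8, 50.3],
})

# Parse the strings once so the merge key is a real datetime column.
df["a"] = pd.to_datetime(df["a"])

# Build the complete 1-second grid without a Python loop.
complete_df = pd.DataFrame(
    {"a": pd.date_range(df["a"].min(), df["a"].max(), freq="1s")}
)

# Left-merge so timestamps absent from df get NaN in column b.
final_df = complete_df.merge(df, how="left", on="a")

# One possible fill: carry the last known value forward.
forward_filled = final_df.assign(b=final_df["b"].ffill())

print(forward_filled)
```

Whether forward-filling is appropriate depends on the data; for prices it repeats the last observed value, which may or may not be what you want.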
Solution 3:
The proposal from @jezrael didn't work for me at first because my index was not a DatetimeIndex. Calling prices.asfreq() filled the gaps, but it also replaced all of the existing prices data with NaN, like this:
                     open  high  low  close
datetime
2019-02-07 16:00:00   NaN   NaN  NaN    NaN
2019-02-07 16:01:00   NaN   NaN  NaN    NaN
2019-02-07 16:02:00   NaN   NaN  NaN    NaN
2019-02-07 16:03:00   NaN   NaN  NaN    NaN
2019-02-07 16:04:00   NaN   NaN  NaN    NaN
To fix this, I had to change the type of the index column like this:
prices['date'] = pd.to_datetime(prices['datetime'])
prices = prices.set_index('date')
prices.drop(['datetime'], axis=1, inplace=True)
This code parses the 'datetime' column into datetime values, stores them in a new 'date' column, and sets that column as the index, so the index becomes a DatetimeIndex. Now I can call:
prices = prices.asfreq('1Min')
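The check and conversion could be wrapped together so that asfreq is never called on the wrong index type. This is only a sketch: `fill_minute_gaps` is a hypothetical helper name, and the sample frame below is invented for illustration.

```python
import pandas as pd


def fill_minute_gaps(frame, column="datetime"):
    """Return `frame` on a complete 1-minute grid (hypothetical helper)."""
    if not isinstance(frame.index, pd.DatetimeIndex):
        # Parse the timestamp column and use it as the index,
        # then drop the original string column.
        frame = frame.set_index(pd.to_datetime(frame[column])).drop(columns=column)
    return frame.asfreq("1min")


# Assumed sample data with timestamps stored as strings.
prices = pd.DataFrame({
    "datetime": ["2019-02-07 16:00:00", "2019-02-07 16:04:00"],
    "close": [124.620, 124.617],
})

filled = fill_minute_gaps(prices)
print(filled)
```

With a proper DatetimeIndex in place, the existing values are kept and only the inserted rows contain NaN.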