Find Gaps In Pandas Time Series Dataframe Sampled At 1 Minute Intervals And Fill The Gaps With New Rows
Problem I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing. I'm looking for a good (simple and efficie
Solution 1:
Use DataFrame.asfreq, which works with a DatetimeIndex:
prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
open high low close
datetime
2019-02-07 16:00:00 124.634 124.624 124.650 124.620
2019-02-07 16:01:00 NaN NaN NaN NaN
2019-02-07 16:02:00 NaN NaN NaN NaN
2019-02-07 16:03:00 NaN NaN NaN NaN
2019-02-07 16:04:00 124.624 124.627 124.647 124.617
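To also report which minutes were missing, the index before and after asfreq can be compared. The following is a minimal sketch, not part of the original answer: the sample values, the reduced column set, and the names observed, filled, and gaps are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: minutes 16:01 through 16:03 are missing.
prices = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-02-07 16:00:00", "2019-02-07 16:04:00"]),
    "open": [124.634, 124.624],
    "close": [124.620, 124.617],
})

# Remember which timestamps were actually observed.
observed = pd.DatetimeIndex(prices["datetime"])

# Reindex onto a regular 1-minute grid; missing minutes become NaN rows.
filled = prices.set_index("datetime").asfreq("1min")

# Timestamps that asfreq had to add, i.e. the gaps.
gaps = filled.index.difference(observed)
print(gaps)
```

The gaps index holds only the timestamps that were inserted, so it can be logged or inspected before deciding how to fill the new rows.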
Solution 2:
A more manual answer would be:
from datetime import timedelta
from dateutil import parser
import pandas as pd
df = pd.DataFrame({
'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
'b': [64.8, 64.8, 50.3]
})
max_dt = parser.parse(max(df['a']))
min_dt = parser.parse(min(df['a']))
dt_range = []
while min_dt <= max_dt:
    dt_range.append(min_dt.strftime("%Y-%m-%d %H:%M:%S"))
    min_dt += timedelta(seconds=1)
complete_df = pd.DataFrame({'a': dt_range})
final_df = complete_df.merge(df, how='left', on='a')
It converts the following dataframe:
a b
0 2021-02-07 11:00:30 64.8
1 2021-02-07 11:00:31 64.8
2 2021-02-07 11:00:35 50.3
to:
a b
0 2021-02-07 11:00:30 64.8
1 2021-02-07 11:00:31 64.8
2 2021-02-07 11:00:32 NaN
3 2021-02-07 11:00:33 NaN
4 2021-02-07 11:00:34 NaN
5 2021-02-07 11:00:35 50.3
The NaN values can then be filled in a later step.
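As a rough sketch of that later step, the same left merge can be built with pd.date_range instead of a Python loop, and the NaN values filled afterwards. The forward fill shown here is only one possible policy and is an assumption, as are the names full_range and b_filled; the answer itself does not say how the values should be filled.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["2021-02-07 11:00:30", "2021-02-07 11:00:31", "2021-02-07 11:00:35"],
    "b": [64.8, 64.8, 50.3],
})

# Parse the strings so pandas can build the range itself.
df["a"] = pd.to_datetime(df["a"])

# Every second between the first and last observation, inclusive.
full_range = pd.DataFrame(
    {"a": pd.date_range(df["a"].min(), df["a"].max(), freq="s")}
)

# Left merge keeps every timestamp; unmatched ones get NaN in 'b'.
final_df = full_range.merge(df, how="left", on="a")

# Assumed fill policy: carry the last known value forward.
final_df["b_filled"] = final_df["b"].ffill()
print(final_df)
```

Keeping the original b column next to b_filled makes it easy to see which rows were observed and which were filled in.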
Solution 3:
The proposal of @jezrael didn't work for me initially because my index was of a different type than DatetimeIndex. Calling prices.asfreq() wiped out all the prices data, although it did fill the gaps with NaN rows:
open high low close
datetime
2019-02-07 16:00:00 NaN NaN NaN NaN
2019-02-07 16:01:00 NaN NaN NaN NaN
2019-02-07 16:02:00 NaN NaN NaN NaN
2019-02-07 16:03:00 NaN NaN NaN NaN
2019-02-07 16:04:00 NaN NaN NaN NaN
To fix this, I had to change the type of the index column like this:
prices['date'] = pd.to_datetime(prices['datetime'])
prices = prices.set_index('date')
prices.drop(['datetime'], axis=1, inplace=True)
That code creates a new 'date' column of datetime type from the 'datetime' column, sets it as the index (which makes the index a DatetimeIndex), and drops the original 'datetime' column.
Now I can call:
prices = prices.asfreq('1Min')
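A quick check of the index type before calling asfreq can catch this problem early. The snippet below is a sketch under assumptions: the sample frame with a single close column is hypothetical, and it converts the 'datetime' column in place rather than creating a separate 'date' column.

```python
import pandas as pd

# Hypothetical frame whose 'datetime' column holds plain strings.
prices = pd.DataFrame({
    "datetime": ["2019-02-07 16:00:00", "2019-02-07 16:04:00"],
    "close": [124.620, 124.617],
})

# Convert the column to datetime values, then use it as the index.
prices["datetime"] = pd.to_datetime(prices["datetime"])
prices = prices.set_index("datetime")

# Guard: only call asfreq once the index really is a DatetimeIndex.
if not isinstance(prices.index, pd.DatetimeIndex):
    raise TypeError(f"expected DatetimeIndex, got {type(prices.index).__name__}")

prices = prices.asfreq("1min")
print(prices)
```

With the guard in place, a wrongly typed index fails loudly instead of producing a frame of NaN rows.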