Find Gaps In Pandas Time Series Dataframe Sampled At 1 Minute Intervals And Fill The Gaps With New Rows
Problem I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing. I'm looking for a good (simple and efficie
Solution 1:
Use DataFrame.asfreq, which works on a DatetimeIndex:
prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
                        open     high      low    close
datetime
2019-02-07 16:00:00  124.634  124.624  124.650  124.620
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00  124.624  124.627  124.647  124.617
Solution 2:
A more manual answer would be:
from datetime import timedelta
from dateutil import parser
import pandas as pd
df = pd.DataFrame({
'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
'b': [64.8, 64.8, 50.3]
})
max_dt = parser.parse(max(df['a']))
min_dt = parser.parse(min(df['a']))
dt_range = []
while min_dt <= max_dt:
    dt_range.append(min_dt.strftime("%Y-%m-%d %H:%M:%S"))
    min_dt += timedelta(seconds=1)
complete_df = pd.DataFrame({'a': dt_range})
final_df = complete_df.merge(df, how='left', on='a')
It converts the following dataframe:
                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:35  50.3
to:
                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:32   NaN
3  2021-02-07 11:00:33   NaN
4  2021-02-07 11:00:34   NaN
5  2021-02-07 11:00:35  50.3
The null values can then be filled in later.
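The same left-merge idea can probably be written more compactly by letting pandas build the timestamp range. This is only a sketch, not the answerer's code: it swaps the while loop for pd.date_range, and the names full_range and filled, as well as the choice of a forward fill, are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
    'b': [64.8, 64.8, 50.3],
})

# Parse the strings so pandas can do the date arithmetic itself.
df['a'] = pd.to_datetime(df['a'])

# Every expected timestamp between the first and last observation (1 s step).
full_range = pd.DataFrame({
    'a': pd.date_range(df['a'].min(), df['a'].max(), freq=pd.Timedelta(seconds=1))
})

# The left merge keeps every expected timestamp; missing rows get NaN in 'b'.
final_df = full_range.merge(df, how='left', on='a')

# One possible way to fill the gaps (an assumption): carry the last value forward.
filled = final_df.ffill()
```

Whether a forward fill, interpolation, or leaving the NaN values in place is appropriate depends on how the data is used afterwards.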
Solution 3:
The proposal of @jezrael didn't work for me initially because my index was not a DatetimeIndex. Calling prices.asfreq() wiped out all the price data, although it did fill the gaps with NaN:
                     open  high  low  close
datetime
2019-02-07 16:00:00   NaN   NaN  NaN    NaN
2019-02-07 16:01:00   NaN   NaN  NaN    NaN
2019-02-07 16:02:00   NaN   NaN  NaN    NaN
2019-02-07 16:03:00   NaN   NaN  NaN    NaN
2019-02-07 16:04:00   NaN   NaN  NaN    NaN
To fix this I had to change the type of the index column like this:
prices['date'] = pd.to_datetime(prices['datetime'])
prices = prices.set_index('date')
prices.drop(['datetime'], axis=1, inplace=True)
This code parses the 'datetime' column into a new 'date' column of datetime values, sets that column as the index (so the index becomes a DatetimeIndex), and drops the original 'datetime' column.
Now I can call:
prices = prices.asfreq('1Min')
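Putting the steps together, a minimal end-to-end sketch might look like the following. The two sample rows are taken from the output shown in Solution 1; the isinstance guard, the names regular, missing and filled, and the forward fill are assumptions added for illustration and are not part of the original answers.

```python
import pandas as pd

prices = pd.DataFrame({
    'datetime': ['2019-02-07 16:00:00', '2019-02-07 16:04:00'],
    'open': [124.634, 124.624],
    'high': [124.624, 124.627],
    'low': [124.650, 124.647],
    'close': [124.620, 124.617],
})

# Convert the string column to datetime values and use it as the index.
prices['datetime'] = pd.to_datetime(prices['datetime'])
prices = prices.set_index('datetime')

# Guard against the pitfall described above: asfreq expects a DatetimeIndex.
if not isinstance(prices.index, pd.DatetimeIndex):
    raise TypeError('index must be a DatetimeIndex before calling asfreq')

# Insert a row for every missing minute; the new rows contain NaN.
regular = prices.asfreq('1min')

# Timestamps that were absent from the original data, i.e. the gaps.
missing = regular.index.difference(prices.index)

# One possible fill strategy (an assumption): carry the last known bar forward.
filled = regular.ffill()
```

Here missing lists the timestamps of the gaps, which answers the "find" half of the question, while regular and filled cover the "fill" half.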