
Find Gaps In Pandas Time Series Dataframe Sampled At 1 Minute Intervals And Fill The Gaps With New Rows

Problem I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing. I'm looking for a good (simple and efficie

Solution 1:

Use DataFrame.asfreq, which works on a DataFrame with a DatetimeIndex:

prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
                        open     high      low    close
datetime                                               
2019-02-07 16:00:00  124.634  124.624  124.650  124.620
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00  124.624  124.627  124.647  124.617
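
For a self-contained version of the call above, here is a minimal sketch. The two sample rows are copied from the output shown; the layout of `prices` before `set_index` (a 'datetime' column plus four price columns) is an assumption, since the question's input is not reproduced here. The sketch uses the lowercase "min" alias, which means the same one-minute frequency as '1Min'. The names `filled` and `missing` are illustrative only.

```python
import pandas as pd

# Two rows taken from the output above; the three minutes in between are missing.
prices = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-02-07 16:00:00", "2019-02-07 16:04:00"]),
    "open": [124.634, 124.624],
    "high": [124.624, 124.627],
    "low": [124.650, 124.647],
    "close": [124.620, 124.617],
})

# asfreq works on a DatetimeIndex and inserts one NaN row per missing minute.
filled = prices.set_index("datetime").asfreq("min")

# Timestamps that were absent from the original data (the gaps themselves).
missing = filled.index.difference(pd.DatetimeIndex(prices["datetime"]))
print(filled)
print(missing)
```

Keeping `missing` around can be handy for logging which minutes were absent before any values are filled in.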

Solution 2:

A more manual answer would be:

from datetime import timedelta
from dateutil import parser

import pandas as pd



df = pd.DataFrame({
 'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
 'b': [64.8, 64.8, 50.3]
})

max_dt = parser.parse(max(df['a']))
min_dt = parser.parse(min(df['a']))


dt_range = []
while min_dt <= max_dt:
  dt_range.append(min_dt.strftime("%Y-%m-%d %H:%M:%S"))
  min_dt += timedelta(seconds=1)


complete_df = pd.DataFrame({'a': dt_range})
final_df = complete_df.merge(df, how='left', on='a')

It converts the following dataframe:

                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:35  50.3

to:

                     a     b
0  2021-02-07 11:00:30  64.8
1  2021-02-07 11:00:31  64.8
2  2021-02-07 11:00:32   NaN
3  2021-02-07 11:00:33   NaN
4  2021-02-07 11:00:34   NaN
5  2021-02-07 11:00:35  50.3

The null values can then be filled in a later step.
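
As a rough sketch of that later filling step: the snippet below rebuilds the same frame, but swaps the while loop for `pd.date_range` (a substitution, not the code from this answer), and then shows two possible ways to fill column 'b'. Whether forward-filling or interpolating is appropriate depends on the data; both are shown only as options. The names `full_range`, `forward_filled` and `interpolated` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["2021-02-07 11:00:30", "2021-02-07 11:00:31", "2021-02-07 11:00:35"],
    "b": [64.8, 64.8, 50.3],
})

# Build the full one-second range with pd.date_range instead of a while loop,
# formatted back to strings so the merge keys match column 'a'.
full_range = pd.date_range(df["a"].min(), df["a"].max(), freq="s")
complete_df = pd.DataFrame({"a": full_range.strftime("%Y-%m-%d %H:%M:%S")})
final_df = complete_df.merge(df, how="left", on="a")

# Option 1: carry the last known value forward into the gaps.
forward_filled = final_df["b"].ffill()

# Option 2: interpolate linearly between the known values.
interpolated = final_df["b"].interpolate()

print(final_df.assign(ffill=forward_filled, interp=interpolated))
```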


Solution 3:

The proposal of @jezrael didn't work for me initially because my index was of a different type than DatetimeIndex. Calling prices.asfreq() wiped out all the price data, although it did fill the gaps with NaN, like this:

                         open     high      low    close
datetime                                               
2019-02-07 16:00:00      NaN      NaN      NaN      NaN
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00      NaN      NaN      NaN      NaN

To fix this I had to change the type of the index column like this:

prices['date'] = pd.to_datetime(prices['datetime'])
prices = prices.set_index('date')
prices.drop(['datetime'], axis=1, inplace=True)

This code parses the 'datetime' column into a new 'date' column of datetime values, sets that column as the index (which makes the index a DatetimeIndex), and drops the original 'datetime' column.

Now I can call:

prices = prices.asfreq('1Min')
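
The steps above can be wrapped into one function. This is only a sketch: `fill_minute_gaps` is a hypothetical helper name, the sample values are borrowed from the output in Solution 1, and the assumption is that the timestamps arrive as strings in a 'datetime' column. It uses the lowercase "min" alias for the one-minute frequency.

```python
import pandas as pd

def fill_minute_gaps(prices: pd.DataFrame, column: str = "datetime") -> pd.DataFrame:
    """Hypothetical helper: index by parsed timestamps, then add NaN rows for missing minutes."""
    out = prices.copy()
    out[column] = pd.to_datetime(out[column])  # strings -> datetime64 values
    out = out.set_index(column).sort_index()
    # Guard against the situation described above: the index must be a DatetimeIndex.
    if not isinstance(out.index, pd.DatetimeIndex):
        raise TypeError(f"expected DatetimeIndex, got {type(out.index).__name__}")
    return out.asfreq("min")

# Timestamps stored as plain strings, as assumed in this sketch.
raw = pd.DataFrame({
    "datetime": ["2019-02-07 16:00:00", "2019-02-07 16:04:00"],
    "open": [124.634, 124.624],
    "close": [124.620, 124.617],
})
result = fill_minute_gaps(raw)
print(result)
```

Working on a copy leaves the caller's frame untouched, which avoids the inplace drop used above.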
