How To Separate Time Ranges/Intervals Into Bins If Intervals Occur Over Multiple Bins
I have a dataset which consists of pairs of start-end times (say seconds) of something happening across a recorded period of time. For example: #each tuple includes (start, stop) o
Solution 1:
If you don't mind using `numpy`, here is a strategy:
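```python
import numpy as np

def bin_times(data, bin_size, total_length):
    # One boolean per unit of time: is the event happening during it?
    times = np.zeros(total_length, dtype=bool)
    for start, stop in data:
        # Mark the half-open interval [start, stop) as "happening".
        times[start:stop] = True
    # One row per bin; the mean of a row is the fraction of that bin that is True.
    binned = 100 * np.average(times.reshape(-1, bin_size), axis=1)
    return binned.tolist()

data = [(0, 1), (5, 8), (15, 21), (29, 30)]
print(bin_times(data, 5, 40))
# => [20.0, 60.0, 0.0, 100.0, 20.0, 20.0, 0.0, 0.0]
```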
To explain the logic of `bin_times()`, let me use a smaller example:
```python
data = [(0, 1), (3, 8)]
print(bin_times(data, 3, 9))
# => approximately [33.3, 100.0, 66.7] (rounded to one decimal place)
```
The `times` array encodes whether your event is happening during each unit of time. You start by setting every entry to `False`:
[False, False, False, False, False, False, False, False, False]
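Read the incoming `data` and set the corresponding entries to `True`. Each `(start, stop)` pair covers `start` up to but not including `stop`, so `(3, 8)` marks indices 3 through 7:
[True, False, False, True, True, True, True, True, False]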
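As a quick check, the first two steps can be run on their own for the small example. This sketch simply repeats the opening lines of `bin_times()` outside the function so the intermediate array can be inspected:

```python
import numpy as np

data = [(0, 1), (3, 8)]
total_length = 9

# Step 1: one boolean per unit of time, all initially False.
times = np.zeros(total_length, dtype=bool)

# Step 2: mark each half-open interval [start, stop) as True.
for start, stop in data:
    times[start:stop] = True

print(times.tolist())
# => [True, False, False, True, True, True, True, True, False]
```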
Reshape it into a two-dimensional matrix whose rows have length `bin_size`, so that each row is one bin:
[[True, False, False], [True, True, True], [True, True, False]]
Take the average of each row, which is the fraction of `True` entries in that bin (values rounded):
[0.333, 1.000, 0.667]
Multiply by 100 to turn those fractions into percentages:
[33.3, 100.0, 66.7]
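The reshape, average, and percentage steps can likewise be checked in isolation. This sketch starts from the boolean array listed above; `np.round` is used only so the printed output matches the rounded values in the text, and is not part of `bin_times()`:

```python
import numpy as np

# Boolean array from the earlier steps (small example), with bin_size = 3.
times = np.array([True, False, False, True, True, True, True, True, False])
bin_size = 3

# Step 3: one row per bin; -1 lets NumPy infer the number of rows (here 3).
rows = times.reshape(-1, bin_size)

# Step 4: the mean of a boolean row is the fraction of True entries.
fractions = np.average(rows, axis=1)

# Step 5: convert fractions to percentages.
percentages = 100 * fractions

print(np.round(percentages, 1).tolist())
# => [33.3, 100.0, 66.7]
```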
To hide the use of `numpy` from the consumer of the function, use the `.tolist()` method to turn the resulting `numpy` array into a plain Python list.
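Because `bin_times()` uses `start` and `stop` as slice indices, it assumes integer times. If your times can be fractional, one alternative (a different technique from the answer above, not a drop-in replacement) is to compute how much of each interval overlaps each bin directly. Here is a minimal pure-Python sketch; the name `bin_times_overlap` is hypothetical, and it assumes the intervals do not overlap one another, since overlapping intervals would be counted twice rather than merged as the boolean array does:

```python
def bin_times_overlap(data, bin_size, total_length):
    """Percentage of each bin covered by half-open (start, stop) intervals.

    Hypothetical alternative to bin_times(): accepts float times and assumes
    the intervals do not overlap one another (overlaps would be double-counted).
    """
    percentages = []
    bin_start = 0
    while bin_start < total_length:
        # The last bin is shorter if bin_size does not divide total_length.
        bin_end = min(bin_start + bin_size, total_length)
        covered = 0
        for start, stop in data:
            # Length of the intersection of [start, stop) and [bin_start, bin_end).
            covered += max(0, min(stop, bin_end) - max(start, bin_start))
        percentages.append(100 * covered / (bin_end - bin_start))
        bin_start = bin_end
    return percentages

print(bin_times_overlap([(0, 1), (5, 8), (15, 21), (29, 30)], 5, 40))
# => [20.0, 60.0, 0.0, 100.0, 20.0, 20.0, 0.0, 0.0]
print(bin_times_overlap([(0.5, 2.0)], 1, 3))
# => [50.0, 100.0, 0.0]
```

The trade-off is speed: this loops over every interval for every bin, while the `numpy` version does one vectorized reshape and average.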
One caveat: `bin_size` needs to evenly divide `total_length`; otherwise the reshape raises a `ValueError`.
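If `total_length` might not be a multiple of `bin_size`, one workaround (my own assumption, not part of the answer above) is to sum each bin with `np.add.reduceat` instead of reshaping, and divide by each bin's actual width so a shorter final bin is handled correctly. A sketch, using the hypothetical name `bin_times_uneven`:

```python
import numpy as np

def bin_times_uneven(data, bin_size, total_length):
    """Like bin_times(), but allows a final bin shorter than bin_size."""
    times = np.zeros(total_length, dtype=bool)
    for start, stop in data:
        # Mark the half-open interval [start, stop) as "happening".
        times[start:stop] = True
    # Index where each bin begins: 0, bin_size, 2 * bin_size, ...
    bin_starts = np.arange(0, total_length, bin_size)
    # Number of True entries in each bin (the last bin may be shorter).
    counts = np.add.reduceat(times.astype(np.int64), bin_starts)
    # Actual width of each bin, so a short final bin is not under-counted.
    widths = np.diff(np.append(bin_starts, total_length))
    return (100 * counts / widths).tolist()

print([round(p, 1) for p in bin_times_uneven([(0, 1), (3, 8)], 3, 8)])
# => [33.3, 100.0, 100.0]
```

When `bin_size` does divide `total_length`, this gives the same result as `bin_times()`.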