ml4gw.dataloading.in_memory_dataset
Classes
- InMemoryDataset -- Dataset for iterating through in-memory multi-channel timeseries
- class ml4gw.dataloading.in_memory_dataset.InMemoryDataset(X, kernel_size, y=None, batch_size=32, stride=1, batches_per_epoch=None, coincident=True, shuffle=True, device='cpu')
Bases: IterableDataset
Dataset for iterating through in-memory multi-channel timeseries
Dataset for arrays of timeseries data which can be stored in memory all at once. Iterates through the data by sampling fixed-length windows from all channels. The precise mechanism for this iteration is determined by combinations of the keyword arguments; see the parameter descriptions for details.
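The basic windowing idea can be illustrated with a minimal sketch. This is not the ml4gw implementation: it uses NumPy arrays in place of torch tensors, and the helper name iter_windows is hypothetical. It covers only coincident, unshuffled sampling with batches_per_epoch left as None, so every channel is sliced at the same start index and the final batch may come up short.

```python
import numpy as np


def iter_windows(X, kernel_size, batch_size=32, stride=1):
    """Hypothetical helper: yield batches of shape (batch, channels, kernel_size).

    Coincident, in-order sampling only: all channels share each start index.
    """
    # Number of windows whose start is a multiple of stride and which fit in X
    num_kernels = (X.shape[-1] - kernel_size) // stride + 1
    starts = np.arange(num_kernels) * stride
    for i in range(0, num_kernels, batch_size):
        batch_starts = starts[i : i + batch_size]
        # (batch, kernel_size) array of time indices for each window
        idx = batch_starts[:, None] + np.arange(kernel_size)
        # X[:, idx] has shape (channels, batch, kernel_size); put batch first
        yield X[:, idx].transpose(1, 0, 2)


X = np.arange(2 * 10).reshape(2, 10)  # 2 channels, 10 samples each
batches = list(iter_windows(X, kernel_size=4, batch_size=3, stride=2))
print([b.shape for b in batches])
```

With 10 samples, kernel_size=4 and stride=2 there are four windows (starting at samples 0, 2, 4 and 6), so a batch_size of 3 yields one full batch followed by a batch containing a single window.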
- Parameters:
  - X (Float[Tensor, 'channels time']) -- Timeseries data to be iterated through. Should have shape (num_channels, length * sample_rate). Windows will be sampled from the time (1st) dimension for all channels along the channel (0th) dimension.
  - kernel_size (int) -- The length of the windows to sample from X, in units of samples.
  - y (Optional[Float[Tensor, 'time']]) -- Target timeseries to be iterated through. If specified, should be a single channel and have shape (length * sample_rate,). If left as None, only windows sampled from X will be returned during iteration. Otherwise, windows sampled from both arrays will be returned. Note that if sampling is performed non-coincidentally, there's no sensible way to align windows sampled from this array with the windows sampled from X, so this combination of arguments is not permitted.
  - batch_size (int) -- Maximum number of windows to return at each iteration. This will be the length of the 0th dimension of the returned array(s). If batches_per_epoch is specified, every array returned during iteration will have this length. Otherwise, the last array may be shorter if the number of windows in the timeseries is not an integer multiple of batch_size.
  - stride (int) -- The resolution at which windows will be sampled from the specified timeseries, in units of samples. E.g. if stride=2, the first sample of each window can only be at an index of X that is a multiple of 2. This reduces the number of windows available for iteration by roughly a factor of stride.
  - batches_per_epoch (Optional[int]) -- Number of batches of windows to produce during iteration before raising a StopIteration. Must be specified if performing non-coincident sampling. Otherwise, if left as None, windows will be sampled until the entire timeseries has been exhausted. Note that batch_size * batches_per_epoch must be small enough to be fulfilled by the number of windows in the timeseries, otherwise a ValueError will be raised.
  - coincident (bool) -- Whether to sample windows from the channels of X using the same indices or independently. Can't be False if batches_per_epoch is None or y is not None.
  - shuffle (bool) -- Whether to sample windows from the timeseries randomly or in order along the time axis. If coincident=False and shuffle=False, channels will be iterated through with the index along the last channel moving fastest.
  - device (str) -- Which device to host the timeseries arrays on.
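The constraints on combining these arguments can be summarized as a handful of checks. The sketch below is hypothetical (validate_args is not part of ml4gw) and, as a simplifying assumption, compares batch_size * batches_per_epoch against the number of windows only for coincident sampling, since how windows are counted for non-coincident sampling is not stated here. It also returns the number of batches in one pass, using ceiling division when batches_per_epoch is None because the last batch may be partial.

```python
def validate_args(num_samples, kernel_size, has_y=False, batch_size=32,
                  stride=1, batches_per_epoch=None, coincident=True):
    """Hypothetical checks mirroring the documented argument constraints.

    Returns the number of batches that one pass over the data would produce.
    """
    if not coincident and batches_per_epoch is None:
        raise ValueError("batches_per_epoch must be set for non-coincident sampling")
    if not coincident and has_y:
        raise ValueError("y cannot be aligned with non-coincidently sampled windows")

    # Number of windows available at this stride
    num_kernels = (num_samples - kernel_size) // stride + 1

    if batches_per_epoch is None:
        # Iterate until exhausted; ceiling division counts a partial last batch
        return -(-num_kernels // batch_size)

    # Assumption: the capacity check is only applied to coincident sampling
    if coincident and batch_size * batches_per_epoch > num_kernels:
        raise ValueError(
            f"{batch_size} * {batches_per_epoch} windows requested, "
            f"but only {num_kernels} are available"
        )
    return batches_per_epoch
```

For example, 1000 samples with kernel_size=100 give 901 windows at stride 1, so batch_size=32 with batches_per_epoch=30 (960 windows) would be rejected, while batches_per_epoch=10 would be accepted.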
- init_indices()
Initialize the arrays of indices used to slice through X and y at iteration time. Building these indices up front means any randomness (e.g. from shuffle=True) is generated once, before iteration begins, rather than per batch.
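A rough sketch of what precomputing these indices could look like is below. The function is hypothetical and uses NumPy rather than torch; in particular, drawing each channel's kernel index independently and with replacement in the shuffled, non-coincident case is an assumption, not something stated above. Indices are returned in units of samples (kernel index times stride), and unshuffled non-coincident combinations are enumerated in C order so the last channel's index moves fastest, matching the shuffle description.

```python
import numpy as np


def init_indices(num_kernels, num_channels, stride=1, batch_size=32,
                 batches_per_epoch=None, coincident=True, shuffle=True,
                 rng=None):
    """Hypothetical sketch: precompute window start indices, in samples.

    Coincident: shape (num_windows,), one start shared by all channels.
    Non-coincident: shape (num_windows, num_channels), one start per channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    if coincident:
        # One kernel index per window, shared by every channel
        idx = rng.permutation(num_kernels) if shuffle else np.arange(num_kernels)
    else:
        num_windows = batch_size * batches_per_epoch
        if shuffle:
            # Assumption: each channel draws its own kernel index independently
            idx = rng.integers(0, num_kernels, size=(num_windows, num_channels))
        else:
            # Enumerate channel combinations in C order: last channel fastest
            flat = np.arange(num_windows)
            shape = (num_kernels,) * num_channels
            idx = np.stack(np.unravel_index(flat, shape), axis=-1)
    if batches_per_epoch is not None:
        # Keep only as many windows as one epoch will consume
        idx = idx[: batch_size * batches_per_epoch]
    return idx * stride
```

Because all random draws happen in this one call, iterating afterwards reduces to slicing with rows of the returned array.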
- property num_kernels: int
The number of windows contained in the timeseries if we sample at the specified stride.
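Assuming, as the stride description suggests, that window starts are restricted to multiples of stride and that each window must fit entirely inside the timeseries, this count has the closed form floor((num_samples - kernel_size) / stride) + 1. The hypothetical helpers below compare that expression with a direct enumeration of the valid start indices.

```python
def num_kernels(num_samples, kernel_size, stride=1):
    """Closed-form count of windows whose start index is a multiple of stride."""
    # Valid starts are 0, stride, 2 * stride, ... up to num_samples - kernel_size
    return (num_samples - kernel_size) // stride + 1


def num_kernels_by_enumeration(num_samples, kernel_size, stride=1):
    """Count the same windows by listing every valid start index."""
    return len(range(0, num_samples - kernel_size + 1, stride))


# The two agree whenever the timeseries is at least one window long
for n in range(8, 40):
    for k in range(1, n + 1):
        for s in range(1, 6):
            assert num_kernels(n, k, s) == num_kernels_by_enumeration(n, k, s)
```

For instance, 2048 samples with kernel_size=1024 give 1025 windows at stride 1 but only 5 at stride 256, which is the roughly stride-fold reduction described for the stride parameter.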