ml4gw.dataloading.in_memory_dataset
Classes
InMemoryDataset: Dataset for iterating through in-memory multi-channel timeseries
- class ml4gw.dataloading.in_memory_dataset.InMemoryDataset(X, kernel_size, y=None, batch_size=32, stride=1, batches_per_epoch=None, coincident=True, shuffle=True, device='cpu')
Bases: IterableDataset
Dataset for iterating through in-memory multi-channel timeseries
Dataset for arrays of timeseries data which can be stored in-memory all at once. Iterates through the data by sampling fixed-length windows from all channels. The precise mechanism for this iteration is determined by combinations of the keyword arguments. See their descriptions for details.
- Parameters:
X (Float[Tensor, 'channels time']) -- Timeseries data to be iterated through. Should have shape (num_channels, length * sample_rate). Windows will be sampled from the time (1st) dimension for all channels along the channel (0th) dimension.
kernel_size (int) -- The length of the windows to sample from X, in units of samples.
y (Float[Tensor, 'time'] | None) -- Target timeseries to be iterated through. If specified, should be a single channel and have shape (length * sample_rate,). If left as None, only windows sampled from X will be returned during iteration. Otherwise, windows sampled from both arrays will be returned. Note that if sampling is performed non-coincidentally, there's no sensible way to align windows sampled from this array with the windows sampled from X, so this combination of arguments is not permitted.
batch_size (int) -- Maximum number of windows to return at each iteration. Will be the length of the 0th dimension of the returned array(s). If batches_per_epoch is specified, this will be the length of every array returned during iteration. Otherwise, the last array may be shorter, because the number of windows in the timeseries need not be an integer multiple of batch_size.
stride (int) -- The resolution at which windows will be sampled from the specified timeseries, in units of samples. E.g. if stride=2, the first sample of each window can only be from an index of X which is a multiple of 2. This reduces the number of windows which can be iterated through by roughly a factor of stride.
batches_per_epoch (int | None) -- Number of batches of windows to produce during iteration before raising a StopIteration. Must be specified if performing non-coincident sampling. Otherwise, if left as None, windows will be sampled until the entire timeseries has been exhausted. Note that batch_size * batches_per_epoch must be small enough to be fulfilled by the number of windows in the timeseries, otherwise a ValueError will be raised.
coincident (bool) -- Whether to sample windows from the channels of X using the same indices or independently. Can't be False if batches_per_epoch is None or y is not None.
shuffle (bool) -- Whether to sample windows from timeseries randomly or in order along the time axis. If coincident=False and shuffle=False, channels will be iterated through with the index along the last channel moving fastest.
device (str) -- Which device to host the timeseries arrays on.
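The rules that tie these arguments together can be summarised in a small validation sketch. It is plain Python and not ml4gw's code: `check_args` and its `num_windows` and `has_y` arguments are hypothetical names introduced here. `num_windows` stands for however many windows are available to sample from. The function returns the batch lengths that the parameter descriptions above imply.

```python
def check_args(num_windows, batch_size, batches_per_epoch=None,
               coincident=True, has_y=False):
    """Hypothetical helper mirroring the documented argument rules.

    Returns the list of batch lengths one epoch would produce.
    """
    if not coincident:
        # Non-coincident sampling has no natural end, so the epoch
        # length must be given explicitly.
        if batches_per_epoch is None:
            raise ValueError(
                "batches_per_epoch is required for non-coincident sampling"
            )
        # Windows from y cannot be aligned with independently
        # sampled channels of X.
        if has_y:
            raise ValueError(
                "y cannot be combined with non-coincident sampling"
            )

    if batches_per_epoch is not None:
        # Every batch is full, so enough windows must exist to fill them.
        if batch_size * batches_per_epoch > num_windows:
            raise ValueError(
                "batch_size * batches_per_epoch exceeds the available windows"
            )
        return [batch_size] * batches_per_epoch

    # Otherwise iterate until the windows run out; the final batch
    # may be shorter than batch_size.
    full, remainder = divmod(num_windows, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(check_args(100, 32))
    print(check_args(100, 32, batches_per_epoch=3))
```

With 100 windows and batch_size=32, leaving batches_per_epoch as None gives three full batches and a final batch of 4, while batches_per_epoch=3 gives exactly three full batches.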
- init_indices()
Initialize the arrays of indices used to slice through X and y at iteration time. Building them upfront means any randomness is applied once, before iteration begins.
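The idea of fixing the sampling order before iteration starts can be illustrated as below. This is a plain-Python sketch of the coincident case only, not the library's implementation. `build_start_indices` and `iter_batches` are hypothetical names, and the `seed` argument is added here purely to make the example reproducible.

```python
import random


def build_start_indices(num_windows, stride=1, shuffle=True, seed=None):
    """Precompute the first-sample index of every window (hypothetical helper)."""
    # Window i starts at sample i * stride.
    starts = [i * stride for i in range(num_windows)]
    if shuffle:
        # All randomness is applied here, once, before iteration.
        random.Random(seed).shuffle(starts)
    return starts


def iter_batches(starts, batch_size):
    """Yield consecutive groups of precomputed start indices."""
    for i in range(0, len(starts), batch_size):
        yield starts[i:i + batch_size]


if __name__ == "__main__":
    ordered = build_start_indices(5, stride=2, shuffle=False)
    for batch in iter_batches(ordered, batch_size=2):
        print(batch)
```

Because the order is decided upfront, the iteration loop itself only slices, which keeps it deterministic once the indices exist.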
- property num_kernels: int
The number of windows contained in the timeseries if we sample at the specified stride.
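One way to compute such a count is sketched below in plain Python. It assumes a window is counted only if it fits entirely within the timeseries; the library's own handling of edge cases may differ. `count_windows` is a hypothetical name, not part of ml4gw.

```python
def count_windows(length, kernel_size, stride=1):
    """Count strided windows that fit fully inside a series of `length` samples."""
    if kernel_size > length:
        raise ValueError("kernel_size cannot exceed the timeseries length")
    # Valid start indices are 0, stride, 2 * stride, ... up to
    # the last index that leaves room for a full window.
    return (length - kernel_size) // stride + 1


if __name__ == "__main__":
    for stride in (1, 2, 3):
        print(stride, count_windows(10, 4, stride))
```

For a 10-sample series and kernel_size=4, the valid start indices at stride=2 are 0, 2, 4 and 6, so the count is 4.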