datastream - datastream class
The file ``yann.modules.datastream.py`` contains the definition for the datastream:
class yann.modules.datastream.datastream(dataset_init_args, borrow=True, verbose=1)

This module initializes the dataset to the network class and provides all dataset-related functionality. It also provides for dynamically loading and caching dataset batches. ``add_layer`` will use this to initialize.

Parameters:

- dataset_init_args – A dictionary of the form::

      dataset_init_args = {
          "dataset": <location>,
          "svm": False or True,
          "n_classes": <int>,
          "id": id of the datastream
      }

  ``svm``: if ``True``, a one-hot label set will also be set up.
  ``n_classes``: if ``svm`` is ``True``, we need to know how many classes are present.

- borrow – Theano’s ``borrow``. Default value is ``True``.
- verbose – Similar to verbose throughout the toolbox.

Returns: A dataset module object that has the details of the loader and other things.

Todo

- Datastream should perhaps work with Fuel.
- Perhaps support HDF5.
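As a minimal sketch of the dictionary described above, the following builds a ``dataset_init_args`` and checks it for the two conditions the text mentions. The dataset location, the ``id`` value, and the ``find_problems`` helper are hypothetical and are not part of yann. The constructor call is left as a comment, so the block runs without yann installed.

```python
# Hypothetical values: the location and id below are placeholders.
dataset_init_args = {
    "dataset": "/path/to/dataset",  # hypothetical location
    "svm": True,                    # also set up one-hot labels
    "n_classes": 10,                # needed because svm is True
    "id": "data",                   # hypothetical id of the datastream
}


def find_problems(args):
    """Hypothetical helper (not part of yann): list likely mistakes."""
    problems = []
    if "dataset" not in args:
        problems.append("missing 'dataset' location")
    if args.get("svm", False) and "n_classes" not in args:
        problems.append("'svm' is True but 'n_classes' is not given")
    return problems


problems = find_problems(dataset_init_args)

# With yann installed, the documented signature would be used as:
# from yann.modules.datastream import datastream
# stream = datastream(dataset_init_args, borrow=True, verbose=1)
```

The helper only reports omissions. It does not raise, because the module may fall back to its own defaults when a key is missing.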
initialize_dataset(verbose=1)

Load the initial training batch of data onto the ``data_x`` and ``data_y`` variables and create shared memories.

Todo

I am assuming that training has the largest number of data. This is immaterial when caching, but during the ``set_data`` routine, I need to be careful.

Parameters: verbose – Toolbox-style verbose.
load_data(type='train', batch=0, verbose=2)

Loads the data from the file and returns it. The important thing to note is that all the datasets in ``yann`` require a ``y``, or a variable to predict. In the case of an auto-encoder, for instance, the thing to predict is the image itself. Set up the dataset accordingly.

Parameters:

- type – ``train``, ``test`` or ``valid``. Default is ``train``.
- batch – Supply an integer.
- verbose – Similar to verbose in the toolbox.

Todo

Create and load dataset for type = ‘x’

Returns: data_x, data_y

Return type: numpy.ndarray
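The auto-encoder case mentioned above can be illustrated with a short sketch. It does not use yann's loader. ``make_autoencoder_targets`` is a hypothetical helper, and the batch shape is an arbitrary assumption chosen only for illustration.

```python
import numpy


def make_autoencoder_targets(data_x):
    """Hypothetical helper: the target to predict is the input itself."""
    # Copy so that later in-place changes to data_x do not alter the targets.
    return data_x, data_x.copy()


# Assumed batch: 4 flattened images of 784 values each.
batch = numpy.random.RandomState(0).rand(4, 784).astype("float32")
data_x, data_y = make_autoencoder_targets(batch)
```

Here ``data_y`` has the same shape and values as ``data_x``, which is what a dataset prepared for an auto-encoder would need to store as its ``y``.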
one_hot_labels(y, verbose=1)

Takes in labels and returns a one-hot encoding. Used for max-margin loss.

Parameters:

- y – Labels to be encoded.
- verbose – Typical as in the rest of the toolbox.

Notes

``self.n_classes``: Number of unique classes in the labels. This could be found using the following:

.. code-block:: python

    import numpy
    n_classes = len(numpy.unique(y))

This is potentially dangerous with a cached dataset. Although this is the default if ``n_classes`` is not provided as input to this module, I discourage anyone from using this.

Returns: One-hot encoded label list.

Return type: numpy ndarray
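To make the warning about cached datasets concrete, here is a minimal sketch of a one-hot encoder. It is not yann's implementation: the ``one_hot`` function is hypothetical, and it assumes integer labels starting at 0 and a plain 0/1 encoding. It shows that counting unique labels in a single batch can undercount the classes when a class is absent from that batch.

```python
import numpy


def one_hot(y, n_classes):
    """Hypothetical encoder: one row per label, a 1 in the label's column."""
    y = numpy.asarray(y, dtype="int64")
    encoded = numpy.zeros((y.shape[0], n_classes), dtype="float32")
    encoded[numpy.arange(y.shape[0]), y] = 1.0
    return encoded


# Assumed cached batch from a 3-class dataset; class 1 is absent here.
cached_batch = numpy.array([0, 2, 2])

# Inferring the class count from this batch alone undercounts.
inferred = len(numpy.unique(cached_batch))

# Supplying the known class count keeps the encoding width consistent.
encoded = one_hot(cached_batch, n_classes=3)
```

With the inferred count, the encoding would be too narrow for label ``2``, so passing ``n_classes`` explicitly is the safer choice whenever batches are cached.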