`datastream` - datastream class

The file yann.modules.datastream.py contains the definition for the datastream:

class yann.modules.datastream.datastream(dataset_init_args, borrow=True, verbose=1)

This module initializes the dataset for the network class and provides all dataset-related functionality. It also supports dynamically loading and caching dataset batches. add_layer will use this module to initialize the dataset.

Parameters:	
dataset_init_args – A dictionary of the form:

    dataset_init_args = {
        "dataset": <location>,
        "svm": False or True,
        "n_classes": <int>,
        "id": id of the datastream
    }

    svm – if True, a one-hot label set will also be set up.
    n_classes – if svm is True, the number of classes present must be known.

borrow – Theano's borrow. Default value is True.
verbose – Similar to verbose throughout the toolbox.
Returns:	A dataset module object that has the details of loader and other things.
Return type:	dataset
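As a minimal sketch, a dictionary with the keys described above might look like the following. The location string, the class count and the id value are placeholders chosen for illustration, not values taken from the toolbox.

```python
# Hypothetical values for illustration only; the keys follow the
# parameter description of the datastream class.
dataset_init_args = {
    "dataset": "path/to/dataset",  # placeholder location
    "svm": True,                   # also set up a one-hot label set
    "n_classes": 10,               # needed because "svm" is True
    "id": "data",                  # placeholder datastream id
}

# With yann installed, the dictionary would be passed to the constructor
# using the documented signature:
# from yann.modules.datastream import datastream
# stream = datastream(dataset_init_args=dataset_init_args, borrow=True, verbose=1)
```

The constructor call is left commented out so that the snippet runs without yann or an actual dataset on disk.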

Todo

Datastream should perhaps work with Fuel?
Perhaps support HDF5.

initialize_dataset(verbose=1)

Load the initial training batch of data into the data_x and data_y variables and create the shared memory variables.

Todo

I am assuming that the training set has the largest amount of data. This is immaterial when caching, but I need to be careful in the set_data routine.

Parameters:	verbose – Toolbox style verbose.

load_data(type='train', batch=0, verbose=2)

Loads the data from the file and returns it. The important thing to note is that all datasets in yann require a y, that is, a variable to predict. In the case of an auto-encoder, for instance, the thing to predict is the image itself. Set up the dataset accordingly.

Parameters:	
type – `train`, `test` or `valid`. Default is `train`.
batch – Supply an integer.
verbose – Similar to verbose in the toolbox.

Todo

Create and load dataset for type = ‘x’

Returns:	`data_x, data_y`
Return type:	numpy.ndarray

one_hot_labels(y, verbose=1)

Function takes in labels and returns a one-hot encoding. Used for max-margin loss.

Parameters:	
y – Labels to be encoded.
verbose – Typical as in the rest of the toolbox.

Notes

self.n_classes: Number of unique classes in the labels.

This could be found using the following:

    import numpy
    n_classes = len(numpy.unique(y))

This is potentially dangerous with a cached dataset, since a single batch may not contain every class. Although this is the default if n_classes is not provided as input to this module, I discourage anyone from relying on it.

Returns:	one-hot encoded label list.
Return type:	numpy ndarray
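The caution about cached datasets can be illustrated with a small NumPy sketch. The helper `one_hot` below is hypothetical and is not yann's implementation; it uses a plain 0/1 encoding, which may differ from the encoding the toolbox uses for its max-margin loss. It only shows why inferring n_classes from a single batch can give an encoding of the wrong width.

```python
import numpy


def one_hot(y, n_classes):
    """Return an (n_samples, n_classes) array with a 1 in each label's column."""
    y = numpy.asarray(y, dtype=int)
    encoded = numpy.zeros((y.shape[0], n_classes), dtype="float32")
    encoded[numpy.arange(y.shape[0]), y] = 1.0
    return encoded


# A cached batch from a 4-class problem that happens to contain
# only classes 0 and 1.
cached_batch = numpy.array([0, 1, 1, 0])

# Supplying n_classes explicitly gives the full width.
explicit = one_hot(cached_batch, n_classes=4)

# Inferring n_classes from the batch silently drops the absent classes.
inferred = one_hot(cached_batch, n_classes=len(numpy.unique(cached_batch)))

print(explicit.shape)  # (4, 4)
print(inferred.shape)  # (4, 2)
```

Passing n_classes explicitly keeps the encoding width the same across batches, which is why the notes above discourage relying on the inferred default.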

set_data(type='train', batch=0, verbose=2)

This works only after the network is cooked.

Parameters:	
batch – Which batch of data to load and set.
verbose – As in the rest of the toolbox.
