datasets - provides quick methods to produce common datasets.

The module yann.special.datasets contains definitions for methods that can quickly produce some common datasets. Some of them include:

  • cook_mnist
  • cook_cifar10
  • ...
class yann.special.datasets.combine_split_datasets(loc, verbose=1, **kwargs)[source]

This will combine two split datasets into one.

Todo

Extend it for non-split datasets also.

Parameters:
  • loc – A tuple or list of the locations of the two datasets to be blended.
  • verbose – As always

Notes

At this moment, mini_batches_per_batch and mini_batch_size of both datasets must be the same. Only the training data is split by shot; the test and valid sets hold data from both. This is designed for incremental learning. New one-shot labels are created for the second dataset; labels are not assumed to be shared between the two datasets.
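Since labels are not assumed to be shared, the second dataset's labels must be shifted past the first's before the two are blended. A minimal numpy sketch of that relabeling (the function and variable names here are illustrative, not yann's API):

```python
import numpy as np

def combine_batches(x1, y1, x2, y2):
    """Concatenate one batch from each dataset, offsetting the second
    dataset's labels so they do not collide with the first's."""
    n_classes_1 = int(y1.max()) + 1          # classes used by dataset 1
    data_x = np.concatenate([x1, x2], axis=0)
    data_y = np.concatenate([y1, y2 + n_classes_1], axis=0)
    return data_x, data_y

# two toy 'datasets' with 3 and 2 classes respectively
x1, y1 = np.zeros((4, 8)), np.array([0, 1, 2, 0])
x2, y2 = np.ones((3, 8)),  np.array([0, 1, 0])
data_x, data_y = combine_batches(x1, y1, x2, y2)
print(data_y.tolist())   # [0, 1, 2, 0, 3, 4, 3] -- second set's labels become 3 and 4
```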

combine(verbose=1)[source]

This method runs the combine.

Parameters: verbose – As always
dataset_location()[source]

Use this function to return the location of the dataset.

load_data(n_batches_1, n_batches_2, type='train', batch=0, verbose=2)[source]

Will load the data from the file and return it. Supplies two batches, one from each set respectively.

Parameters:
  • type – train, test or valid. Default is train.
  • batch – Supply an integer
  • n_batches_1 – Number of batches in dataset 1
  • n_batches_2 – Number of batches in dataset 2
  • verbose – Similar to verbose in the toolbox.

Todo

Create and load dataset for type = ‘x’

Returns: data_x, data_y
Return type: numpy.ndarray
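One plausible reading of "two batches, one from each set" is that the requested batch index is taken modulo each set's batch count, so the smaller set wraps around. The sketch below illustrates that idea with plain numpy arrays; the wrap-around behavior and all names here are assumptions for illustration, not yann's documented implementation:

```python
import numpy as np

def load_two_batches(set1_x, set2_x, batch, n_batches_1, n_batches_2):
    """Pick batch `batch` from each set, wrapping around when one set
    has fewer batches than the other, and stack the two together."""
    b1 = set1_x[batch % n_batches_1]
    b2 = set2_x[batch % n_batches_2]
    return np.concatenate([b1, b2], axis=0)

set1_x = np.arange(12).reshape(3, 2, 2)   # 3 batches of 2 samples each
set2_x = np.arange(8).reshape(2, 2, 2)    # 2 batches of 2 samples each
out = load_two_batches(set1_x, set2_x, batch=2, n_batches_1=3, n_batches_2=2)
print(out.tolist())   # [[8, 9], [10, 11], [0, 1], [2, 3]]
```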
save_data(data_x, data_y, type='train', batch=0, verbose=2)[source]

Saves down a batch of data.

yann.special.datasets.cook_caltech101(verbose=1, **kwargs)[source]

Wrapper to cook the caltech101 dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_caltech256(verbose=1, **kwargs)[source]

Wrapper to cook the caltech256 dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_celeba_normalized_zero_mean(verbose=1, location='_data/celebA', **kwargs)[source]

Wrapper to cook Celeb-A dataset in preparation for GANs. Will take as input,

Parameters:
  • location – Location where celebA was downloaded using yann.special.datasets.download_celebA
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_cifar10(verbose=1, **kwargs)[source]

Wrapper to cook cifar10 dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_cifar10_normalized(verbose=1, **kwargs)[source]

Wrapper to cook cifar10 dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_cifar10_normalized_zero_mean(verbose=1, **kwargs)[source]

Wrapper to cook cifar10 dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.cook_mnist(verbose=1, **kwargs)[source]

Wrapper to cook mnist dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset

Notes

By default, this will create a dataset that is not mean-subtracted.
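The plain, normalized, and zero-mean variants of these cookers differ only in the preprocessing applied. The numpy sketch below illustrates the two common preprocessing steps; it is an illustration of the idea, not yann's internal code:

```python
import numpy as np

def normalize(x):
    """Scale raw pixel values from [0, 255] down to [0, 1]."""
    return x / 255.0

def normalize_zero_mean(x):
    """Scale to [0, 1], then subtract the mean so the data is zero-centered."""
    x = x / 255.0
    return x - x.mean()

img = np.array([0.0, 127.5, 255.0])
print(normalize(img).tolist())                    # [0.0, 0.5, 1.0]
print(round(normalize_zero_mean(img).mean(), 6))  # 0.0
```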

yann.special.datasets.cook_mnist_multi_load(verbose=1, **kwargs)[source]

Mainly testing code. Wrapper to cook the mnist dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset

Notes

This just creates a data_params that loads multiple batches without caching. This is used to test that caching works in the datastream module.

yann.special.datasets.cook_mnist_normalized(verbose=1, **kwargs)[source]

Wrapper to cook mnist dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset

Notes

By default, this will create a dataset that is not mean-subtracted.

yann.special.datasets.cook_mnist_normalized_zero_mean(verbose=1, **kwargs)[source]

Wrapper to cook mnist dataset. Will take as input,

Parameters:
  • save_directory – which directory to save the cooked dataset onto.
  • dataset_parms – default is the dictionary. Refer to setup_dataset
  • preprocess_params – default is the dictionary. Refer to setup_dataset
yann.special.datasets.download_celebA(data_dir='celebA')[source]

This method downloads celebA dataset into directory _data/data_dir.

Parameters:data_dir – Location to save the data.
class yann.special.datasets.mix_split_datasets(loc, verbose=1, **kwargs)[source]

Everything is the same, except that the labels are mixed.

class yann.special.datasets.split_all(dataset_init_args, save_directory='_datasets', verbose=0, **kwargs)[source]

Inherits from the setup dataset. The new methods added include the split.

class yann.special.datasets.split_continual(dataset_init_args, save_directory='_datasets', verbose=0, n_classes=10, **kwargs)[source]

Inherits from the setup dataset. This class will produce datasets set up for continual learning systems.
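A continual-learning split typically partitions the data by class label, so each increment introduces new classes. A minimal numpy sketch of such a split (the class groupings and names here are illustrative, not yann's API):

```python
import numpy as np

def split_by_classes(y, splits):
    """Return the sample indices for each split, where each split is a
    group of class labels (e.g. [[0..4], [5..9]] for two increments)."""
    return [np.where(np.isin(y, s))[0] for s in splits]

y = np.array([0, 5, 1, 6, 0, 7])
base, increment = split_by_classes(y, [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
print(base.tolist())        # [0, 2, 4] -- samples whose label is in the first group
print(increment.tolist())   # [1, 3, 5]
```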

class yann.special.datasets.split_only_train(dataset_init_args, save_directory='_datasets', verbose=0, **kwargs)[source]

Inherits from the split dataset. The new methods added include the split.