datasets - provides quick methods to produce common datasets.

The file yann.special.datasets.py contains the definitions of methods that can quickly produce some common datasets. Some of them include:

- cook_mnist
- cook_cifar10
- ...
class yann.special.datasets.combine_split_datasets(loc, verbose=1, **kwargs)

This will combine two split datasets into one.

Todo: Extend it for non-split datasets also.

Parameters:
- loc – A tuple of the locations of the two datasets to be blended.
- verbose – As always.

Notes:
At this moment, mini_batches_per_batch and mini_batch_size of both datasets must be the same. Only the train data is split with shot; the test and valid sets hold both. This is designed for incremental learning. New one-shot labels are created for the second dataset. This does not assume that labels are shared between the two datasets.
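As a hedged sketch of how the loc argument might be assembled (the directory names below are placeholders for wherever the two split datasets were cooked; the library call itself is commented out since it assumes yann is installed):

```python
# Placeholder locations for two previously cooked split datasets; in practice
# these are the directories produced when each dataset was cooked.
loc = ("_datasets/_dataset_11111", "_datasets/_dataset_22222")

# Hypothetical call, assuming yann is installed:
# from yann.special.datasets import combine_split_datasets
# data = combine_split_datasets(loc=loc, verbose=1)
```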
load_data(n_batches_1, n_batches_2, type='train', batch=0, verbose=2)

Loads the data from file and returns it, supplying two batches, one from each dataset respectively.

Parameters:
- type – 'train', 'test' or 'valid'. Default is 'train'.
- batch – Supply an integer.
- n_batches_1 – Number of batches in dataset 1.
- n_batches_2 – Number of batches in dataset 2.
- verbose – Similar to verbose in the toolbox.

Todo: Create and load dataset for type = 'x'.

Returns: data_x, data_y
Return type: numpy.ndarray
yann.special.datasets.cook_caltech101(verbose=1, **kwargs)

Wrapper to cook the caltech101 dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.cook_caltech256(verbose=1, **kwargs)

Wrapper to cook the caltech256 dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.cook_celeba_normalized_zero_mean(verbose=1, location='_data/celebA', **kwargs)

Wrapper to cook the Celeb-A dataset in preparation for GANs. Will take as input:

Parameters:
- location – Location where celebA was downloaded using yann.special.datasets.download_celebA.
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
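The celebA workflow (download, then cook) can be sketched as follows; this assumes yann is installed, so the library calls are shown as comments and only the default location value is live code:

```python
# Default download/cook location, as documented above:
celeba_location = "_data/celebA"

# Hypothetical two-step flow, assuming yann is installed:
# from yann.special.datasets import download_celebA, cook_celeba_normalized_zero_mean
# download_celebA(data_dir='celebA')  # saves under _data/celebA
# data = cook_celeba_normalized_zero_mean(verbose=1, location=celeba_location)
```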
yann.special.datasets.cook_cifar10(verbose=1, **kwargs)

Wrapper to cook the cifar10 dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.cook_cifar10_normalized(verbose=1, **kwargs)

Wrapper to cook the cifar10 dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.cook_cifar10_normalized_zero_mean(verbose=1, **kwargs)

Wrapper to cook the cifar10 dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.cook_mnist(verbose=1, **kwargs)

Wrapper to cook the mnist dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.

Notes:
By default, this will create a dataset that is not mean-subtracted.
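A minimal usage sketch for cook_mnist, assuming yann is installed; the id and n_classes values are illustrative, and the dataset location string is a placeholder for wherever the cooked dataset lands:

```python
# Hypothetical usage, assuming yann is installed:
# from yann.special.datasets import cook_mnist
# data = cook_mnist(verbose=1, save_directory='_datasets')

# The cooked dataset is then typically referenced through a parameter
# dictionary of this shape (values here are illustrative placeholders):
dataset_params = {
    "dataset": "_datasets/_dataset_12345",  # placeholder for the cooked location
    "id": "mnist",
    "n_classes": 10,
}
```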
yann.special.datasets.cook_mnist_multi_load(verbose=1, **kwargs)

Testing code, mainly. Wrapper to cook the mnist dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.

Notes:
This just creates a data_params that loads multiple batches without cache. I use this to test that caching works in the datastream module.
yann.special.datasets.cook_mnist_normalized(verbose=1, **kwargs)

Wrapper to cook the mnist dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.

Notes:
By default, this will create a dataset that is not mean-subtracted.
yann.special.datasets.cook_mnist_normalized_zero_mean(verbose=1, **kwargs)

Wrapper to cook the mnist dataset. Will take as input:

Parameters:
- save_directory – which directory to save the cooked dataset to.
- dataset_parms – default is the dictionary; refer to setup_dataset.
- preprocess_params – default is the dictionary; refer to setup_dataset.
yann.special.datasets.download_celebA(data_dir='celebA')

This method downloads the celebA dataset into the directory _data/<data_dir>.

Parameters: data_dir – Location to save the data.
class yann.special.datasets.mix_split_datasets(loc, verbose=1, **kwargs)

Everything is the same as combine_split_datasets, except that labels are mixed.
class yann.special.datasets.split_all(dataset_init_args, save_directory='_datasets', verbose=0, **kwargs)

Inherits from setup_dataset. The new methods added will include the split.
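The split itself is controlled through **kwargs. Below is a hedged sketch of a split specification, using the shot/base terminology from combine_split_datasets above; the key names and the splits keyword are assumptions for illustration, not a verified signature:

```python
# Hypothetical split specification; key names are assumptions based on the
# shot/base terminology used elsewhere in this module:
split_args = {
    "base": [0, 1, 2, 3, 4, 5],  # labels kept in the base split
    "shot": [6, 7, 8, 9],        # labels held out for the shot split
    "p": 0,                      # portion of shot data mixed back into base
}

# Hypothetical call, assuming yann is installed and dataset_init_args is defined:
# from yann.special.datasets import split_all
# data = split_all(dataset_init_args=dataset_init_args,
#                  save_directory='_datasets', splits=split_args, verbose=0)
```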