optimizer - optimizer class

The file yann/modules/optimizer.py contains the definition for the optimizer:

class yann.modules.optimizer.optimizer(optimizer_init_args, verbose=1)[source]

Optimizer is an important module of the toolbox: it creates the protocols required for learning. yann's optimizer supports the following optimization techniques:

  • Stochastic Gradient Descent
  • AdaGrad [1]
  • RmsProp [2]
  • Adam [3]
  • Adadelta [4]
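As a rough plain-Python illustration of how a few of these techniques differ (this is not yann's Theano implementation; function names and defaults here are illustrative), each technique applies the same gradient but scales the step differently:

```python
import math

def sgd_step(w, grad, lr=0.01):
    # Vanilla stochastic gradient descent: step against the gradient.
    return w - lr * grad

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    # AdaGrad: accumulate squared gradients; the effective per-parameter
    # learning rate shrinks as the accumulator grows.
    cache += grad ** 2
    return w - lr * grad / (math.sqrt(cache) + eps), cache

def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-8):
    # RMSProp: exponential moving average of squared gradients instead of
    # an ever-growing sum, so the rate can recover over time.
    cache = rho * cache + (1 - rho) * grad ** 2
    return w - lr * grad / (math.sqrt(cache) + eps), cache
```

Adam extends RMSProp with a moving average of the gradients themselves plus bias correction; the pattern is the same, with per-parameter state carried between steps.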

Optimizer also supports the following momentum techniques:

  • Polyak [5]
  • Nesterov [6]

[1] John Duchi, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." JMLR (2011).
[2] Yann N. Dauphin, Harm de Vries, Junyoung Chung, and Yoshua Bengio. "RMSProp and equilibrated adaptive learning rates for non-convex optimization." arXiv preprint arXiv:1502.04390v1.
[3] Kingma, Diederik, and Jimmy Ba. "Adam: a method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
[4] Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
[5] Polyak, Boris Teodorovich. "Some methods of speeding up the convergence of iteration methods." USSR Computational Mathematics and Mathematical Physics 4.5 (1964): 1-17. Implementation adapted from Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
[6] Nesterov, Yurii. "A method of solving a convex programming problem with convergence rate O(1/k^2)." Soviet Mathematics Doklady 27.2 (1983). Adapted from Sebastien Bubeck's blog.
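The difference between the two momentum techniques can be sketched in plain Python (again an illustration, not yann's Theano code): Polyak momentum evaluates the gradient at the current point, while Nesterov evaluates it at the look-ahead point reached by first applying the accumulated velocity.

```python
def polyak_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # Polyak (classical) momentum: velocity accumulates past gradients,
    # and the gradient is taken at the current parameter value.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def nesterov_step(w, grad_fn, velocity, lr=0.01, momentum=0.9):
    # Nesterov momentum: the gradient is taken at the look-ahead point.
    lookahead = w + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return w + velocity, velocity
```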
Parameters:

  • verbose – Similar to any 3-level verbose in the toolbox.
  • optimizer_init_args –

    optimizer_init_args is a dictionary like:

    optimizer_params =  {
        "momentum_type"   : <option> 'false' <no momentum>, 'polyak', 'nesterov'.
                            Default value is 'false'
        "momentum_params" : (<value in range [0,1]>, <value in range [0,1]>, <int>)
                            (momentum coefficient at start, at end,
                            at what epoch to end momentum increase)
                            Default is the tuple (0.5, 0.95, 50)
        "optimizer_type"  : <option> 'sgd', 'adagrad', 'rmsprop', 'adam'.
                            Default is 'sgd'
        "id"              : id of the optimizer
        }

Returns:

    Optimizer object

Return type:

    optimizer
calculate_gradients(params, objective, verbose=1)[source]

This method initializes the gradients.

Parameters:

  • params – Supply the learnable active parameters of a network.
  • objective – Supply a theano graph connecting the params to a loss.
  • verbose – Just as always.

Once this is set up, optimizer.gradients is available.
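yann builds these gradients symbolically (in Theano, one gradient of the loss per parameter); as a rough non-symbolic stand-in for that contract, a finite-difference sketch shows the shape of the result: one gradient per parameter, in the order the params were supplied. The function name mirrors the method but is purely illustrative.

```python
def calculate_gradients(params, objective, eps=1e-6):
    # Numeric stand-in for a symbolic gradient call: perturb each
    # parameter in turn and measure the change in the objective.
    grads = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        grads.append((objective(shifted) - objective(params)) / eps)
    return grads
```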


This method creates all the updates and update functions which trainers can iterate upon.

Parameters:

  • verbose – Just as always.
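Conceptually, the updates built here are (parameter, new_value) pairs that a trainer applies on every iteration; in yann these are Theano update tuples fed to a compiled function. A hedged plain-Python sketch of that structure (names are illustrative):

```python
def create_updates(params, gradients, lr=0.01):
    # Pair each parameter with its post-step value; a trainer would
    # apply each pair once per iteration.
    return [(p, p - lr * g) for p, g in zip(params, gradients)]
```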