optimizer - optimizer class

The file yann.modules.optimizer.py contains the definition for the optimizer:
class yann.modules.optimizer.optimizer(optimizer_init_args, verbose=1)

Optimizer is an important module of the toolbox. Optimizer creates the protocols required for learning.
yann's optimizer supports the following optimization techniques, one of which is sketched after the list:

- Stochastic Gradient Descent
- AdaGrad [1]
- RmsProp [2]
- Adam [3]
- Adadelta [4]
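
To give a concrete flavour of one of these techniques, here is a minimal NumPy sketch of a single AdaGrad update [1]. This is an illustration only, not the toolbox's Theano implementation; all names (adagrad_step, cache, lr, eps) are hypothetical:

    import numpy as np

    def adagrad_step(param, grad, cache, lr=0.01, eps=1e-8):
        # Accumulate the squared gradient, then scale the step per-parameter,
        # so frequently-updated weights receive smaller effective learning rates.
        cache = cache + grad ** 2
        param = param - lr * grad / (np.sqrt(cache) + eps)
        return param, cache
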
Optimizer also supports the following momentum techniques, sketched after the list:
- Polyak [5]
- Nesterov [6]
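
The difference between the two momentum flavours is easiest to see in code. Below is a minimal sketch in plain Python; the function and variable names (polyak_step, nesterov_step, velocity, mom) are illustrative and not part of the toolbox:

    def polyak_step(param, grad, velocity, lr=0.01, mom=0.9):
        # Polyak (classical) momentum [5]: move along an exponentially
        # decaying average of past gradients.
        velocity = mom * velocity - lr * grad
        return param + velocity, velocity

    def nesterov_step(param, grad_fn, velocity, lr=0.01, mom=0.9):
        # Nesterov momentum [6]: evaluate the gradient at the look-ahead
        # point (param + mom * velocity) before taking the step.
        velocity = mom * velocity - lr * grad_fn(param + mom * velocity)
        return param + velocity, velocity

Evaluating the gradient at the look-ahead point is what gives Nesterov momentum its improved convergence rate on smooth convex problems [6].
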
[1] John Duchi, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." JMLR (2011).
[2] Yann N. Dauphin, Harm de Vries, Junyoung Chung, and Yoshua Bengio. "RMSProp and equilibrated adaptive learning rates for non-convex optimization." arXiv preprint arXiv:1502.04390 (2015).
[3] Kingma, Diederik, and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
[4] Zeiler, Matthew D. "ADADELTA: An adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
[5] Polyak, Boris Teodorovich. "Some methods of speeding up the convergence of iteration methods." USSR Computational Mathematics and Mathematical Physics 4.5 (1964): 1-17. Implementation adapted from Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
[6] Nesterov, Yurii. "A method of solving a convex programming problem with convergence rate O(1/k^2)." Soviet Mathematics Doklady 27.2 (1983). Adapted from Sebastien Bubeck's blog.

Parameters:

- verbose – Similar to any 3-level verbose in the toolbox.
- optimizer_init_args – optimizer_init_args is a dictionary like:

    optimizer_params = {
        "momentum_type"   : <option>  'false' <no momentum>, 'polyak', 'nesterov'.
                            Default value is 'false'.
        "momentum_params" : (<value in range [0,1]>, <value in range [0,1]>, <int>)
                            (momentum coefficient at the start, at the end, and the
                            epoch at which the momentum increase ends).
                            Default is the tuple (0.5, 0.95, 50).
        "optimizer_type"  : <option>  'sgd', 'adagrad', 'rmsprop', 'adam'.
                            Default is 'sgd'.
        "id"              : id of the optimizer
    }

  A usage sketch based on this dictionary follows the field list below.
Returns: Optimizer object
Return type: yann.modules.optimizer.optimizer
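
Based on the constructor signature documented above, a minimal usage sketch might look like the following; the particular option values chosen here are hypothetical:

    from yann.modules.optimizer import optimizer

    # Hypothetical values for each key documented above.
    optimizer_params = {
        "momentum_type"   : 'nesterov',
        "momentum_params" : (0.5, 0.95, 50),  # coefficient at start, at end, end epoch
        "optimizer_type"  : 'rmsprop',
        "id"              : 'main'
    }

    opt = optimizer(optimizer_init_args=optimizer_params, verbose=1)
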
calculate_gradients(params, objective, verbose=1)

This method initializes the gradients of the objective with respect to the supplied parameters.
Parameters:

- params – Supply the learnable active parameters of a network.
- objective – Supply a theano graph connecting the params to a loss.
- verbose – Similar to any 3-level verbose in the toolbox.
Notes

Once this is set up, optimizer.gradients are available.
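
For instance, assuming an optimizer object opt built as in the earlier sketch, a toy Theano graph can be wired up and its gradients initialized like this (the variables w, x, and loss are illustrative only):

    import numpy as np
    import theano
    import theano.tensor as T

    # A toy shared parameter and a theano graph connecting it to a loss.
    w = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='w')
    x = T.vector('x')
    loss = T.sum((w - x) ** 2)

    # Signature as documented above.
    opt.calculate_gradients(params=[w], objective=loss, verbose=1)

    # Per the Notes above, the gradient expressions are now available.
    print(opt.gradients)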