
Rprop

class torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50), foreach=None, maximize=False) [source]

Implements the resilient backpropagation algorithm.

\begin{aligned}
&\rule{110mm}{0.4pt} \\
&\textbf{input} : \theta_0 \in \mathbf{R}^d \text{ (params)}, f(\theta) \text{ (objective)}, \\
&\hspace{13mm} \eta_{+/-} \text{ (etaplus, etaminus)}, \Gamma_{max/min} \text{ (step sizes)} \\
&\textbf{initialize} : g^0_{prev} \leftarrow 0, \: \eta_0 \leftarrow \text{lr (learning rate)} \\
&\rule{110mm}{0.4pt} \\
&\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do} \\
&\hspace{5mm} g_t \leftarrow \nabla_{\theta} f_t(\theta_{t-1}) \\
&\hspace{5mm} \textbf{for} \: i = 0, 1, \ldots, d-1 \: \textbf{do} \\
&\hspace{10mm} \textbf{if} \: g^i_{prev} g^i_t > 0 \\
&\hspace{15mm} \eta^i_t \leftarrow \min(\eta^i_{t-1} \eta_{+}, \Gamma_{max}) \\
&\hspace{10mm} \textbf{else if} \: g^i_{prev} g^i_t < 0 \\
&\hspace{15mm} \eta^i_t \leftarrow \max(\eta^i_{t-1} \eta_{-}, \Gamma_{min}) \\
&\hspace{15mm} g^i_t \leftarrow 0 \\
&\hspace{10mm} \textbf{else} \\
&\hspace{15mm} \eta^i_t \leftarrow \eta^i_{t-1} \\
&\hspace{5mm} \theta_t \leftarrow \theta_{t-1} - \eta_t \, \mathrm{sign}(g_t) \\
&\hspace{5mm} g_{prev} \leftarrow g_t \\
&\rule{110mm}{0.4pt} \\
&\textbf{return} \: \theta_t \\
&\rule{110mm}{0.4pt}
\end{aligned}

For further details regarding the algorithm we refer to the paper A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm.
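The update is purely sign-based: each coordinate keeps its own step size, which grows while the gradient sign stays stable and shrinks when it flips. As a rough illustration (a minimal sketch only, not the torch.optim implementation; the function name and tensor arguments are placeholders), the element-wise rule could be written as:

    import torch

    def rprop_update(param, grad, prev_grad, step_size,
                     etaminus=0.5, etaplus=1.2,
                     step_size_min=1e-6, step_size_max=50.0):
        # Illustrative element-wise Rprop step, not the torch.optim implementation.
        sign = (grad * prev_grad).sign()  # +1 same sign, -1 sign flipped, 0 if either is zero
        # Grow the step size where the sign was stable, shrink it where it flipped.
        factor = torch.where(sign > 0, torch.full_like(grad, etaplus),
                 torch.where(sign < 0, torch.full_like(grad, etaminus),
                             torch.ones_like(grad)))
        step_size = (step_size * factor).clamp(step_size_min, step_size_max)
        # Where the sign flipped, zero the gradient so the next comparison restarts.
        grad = torch.where(sign < 0, torch.zeros_like(grad), grad)
        # Each coordinate moves by its own step size, using only the gradient's sign.
        param = param - step_size * grad.sign()
        return param, grad, step_size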

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float, optional) – learning rate (default: 1e-2)
  • etas (Tuple[float, float], optional) – pair of (etaminus, etaplus), the multiplicative decrease and increase factors applied to the step size (default: (0.5, 1.2))
  • step_sizes (Tuple[float, float], optional) – a pair of minimal and maximal allowed step sizes (default: (1e-6, 50))
  • foreach (bool, optional) – whether the foreach implementation of the optimizer is used (default: None)
  • maximize (bool, optional) – maximize the params based on the objective, instead of minimizing (default: False)
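A minimal usage sketch, assuming a small regression model and random placeholder data (none of which are part of the documented API):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    criterion = nn.MSELoss()
    optimizer = torch.optim.Rprop(model.parameters(), lr=0.01,
                                  etas=(0.5, 1.2), step_sizes=(1e-6, 50))

    inputs = torch.randn(64, 10)   # placeholder batch
    targets = torch.randn(64, 1)

    for epoch in range(5):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

Because Rprop compares gradient signs across consecutive steps, it is generally better suited to full-batch training than to noisy mini-batch gradients.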
add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Parameters:

param_group (dict) – Specifies what Tensors should be optimized along with group specific optimization options.
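For instance, a previously frozen layer can be unfrozen and registered with its own group-specific options (the `new_layer` name and the chosen values below are illustrative):

    # Assumes `optimizer` is an existing torch.optim.Rprop instance and
    # `new_layer` is a module whose parameters were previously frozen.
    for p in new_layer.parameters():
        p.requires_grad_(True)

    optimizer.add_param_group({
        "params": new_layer.parameters(),
        "lr": 0.005,                # group-specific initial step size
        "etas": (0.5, 1.2),
        "step_sizes": (1e-6, 50),
    })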

load_state_dict(state_dict)

Loads the optimizer state.

Parameters:

state_dict (dict) – optimizer state. Should be an object returned from a call to state_dict().

state_dict()

Returns the state of the optimizer as a dict.

It contains two entries:

  • state - a dict holding current optimization state. Its content differs between optimizer classes.
  • param_groups - a list containing all parameter groups, where each parameter group is a dict.
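A typical checkpointing round trip, saving and restoring model and optimizer state together (the file name is a placeholder):

    import torch

    # Save model and optimizer state together.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, "checkpoint.pt")

    # ...later, restore both before resuming training.
    checkpoint = torch.load("checkpoint.pt")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])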

step(closure=None) [source]

Performs a single optimization step.

Parameters:

closure (Callable, optional) – A closure that reevaluates the model and returns the loss.
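When a closure is supplied, it should clear the gradients, recompute the loss, call backward(), and return the loss. A sketch, with `model`, `criterion`, `inputs`, and `targets` standing in for user code:

    def closure():
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        return loss

    optimizer.step(closure)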

zero_grad(set_to_none=False)

Sets the gradients of all optimized torch.Tensors to zero.

Parameters:

set_to_none (bool) – instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

  1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
  2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
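A short illustration of the difference, assuming a parameter `p` that has already received a gradient from a backward pass:

    optimizer.zero_grad()                  # grads become tensors filled with zeros
    print(p.grad)                          # tensor of zeros

    optimizer.zero_grad(set_to_none=True)  # grads are dropped entirely
    print(p.grad)                          # None until the next backward pass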
