tf.keras.mixed_precision.experimental.LossScaleOptimizer
A deprecated optimizer that applies loss scaling.
Inherits From: LossScaleOptimizer, Optimizer
tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    optimizer, loss_scale
)
This class is identical to the non-experimental keras.mixed_precision.LossScaleOptimizer except that its constructor takes different arguments. For this class (the experimental version), the constructor takes a loss_scale argument. For the non-experimental class, the constructor encodes the loss scaling information in multiple arguments. Note that unlike this class, the non-experimental class does not accept a tf.compat.v1.mixed_precision.LossScale, which is deprecated.
If you currently use this class, you should switch to the non-experimental tf.keras.mixed_precision.LossScaleOptimizer instead. The examples below show how to convert uses of the experimental class to the equivalent non-experimental class.
# In all of the examples below, `opt1` and `opt2` are identical
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
assert opt1.get_config() == opt2.get_config()
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=123)
# dynamic=False indicates to use fixed loss scaling. initial_scale=123
# refers to the initial loss scale, which is the single fixed loss scale
# when dynamic=False.
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=False, initial_scale=123)
assert opt1.get_config() == opt2.get_config()
loss_scale = tf.compat.v1.mixed_precision.experimental.DynamicLossScale(
    initial_loss_scale=2048, increment_period=500)
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale=loss_scale)
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), initial_scale=2048,
    dynamic_growth_steps=500)
assert opt1.get_config() == opt2.get_config()
Make sure to also switch from this class to the non-experimental class in isinstance checks, if you have any. If you do not do this, your model may run into hard-to-debug issues, as the experimental LossScaleOptimizer subclasses the non-experimental LossScaleOptimizer, but not vice versa. It is safe to switch isinstance checks to the non-experimental LossScaleOptimizer even before using the non-experimental LossScaleOptimizer.
opt1 = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')
# The experimental class subclasses the non-experimental class
isinstance(opt1, tf.keras.mixed_precision.LossScaleOptimizer)
# True
opt2 = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD())
# The non-experimental class does NOT subclass the experimental class.
isinstance(opt2, tf.keras.mixed_precision.experimental.LossScaleOptimizer)
# False
Args | |
---|---|
optimizer | The Optimizer instance to wrap. |
loss_scale | The loss scale to scale the loss and gradients. This can either be an int/float to use a fixed loss scale, the string "dynamic" to use dynamic loss scaling, or an instance of a LossScale. The string "dynamic" is equivalent to passing DynamicLossScale(), and passing an int/float is equivalent to passing a FixedLossScale with the given loss scale. If a DynamicLossScale is passed, DynamicLossScale.multiplier must be 2 (the default). |
Raises | |
---|---|
ValueError | In case of any invalid argument. |
Attributes | |
---|---|
dynamic | Bool indicating whether dynamic loss scaling is used. |
dynamic_counter | The number of steps since the loss scale was last increased or decreased. This is None if LossScaleOptimizer.dynamic is False. The counter is incremented every step. Once it reaches LossScaleOptimizer.dynamic_growth_steps, the loss scale is doubled and the counter is reset back to zero. If nonfinite gradients are encountered, the loss scale is halved and the counter is reset back to zero. |
dynamic_growth_steps | The number of steps it takes to increase the loss scale. This is None if LossScaleOptimizer.dynamic is False. Every dynamic_growth_steps consecutive steps with finite gradients, the loss scale is increased. |
initial_scale | The initial loss scale. If LossScaleOptimizer.dynamic is False, this is the same as LossScaleOptimizer.loss_scale, as the loss scale never changes. |
inner_optimizer | The optimizer that this LossScaleOptimizer is wrapping. |
loss_scale | The current loss scale as a float32 scalar tensor. |
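As a quick illustration (a hedged sketch, not part of the original page), these attributes can be inspected directly on a dynamically scaled optimizer; the values noted in the comments assume the TensorFlow 2.4 defaults:
import tensorflow as tf

opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD())
print(opt.dynamic)               # True: dynamic loss scaling is used by default
print(opt.initial_scale)         # starting loss scale (assumed default of 2 ** 15)
print(opt.dynamic_growth_steps)  # steps of finite gradients before the scale doubles
print(int(opt.dynamic_counter))  # 0 before any training steps have run
print(float(opt.loss_scale))     # current loss scale, initially equal to initial_scale
print(opt.inner_optimizer)       # the wrapped SGD instance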
Methods
get_scaled_loss
get_scaled_loss(
    loss
)
Scales the loss by the loss scale.
This method is only needed if you compute gradients manually, e.g. with tf.GradientTape. In that case, call this method to scale the loss before passing the loss to tf.GradientTape. If you use LossScaleOptimizer.minimize or LossScaleOptimizer.get_gradients, loss scaling is automatically applied and this method is unneeded.
If this method is called, get_unscaled_gradients should also be called. See the tf.keras.mixed_precision.LossScaleOptimizer doc for an example.
Args | |
---|---|
loss | The loss, which will be multiplied by the loss scale. Can either be a tensor or a callable returning a tensor. |
Returns | |
---|---|
loss multiplied by LossScaleOptimizer.loss_scale. |
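For illustration only (a hedged sketch, not from the original page), get_scaled_loss is typically called inside a tf.GradientTape block; model, x, and y below are hypothetical placeholders:
opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD())
with tf.GradientTape() as tape:
    predictions = model(x)  # `model`, `x`, and `y` are placeholders
    loss = tf.reduce_mean(
        tf.keras.losses.mean_squared_error(y, predictions))
    scaled_loss = opt.get_scaled_loss(loss)
# These gradients are still multiplied by the loss scale; unscale them with
# get_unscaled_gradients (see below) before applying them.
scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)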
get_unscaled_gradients
get_unscaled_gradients(
    grads
)
Unscales the gradients by the loss scale.
This method is only needed if you compute gradients manually, e.g. with tf.GradientTape. In that case, call this method to unscale the gradients after computing them with tf.GradientTape. If you use LossScaleOptimizer.minimize or LossScaleOptimizer.get_gradients, loss scaling is automatically applied and this method is unneeded.
If this method is called, get_scaled_loss should also be called. See the tf.keras.mixed_precision.LossScaleOptimizer doc for an example.
Args | |
---|---|
grads | A list of tensors, each of which will be divided by the loss scale. Can have None values, which are ignored. |
Returns | |
---|---|
A new list the same size as grads, where every non-None value in grads is divided by LossScaleOptimizer.loss_scale. |
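Continuing the hedged sketch from get_scaled_loss above, the scaled gradients would be unscaled and then applied with the wrapped optimizer:
grads = opt.get_unscaled_gradients(scaled_grads)  # divide each gradient by the loss scale
opt.apply_gradients(zip(grads, model.trainable_variables))
# apply_gradients also updates the loss scale when dynamic loss scaling is used.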
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/mixed_precision/experimental/LossScaleOptimizer