tf.tpu.experimental.embedding.SGD
Optimization parameters for stochastic gradient descent for TPU embeddings.
tf.tpu.experimental.embedding.SGD(
    learning_rate=0.01, clip_weight_min=None, clip_weight_max=None,
    weight_decay_factor=None,
    multiply_weight_decay_factor_by_learning_rate=None
)
Pass this to tf.tpu.experimental.embedding.TPUEmbedding via the optimizer argument to set the global optimizer and its parameters:
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    ...
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))
This can also be used in a tf.tpu.experimental.embedding.TableConfig as the optimizer parameter to set a table-specific optimizer. This will override the optimizer and parameters of the global embedding optimizer defined above:
table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...,
    optimizer=tf.tpu.experimental.embedding.SGD(0.2))
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(
        table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=...,
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))
In the above example, the first feature will be looked up in a table that has a learning rate of 0.2, while the second feature will be looked up in a table that has a learning rate of 0.1.
See 'tensorflow/core/protobuf/tpu/optimization_parameters.proto' for a complete description of these parameters and their impact on the optimizer algorithm.
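For concreteness, a filled-in version of the sketch above might look like the following; the vocabulary sizes, dimensions, and batch size are hypothetical values chosen purely for illustration:

table_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=1024,  # hypothetical vocabulary size
    dim=16,                # hypothetical embedding dimension
    optimizer=tf.tpu.experimental.embedding.SGD(0.2))  # table-specific override
table_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=512,   # hypothetical
    dim=8)                 # no optimizer set: falls back to the global SGD(0.1)

feature_config = (
    tf.tpu.experimental.embedding.FeatureConfig(table=table_one),
    tf.tpu.experimental.embedding.FeatureConfig(table=table_two))

embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    batch_size=128,        # hypothetical batch size
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))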
Args | |
---|---|
learning_rate | The learning rate. It should be a floating point value or a callable taking no arguments for a dynamic learning rate. |
clip_weight_min | The minimum value to clip by; None means -infinity. |
clip_weight_max | The maximum value to clip by; None means +infinity. |
weight_decay_factor | Amount of weight decay to apply; None means that the weights are not decayed. Weights are decayed by multiplying the weight by this factor each step. |
multiply_weight_decay_factor_by_learning_rate | If true, weight_decay_factor is multiplied by the current learning rate. |
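Because learning_rate accepts a zero-argument callable, the rate can be made dynamic. The minimal sketch below assumes a training loop elsewhere updates lr; the clipping and decay values are hypothetical:

lr = tf.Variable(0.1, trainable=False)  # assumed to be updated elsewhere, e.g. by a schedule

optimizer = tf.tpu.experimental.embedding.SGD(
    learning_rate=lambda: lr,  # zero-argument callable for a dynamic learning rate
    clip_weight_min=-1.0,      # hypothetical: clip embedding weights into [-1.0, 1.0]
    clip_weight_max=1.0,
    weight_decay_factor=1e-4,  # hypothetical: weights are multiplied by this factor each step
    multiply_weight_decay_factor_by_learning_rate=True)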
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/tpu/experimental/embedding/SGD