tf.keras.layers.AdditiveAttention
Additive attention layer, a.k.a. Bahdanau-style attention.
tf.keras.layers.AdditiveAttention(
    use_scale=True, **kwargs
)
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim] and a key tensor of shape [batch_size, Tv, dim]. If no key is given, value is used as the key. The calculation follows these steps:
- Reshape query and key into shapes [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim] respectively.
- Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum: scores = tf.reduce_sum(tf.tanh(query + key), axis=-1)
- Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
- Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value).
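The steps above can be sketched in plain NumPy. This is an illustrative re-implementation for unmasked inputs, not the Keras source; the function name additive_attention and the optional scale argument (a [dim] vector standing in for the variable created by use_scale) are assumptions made for this example.

```python
import numpy as np

def additive_attention(query, value, key=None, scale=None):
    """NumPy sketch of additive (Bahdanau-style) attention without masking.

    query: [batch_size, Tq, dim]; value and key: [batch_size, Tv, dim].
    scale: optional [dim] vector (assumed stand-in for the use_scale variable).
    """
    if key is None:
        key = value  # the common case: value doubles as key
    q = query[:, :, np.newaxis, :]       # [batch_size, Tq, 1, dim]
    k = key[:, np.newaxis, :, :]         # [batch_size, 1, Tv, dim]
    summed = np.tanh(q + k)              # broadcasts to [batch_size, Tq, Tv, dim]
    if scale is not None:
        summed = scale * summed
    scores = summed.sum(axis=-1)         # [batch_size, Tq, Tv]
    # Numerically stable softmax over the Tv axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of value rows: [batch_size, Tq, dim].
    return weights @ value, weights

rng = np.random.default_rng(0)
query = rng.normal(size=(2, 3, 4))   # batch_size=2, Tq=3, dim=4
value = rng.normal(size=(2, 5, 4))   # batch_size=2, Tv=5, dim=4
output, weights = additive_attention(query, value)
print(output.shape, weights.shape)   # (2, 3, 4) (2, 3, 5)
```

Each row of weights sums to 1 over the Tv axis, so every output position is a convex combination of the value vectors.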
| Args | Description |
|---|---|
| use_scale | If True, will create a variable to scale the attention scores. |
| causal | Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. |
| dropout | Float between 0 and 1. Fraction of the units to drop for the attention scores. |
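
To make the causal option concrete, here is a small NumPy sketch of a causal mask applied to a score matrix before the softmax. It only illustrates the rule "position i cannot attend to positions j > i"; the helper name causal_softmax and the large negative fill value are assumptions for this example, not details taken from the layer's source.

```python
import numpy as np

def causal_softmax(scores):
    """Softmax over the last axis with a causal mask.

    scores: [batch_size, Tq, Tv]. Entry (i, j) is kept only when j <= i.
    """
    tq, tv = scores.shape[-2:]
    # allowed[i, j] is True when query position i may attend to position j.
    allowed = np.tril(np.ones((tq, tv), dtype=bool))
    # Push disallowed scores far down so they get (numerically) zero weight.
    masked = np.where(allowed, scores, -1e9)
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

# With all-equal scores, each position spreads its weight evenly
# over itself and the positions before it.
weights = causal_softmax(np.zeros((1, 3, 3)))
print(weights[0])
```

With equal scores, the first row places all weight on position 0, the second splits it between positions 0 and 1, and the third spreads it over all three positions.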
Call Arguments:
- inputs: List of the following tensors:
  - query: Query Tensor of shape [batch_size, Tq, dim].
  - value: Value Tensor of shape [batch_size, Tv, dim].
  - key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case.
- mask: List of the following tensors:
  - query_mask: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False.
  - value_mask: A boolean mask Tensor of shape [batch_size, Tv]. If given, will apply the mask such that values at positions where mask==False do not contribute to the result.
- training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout).
- return_attention_scores: bool, if True, returns the attention scores (after masking and softmax) as an additional output argument.
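As a minimal sketch of the call arguments above, the following passes explicit masks and requests the attention scores. The tensor sizes and mask values are made up for illustration, and the snippet assumes eager execution in TensorFlow 2.x.

```python
import numpy as np
import tensorflow as tf

# Illustrative shapes: batch_size=2, Tq=3, Tv=5, dim=4.
query = tf.random.normal((2, 3, 4))
value = tf.random.normal((2, 5, 4))

# False marks padded positions (values chosen only for this example).
query_mask = tf.constant([[True, True, False],
                          [True, True, True]])
value_mask = tf.constant([[True, True, True, False, False],
                          [True, True, True, True, True]])

layer = tf.keras.layers.AdditiveAttention()
output, scores = layer(
    [query, value],                 # key omitted, so value is used as key
    mask=[query_mask, value_mask],
    return_attention_scores=True)

print(output.shape)   # (2, 3, 4)
print(scores.shape)   # (2, 3, 5)
```

In the first batch element, the masked value positions (3 and 4) should receive zero attention weight, and the output row for the masked query position (2) should be zero, matching the mask descriptions above.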
Output:
Attention outputs of shape [batch_size, Tq, dim]. [Optional] Attention scores after masking and softmax with shape [batch_size, Tq, Tv].
The meaning of query, value and key depends on the application. In the case of text similarity, for example, query is the sequence of embeddings for the first piece of text and value is the sequence of embeddings for the second piece of text. key is usually the same tensor as value.
Here is a code example for using AdditiveAttention in a CNN+Attention network:
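import tensorflow as tf

# Placeholder hyperparameters (illustrative values, not from the original docs).
max_tokens = 10000  # vocabulary size
dimension = 64      # embedding dimension

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')
# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(max_tokens, dimension)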
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)
# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
    filters=100,
    kernel_size=4,
    # Use 'same' padding so outputs have the same shape as inputs.
    padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)
# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.AdditiveAttention()(
    [query_seq_encoding, value_seq_encoding])
# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
    query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
    query_value_attention_seq)
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
# Add DNN layers, and create Model.
# ...
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
 https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/layers/AdditiveAttention