tf.data.experimental.index_table_from_dataset
Returns an index lookup table based on the given dataset.
tf.data.experimental.index_table_from_dataset(
    dataset=None,
    num_oov_buckets=0,
    vocab_size=None,
    default_value=-1,
    hasher_spec=lookup_ops.FastHashSpec,
    key_dtype=tf.dtypes.string,
    name=None
)
This operation constructs a lookup table based on the given dataset of keys.
Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise it is assigned the default_value. The bucket ID range is [vocabulary size, vocabulary size + num_oov_buckets - 1].
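The lookup rule above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not TensorFlow's implementation: the helper name model_lookup is hypothetical, and zlib.crc32 is a stand-in for whatever hash hasher_spec selects, so the exact bucket chosen for a given key will differ from TensorFlow's.

```python
import zlib


def model_lookup(vocab, key, num_oov_buckets=0, default_value=-1):
    """Toy model of the documented lookup rule (hypothetical helper)."""
    # In-vocabulary keys map to their position in the vocabulary.
    index = {k: i for i, k in enumerate(vocab)}
    if key in index:
        return index[key]
    if num_oov_buckets > 0:
        # Assumption: crc32 stands in for the hash named by hasher_spec.
        bucket = zlib.crc32(key.encode("utf-8")) % num_oov_buckets
        # OOV IDs start right after the last vocabulary ID.
        return len(vocab) + bucket
    # No OOV buckets: fall back to default_value.
    return default_value


vocab = ["a", "b", "c"]
print(model_lookup(vocab, "b"))    # in vocabulary: 1
print(model_lookup(vocab, "zzz"))  # no OOV buckets: -1
oov_id = model_lookup(vocab, "zzz", num_oov_buckets=2)
print(3 <= oov_id <= 4)            # True: ID falls in [3, 3 + 2 - 1]
```

With a vocabulary of size 3 and num_oov_buckets=2, every out-of-vocabulary key lands on ID 3 or 4, which matches the documented range.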
Sample Usages:
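import tensorflow as tf

# Build a dataset of string keys: '0', '2', '4', ..., '198'.
ds = tf.data.Dataset.range(100).map(lambda x: tf.strings.as_string(x * 2))
# The keys are strings, so key_dtype must be tf.string (the default).
table = tf.data.experimental.index_table_from_dataset(
    ds, key_dtype=tf.string)
table.lookup(tf.constant(['0', '2', '4'], dtype=tf.string)).numpy()
array([0, 1, 2])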
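Out-of-vocabulary keys can be handled in two ways. The sketch below compares them, assuming eager execution in TensorFlow 2.x. The fruit names and variable names are illustrative only. Which of the two bucket IDs an unknown key receives depends on the hash, so the code checks only the range.

```python
import tensorflow as tf

# Illustrative vocabulary: 'apple' -> 0, 'banana' -> 1, 'cherry' -> 2.
keys = tf.data.Dataset.from_tensor_slices(["apple", "banana", "cherry"])
queries = tf.constant(["banana", "durian"])  # 'durian' is out of vocabulary

# No OOV buckets: unknown keys map to default_value.
strict = tf.data.experimental.index_table_from_dataset(
    keys, default_value=-1)
strict_ids = strict.lookup(queries).numpy()

# Two OOV buckets: unknown keys hash into ID 3 or 4.
hashed = tf.data.experimental.index_table_from_dataset(
    keys, num_oov_buckets=2)
hashed_ids = hashed.lookup(queries).numpy()

print(strict_ids.tolist())       # 'banana' -> 1, 'durian' -> -1
print(int(hashed_ids[0]))        # 'banana' -> 1
print(3 <= hashed_ids[1] <= 4)   # 'durian' lands in the OOV range
```

Using default_value makes unknown keys easy to detect and filter. OOV buckets instead give each unknown key a stable, usable ID, at the cost of possible collisions between different keys.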
Args | |
---|---|
dataset | A dataset of keys. |
num_oov_buckets | The number of out-of-vocabulary buckets. |
vocab_size | Number of elements in the vocabulary, if known. |
default_value | The value to use for out-of-vocabulary feature values. Defaults to -1. |
hasher_spec | A HasherSpec to specify the hash function to use for assigning out-of-vocabulary buckets. |
key_dtype | The key data type. |
name | A name for this op (optional). |
Returns | |
---|---|
The lookup table based on the given dataset. |
Raises | |
---|---|
ValueError | If |
© 2022 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 4.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.9/api_docs/python/tf/data/experimental/index_table_from_dataset