On this page
tf.keras.preprocessing.text.one_hot
One-hot encodes a text into a list of word indexes of size n
.
tf.keras.preprocessing.text.one_hot(
input_text,
n,
filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
lower=True,
split=' ',
analyzer=None
)
This function receives as input a string of text and returns a list of encoded integers each corresponding to a word (or token) in the given input string.
Args | |
---|---|
input_text |
Input text (string). |
n |
int. Size of vocabulary. |
filters |
list (or concatenation) of characters to filter out, such as punctuation. Default: '!"#$%&()*+,-./:;<=>?@[\]^_`{|}~\t\n , includes basic punctuation, tabs, and newlines. |
lower |
boolean. Whether to set the text to lowercase. |
split |
str. Separator for word splitting. |
analyzer |
function. Custom analyzer to split the text |
Returns | |
---|---|
List of integers in [1, n] . Each integer encodes a word (unicity non-guaranteed). |
© 2022 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 4.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.9/api_docs/python/tf/keras/preprocessing/text/one_hot