tf.contrib.factorization.KMeans
Creates the graph for k-means clustering.
tf.contrib.factorization.KMeans( inputs, num_clusters, initial_clusters=RANDOM_INIT, distance_metric=SQUARED_EUCLIDEAN_DISTANCE, use_mini_batch=False, mini_batch_steps_per_iteration=1, random_seed=0, kmeans_plus_plus_num_retries=2, kmc2_chain_length=200 )
Args | |
---|---|
inputs | An input tensor or list of input tensors. It is assumed that the data points have been previously randomly permuted. |
num_clusters | An integer tensor specifying the number of clusters. This argument is ignored if initial_clusters is a tensor or numpy array. |
initial_clusters | Specifies the clusters used during initialization. One of the following: a tensor or numpy array with the initial cluster centers; a function f(inputs, k) that returns up to k centers from inputs; "random" (the default, RANDOM_INIT) to choose centers randomly from inputs; "kmeans_plus_plus" to choose centers with kmeans++; "kmc2" to choose centers from mini-batches with the k-MC2 algorithm. |
distance_metric | Distance metric used for clustering. Supported options: "squared_euclidean", "cosine". |
use_mini_batch | If true, use the mini-batch k-means algorithm. Otherwise, assume full batch. |
mini_batch_steps_per_iteration | Number of steps after which the updated cluster centers are synced back to a master copy. |
random_seed | Seed for the PRNG used to initialize the cluster centers. |
kmeans_plus_plus_num_retries | For each point that is sampled during kmeans++ initialization, this parameter specifies the number of additional points to draw from the current distribution before selecting the best. If a negative value is specified, a heuristic is used to sample O(log(num_to_sample)) additional points. |
kmc2_chain_length | Determines how many candidate points are used by the k-MC2 algorithm to produce one new cluster center. If a (mini-)batch contains fewer points, one new cluster center is generated from the (mini-)batch. |
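
The effect of kmeans_plus_plus_num_retries can be illustrated with a small pure-Python sketch of greedy k-means++ seeding. This is an illustration under stated assumptions, not the library's implementation: the helper names (`squared_distance`, `nearest_sq_dists`, `kmeans_pp_init`) are hypothetical, only a non-negative retry count is handled (the negative-value heuristic is not modeled), and "best" is assumed to mean the candidate that leaves the smallest total squared distance to the nearest center.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_sq_dists(points, centers):
    """For each point, the squared distance to its nearest center."""
    return [min(squared_distance(p, c) for c in centers) for p in points]


def kmeans_pp_init(points, num_clusters, num_retries=2, seed=0):
    """Greedy k-means++ seeding (illustrative sketch, not TensorFlow code)."""
    if num_retries < 0:
        raise ValueError("this sketch only handles num_retries >= 0")
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < num_clusters:
        weights = nearest_sq_dists(points, centers)
        if sum(weights) == 0:
            # Every point coincides with a chosen center: sample uniformly.
            candidates = [rng.choice(points) for _ in range(1 + num_retries)]
        else:
            # Draw one point plus num_retries additional points, each with
            # probability proportional to its squared distance to the
            # nearest center chosen so far.
            candidates = rng.choices(points, weights=weights, k=1 + num_retries)
        # Keep the candidate that minimizes the resulting total cost.
        best = min(
            candidates,
            key=lambda c: sum(nearest_sq_dists(points, centers + [c])),
        )
        centers.append(best)
    return centers
```

With num_retries=0 this reduces to plain k-means++ sampling; larger values trade extra distance computations for seeds that tend to start with a lower cost.
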
Raises | |
---|---|
ValueError | An invalid argument was passed to initial_clusters or distance_metric. |
Methods
training_graph
training_graph()
Generates a training graph for the k-means algorithm.
This returns, among other things, an op that chooses initial centers (init_op), a boolean variable that is set to True once the initial centers are chosen (cluster_centers_initialized), and an op that performs either an entire Lloyd iteration or a mini-batch of a Lloyd iteration (training_op).
The caller should use these components as follows: a single worker executes init_op repeatedly until cluster_centers_initialized becomes True; after that, multiple workers may execute training_op any number of times.
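
The calling protocol described above might look like the following single-process sketch. It assumes TensorFlow 1.15 is installed (tf.contrib is not available in TensorFlow 2.x); `run_kmeans` is a hypothetical helper name, and indexing the outputs with `[0]` assumes that a single input tensor yields one-element sequences.

```python
def run_kmeans(points, num_clusters, num_steps=10):
    """Cluster `points` (a list of equal-length float lists); return cluster ids."""
    # Imported here because tf.contrib exists only in TensorFlow 1.x.
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.constant(points, dtype=tf.float32)
        kmeans = tf.contrib.factorization.KMeans(
            inputs=inputs, num_clusters=num_clusters)
        (all_scores, cluster_idx, scores, cluster_centers_initialized,
         init_op, training_op) = kmeans.training_graph()
        variables_init = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(variables_init)
        # A single worker runs init_op until the initial centers are chosen.
        while not sess.run(cluster_centers_initialized):
            sess.run(init_op)
        # Afterwards, training_op may be run any number of times.
        for _ in range(num_steps):
            sess.run(training_op)
        # Assumption: one input tensor gives a one-element sequence of results.
        return sess.run(cluster_idx[0]).tolist()
```

In a distributed setting, only one worker would run the initialization loop; the others would wait for cluster_centers_initialized before running training_op.
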
Returns | |
---|---|
A tuple consisting of: | |
all_scores | A matrix (or list of matrices) of dimensions (num_input, num_clusters) where each value is the distance between an input vector and a cluster center. |
cluster_idx | A vector (or list of vectors). Each element in the vector corresponds to an input row in inputs and specifies the cluster id assigned to that input. |
scores | Similar to cluster_idx, but specifies the distance to the assigned cluster instead. |
cluster_centers_initialized | A scalar indicating whether the clusters have been initialized. |
init_op | An op to initialize the clusters. |
training_op | An op that runs an iteration of training. |
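
The relationship between all_scores, cluster_idx, and scores can be sketched in plain Python. This is only an illustration of the shapes and meanings described above, assuming the squared Euclidean metric; `squared_distance` and `score_inputs` are hypothetical names, not part of the TensorFlow API.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_inputs(inputs, centers):
    """Return (all_scores, cluster_idx, scores) for the given cluster centers."""
    # all_scores[i][j]: distance between input i and cluster center j.
    all_scores = [[squared_distance(x, c) for c in centers] for x in inputs]
    # cluster_idx[i]: index of the nearest center for input i.
    cluster_idx = [
        min(range(len(row)), key=row.__getitem__) for row in all_scores
    ]
    # scores[i]: distance from input i to its assigned center.
    scores = [row[idx] for row, idx in zip(all_scores, cluster_idx)]
    return all_scores, cluster_idx, scores
```

In other words, cluster_idx is the row-wise argmin of all_scores and scores is the corresponding row-wise minimum.
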
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/factorization/KMeans