org.apache.spark.streamdm.clusterers
Assigns examples to clusters, given the current Clusters data structure.
Gets the currently computed clusters.
Gets the current Model used for the Learner.
Initializes the StreamKM++ algorithm.
Maintains the BucketManager for coreset extraction, given an input DStream of Example.
Implements the StreamKM++ algorithm for data streams. StreamKM++ computes a small (weighted) sample of the stream by using coresets, and then uses it as an input to a k-means++ algorithm. It uses a data structure called BucketManager to handle the coresets.
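The k-means++ step on a weighted sample can be illustrated with a minimal, Spark-free sketch. It is not the streamDM implementation: `WeightedPoint`, `KMeansPlusPlusSketch`, `sqDist`, `sample`, and `seed` are hypothetical names introduced here, and the sketch covers only the seeding phase, assuming the coreset has already been extracted as a small in-memory collection of weighted points.

```scala
import scala.util.Random

// Hypothetical stand-in for a weighted coreset point; not a streamDM type.
final case class WeightedPoint(coords: Vector[Double], weight: Double)

object KMeansPlusPlusSketch {
  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Draws an index with probability proportional to the non-negative scores.
  // Falls back to the last index if every score is zero.
  def sample(scores: Seq[Double], rng: Random): Int = {
    val target = rng.nextDouble() * scores.sum
    val cumulative = scores.scanLeft(0.0)(_ + _).tail
    val idx = cumulative.indexWhere(_ > target)
    if (idx >= 0) idx else scores.size - 1
  }

  // Weighted k-means++ seeding: the first center is drawn proportionally to
  // the point weights, each later center proportionally to
  // weight * (squared distance to the nearest center chosen so far).
  def seed(points: Vector[WeightedPoint], k: Int, rng: Random): Vector[Vector[Double]] = {
    require(k >= 1 && k <= points.size, "k must be between 1 and the number of points")
    val first = points(sample(points.map(_.weight), rng)).coords
    (1 until k).foldLeft(Vector(first)) { (centers, _) =>
      val scores = points.map(p => p.weight * centers.map(sqDist(p.coords, _)).min)
      centers :+ points(sample(scores, rng)).coords
    }
  }
}
```

A point that is already a center has a score of zero, so it is not drawn again as long as other distinct points remain. In a full pipeline these seeds would typically be refined by further k-means iterations over the same weighted sample.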
The algorithm uses the following options: