# StreamKM++

## StreamKM++ Clustering

*Ackermann, Lammersen, MÃ¤rtens, Raupach, Sohler, Swierkot. StreamKM++: A
Clustering Algorithm for Data Streams. In Proceedings of the 12th Workshop on
Algorithm Engineering and Experiments (ALENEX 2010)*

### General Description

*StreamKM++* computes a small weighted sample of the data stream,called the
*coreset* of the data stream. A new data structure called coreset tree is
developed in order to significantly speed up the time necessary for sampling
non-uniformly during the coreset construction. After the coreset is extracted
from the data stream, a weighted k-means algorithm is applied on the coreset to
get the final clusters for the original stream data.

### Implementation

In StreamDM, three classes deal with the implementation of the StreamKM++
algorithm: `StreamKM`

, `BucketManager`

, and `TreeCoreset`

.

`BucketManager`

and `TreeCoreset`

are the data structures to deal with the
coreset tree construction process. For the `TreeCoreset`

class no parameters are
required; its behaviour is controlled by the `BucketMaanger`

. The options for
`BucketManager`

are:

- Stream data window size parameter (
**-n**), which controls the stream data size (100000 by default), this value is inherited from the parameters of class`StreamKM`

; - Coreset size parameter (
**-maxsize**) which controls the number of examples in the coreset for the stream data (10000 by default).

`StreamKM`

implements the main algorithm, and is controlled by the following
parameters:

- Number of clusters parameter (
**-k**), which controls how many clusters are output by the offline k-means algorithm (10 by default); - k-means iterations parameter (
**-i**), which controls for how many iterations the k-means algorithm iterates (1000 by default); - Coreset size parameter(
**-s**), which controls the number of examples in the coreset for the StreamKM++ algorithm; - Windows size parameter(
**-w**), which controls how many examples in the stream data are used for coreset extraction for the StreamKM++ algorithm .