Data Generators

Hyperplane Generator

Generates a classification problem in which the class of each example is determined by a rotating hyperplane.

Parameters:

  • Chunk size (-k)
  • Slide duration in milliseconds (-d)
  • Number of features (-f)
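
The idea of a rotating hyperplane can be sketched in a few lines of Python. This is an illustration only, not the generator's actual code: the function name hyperplane_stream, the drift parameter, and the rule for nudging the weights are all assumptions made for the example. Each example is drawn uniformly from the unit hypercube and labelled according to which side of the current hyperplane it falls on; the weights are then changed slightly so that the hyperplane rotates over time.

```python
import random


def hyperplane_stream(num_features, drift=0.01, seed=1):
    """Yield (features, label) pairs labelled by a slowly rotating hyperplane.

    Hypothetical sketch: `drift` and the weight update rule are assumptions.
    """
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(num_features)]
    while True:
        # Features are uniform in [0, 1).
        x = [rng.random() for _ in range(num_features)]
        # The hyperplane passes through the centre of the unit hypercube.
        threshold = 0.5 * sum(weights)
        score = sum(w * v for w, v in zip(weights, x))
        label = 1 if score >= threshold else 0
        yield x, label
        # Rotate the hyperplane by nudging every weight up or down.
        weights = [w + drift * rng.choice((-1.0, 1.0)) for w in weights]
```

Calling next() repeatedly on hyperplane_stream(3) yields examples with three features whose labels gradually follow a different decision boundary as the weights drift.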

Random Tree Generator

Generates a stream based on a randomly generated tree. The generator builds a decision tree by choosing features to split on at random and assigning a random class label to each leaf. Once the tree is built, new examples are generated by assigning uniformly distributed random values to the features; the tree then determines the class label of each example from those values.

Parameters:

  • Chunk size (-k)
  • Slide duration (-d)
  • The number of features to generate (-f)
  • The number of classes to generate (-n)
  • The number of nominal attributes to generate (-o)
  • The number of numeric attributes to generate (-u)
  • The number of values to generate per nominal attribute (-v)
  • The maximum depth of the tree concept (-x)
  • The first level of the tree above maxTreeDepth that can have leaves (-l)
  • The fraction of leaves per level from firstLeafLevel onwards (-r)
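
The procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the generator's implementation: it uses numeric features only, splits each chosen feature at a random threshold, and every name (build_tree, classify, random_tree_stream) is hypothetical. The max_depth, first_leaf_level, and leaf_fraction arguments loosely mirror the -x, -l, and -r parameters.

```python
import random


def build_tree(rng, depth, num_features, num_classes,
               max_depth, first_leaf_level, leaf_fraction):
    """Recursively build a random tree of nested dicts (hypothetical layout)."""
    may_be_leaf = depth >= first_leaf_level and rng.random() < leaf_fraction
    if depth >= max_depth or may_be_leaf:
        # Leaves carry a random class label.
        return {"label": rng.randrange(num_classes)}
    return {
        "feature": rng.randrange(num_features),  # feature chosen at random
        "threshold": rng.random(),               # assumed numeric split point
        "left": build_tree(rng, depth + 1, num_features, num_classes,
                           max_depth, first_leaf_level, leaf_fraction),
        "right": build_tree(rng, depth + 1, num_features, num_classes,
                            max_depth, first_leaf_level, leaf_fraction),
    }


def classify(tree, x):
    """Follow the splits from the root until a leaf is reached."""
    while "label" not in tree:
        branch = "left" if x[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


def random_tree_stream(tree, num_features, seed=1):
    """Yield (features, label) pairs with uniform features labelled by the tree."""
    rng = random.Random(seed)
    while True:
        x = [rng.random() for _ in range(num_features)]
        yield x, classify(tree, x)
```

A tree built once with build_tree(random.Random(3), 0, 4, 2, 5, 3, 0.15) can be passed to random_tree_stream to produce as many examples as needed; the concept stays fixed because the tree is never rebuilt.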

Random RBF Generator

Generates random radial basis function data. This generator was devised to offer an alternate complex concept type that is not straightforward to approximate with a decision tree model. The RBF (radial basis function) generator works as follows. A fixed number of random centroids is generated. Each centroid has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a centroid at random, taking the weights into account so that centroids with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the centroid. The length of the displacement is drawn at random from a Gaussian distribution whose standard deviation is determined by the chosen centroid. The chosen centroid also determines the class label of the example. This effectively creates a normally distributed hypersphere of examples around each centroid, with varying densities. This generator produces only numeric attributes.

Parameters:

  • Chunk size (-k)
  • Slide duration (-d)
  • Seed for random generation of model (-m)
  • Seed for random generation of instances (-i)
  • The number of classes to generate (-n)
  • The number of features to generate (-f)
  • The number of centroids in the model (-c)
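
The generation steps described above can be sketched as follows. This is a minimal illustration, not the generator's code: the function names, the dict layout of a centroid, and the ranges used for positions, standard deviations, and weights are assumptions. The two seeds play the roles of the model seed (-m) and the instance seed (-i).

```python
import math
import random


def make_centroids(num_centroids, num_features, num_classes, model_seed=1):
    """Create random centroids (the value ranges are assumptions)."""
    rng = random.Random(model_seed)
    return [
        {
            "center": [rng.random() for _ in range(num_features)],
            "std": rng.random(),
            "label": rng.randrange(num_classes),
            "weight": rng.random(),
        }
        for _ in range(num_centroids)
    ]


def rbf_stream(centroids, instance_seed=1):
    """Yield (features, label) pairs scattered around weighted centroids."""
    rng = random.Random(instance_seed)
    weights = [c["weight"] for c in centroids]
    while True:
        # Centroids with a higher weight are more likely to be chosen.
        c = rng.choices(centroids, weights=weights, k=1)[0]
        # Random direction: a Gaussian vector scaled to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in c["center"]]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # Displacement length drawn from the centroid's own Gaussian.
        length = rng.gauss(0.0, c["std"])
        x = [p + length * d / norm for p, d in zip(c["center"], direction)]
        yield x, c["label"]
```

Keeping the two seeds separate means the same set of centroids can be replayed with different example sequences, or the reverse.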

Random RBF Events Generator

Generates random radial basis function data for Clustream.

Parameters:

  • Chunk size (-k)
  • Slide duration (-d)
  • Seed for random generation of model (-m)
  • Seed for random generation of instances (-i)
  • The average number of centroids in the model (-C)
  • Deviation of the number of centroids in the model (-c)
  • The average radii of the centroids in the model (-R)
  • Deviation of average radii of the centroids in the model (-r)
  • Offset of the average weight a cluster has (-D)
  • Kernels move a predefined distance of 0.01 every X points (-V)
  • Speed/Velocity point offset (-v)
  • Noise level (-N)
  • Allow noise to be placed within a cluster (-n)
  • Event frequency (-E)
  • Enable merging and splitting of clusters (-M)
  • Enable emerging and disappearing of clusters (-e)
  • The number of features to generate (-f)
  • Decay horizon (-h)