Module ojalgo

Class FeatureBasedClusterer

java.lang.Object
org.ojalgo.data.cluster.FeatureBasedClusterer
All Implemented Interfaces:
ClusteringAlgorithm<Point>

public abstract class FeatureBasedClusterer extends Object implements ClusteringAlgorithm<Point>
Facade for clustering objects represented by float feature vectors.

Provides utilities to cluster arbitrary data by mapping items to Points (immutable float[] feature vectors).

Usage: Call cluster(Collection, Function) with an extractor that produces the feature vector for each item. The result is a list of clusters, each represented as a Map<T, float[]> from the original items to their extracted features.

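Available clustering algorithms (one per factory method documented under Method Details): automatic (newAutomatic), greedy single-pass (newGreedy), k-means (newKMeans) and spectral (newSpectral).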

Performance: Internally, distances are cached for efficiency. All clustering is performed on Point objects with unique ids and float[] coordinates.

Extensibility: Subclasses implement ClusteringAlgorithm.cluster(Collection) to provide concrete clustering strategies over Points.

Thread safety: Not thread-safe. Each instance maintains internal state for distance caching.
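
The page does not say how the distance cache is organised. As a rough illustration only, one common approach is a packed triangular array indexed by point id, computed lazily. Everything in the sketch below (the class name DistanceCacheSketch, its fields and methods) is hypothetical and not part of ojAlgo. It also shows why such a cache makes an instance unsafe to share between threads: lookups mutate shared state without synchronisation.

```java
import java.util.Arrays;

// Hypothetical sketch of a lazy, symmetric pairwise distance cache.
// This is NOT the ojAlgo implementation; all names are illustrative.
final class DistanceCacheSketch {

    private final float[][] points;
    // Packed strict lower triangle; NaN marks "not yet computed".
    private final double[] cache;
    private int computations = 0;

    DistanceCacheSketch(float[][] points) {
        this.points = points;
        int n = points.length;
        this.cache = new double[n * (n - 1) / 2];
        Arrays.fill(this.cache, Double.NaN);
    }

    // Squared Euclidean distance between two equally long feature vectors.
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the distance between points i and j, computing it at most once.
    // The cache is mutated here without synchronisation (not thread-safe).
    double distance(int i, int j) {
        if (i == j) {
            return 0.0;
        }
        int hi = Math.max(i, j);
        int lo = Math.min(i, j);
        int index = hi * (hi - 1) / 2 + lo;
        double value = cache[index];
        if (Double.isNaN(value)) {
            value = squaredEuclidean(points[hi], points[lo]);
            cache[index] = value;
            computations++;
        }
        return value;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        DistanceCacheSketch sketch = new DistanceCacheSketch(
                new float[][] {{0f, 0f}, {1f, 1f}, {3f, 4f}});
        System.out.println(sketch.distance(0, 2)); // computed
        System.out.println(sketch.distance(2, 0)); // served from the cache
        System.out.println(sketch.computations());
    }
}
```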

Author:
apete
  • Method Details

    • newAutomatic

      public static FeatureBasedClusterer newAutomatic()
      Returns a new automatic clusterer using squared Euclidean distance. Equivalent to newAutomatic(DistanceMeasure) with DistanceMeasure.SQUARED_EUCLIDEAN.
      Returns:
      a new automatic clusterer
    • newAutomatic

      public static FeatureBasedClusterer newAutomatic(DistanceMeasure measure)
      Returns a new automatic clusterer using the specified distance measure.

      The algorithm:

      1. Extracts features
      2. Caches all pairwise distances
      3. Performs statistical analysis to determine a distance threshold
      4. Performs greedy clustering to get initial centroids
      5. Filters out very small clusters (determining k)
      6. Performs k-means clustering to refine clusters and centroids
      Parameters:
      measure - the distance measure to use
      Returns:
      a new automatic clusterer
    • newGreedy

      public static FeatureBasedClusterer newGreedy(DistanceMeasure measure, double threshold)
      Returns a new greedy, single-pass clusterer using the supplied distance and threshold.

      Each item is assigned to the nearest existing centroid if its distance to that centroid is <= threshold; otherwise a new cluster is created. The threshold must be in the same units as the chosen distance measure (for example, a squared distance when using DistanceMeasure.SQUARED_EUCLIDEAN).

      Parameters:
      measure - the distance measure
      threshold - the maximum allowed distance to join an existing cluster
      Returns:
      a new greedy clusterer
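
The single-pass rule described above can be sketched in plain Java as follows. This is a minimal illustration, not the library's code: the class GreedySketch is hypothetical, the distance is fixed to squared Euclidean, and updating each centroid as the running mean of its members is an assumption (the documentation does not say how centroids are maintained).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not the ojAlgo implementation.
final class GreedySketch {

    // Squared Euclidean distance between two equally long feature vectors.
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Single pass over the points: join the nearest centroid when it is within
    // the threshold, otherwise start a new cluster seeded by the point itself.
    static List<List<float[]>> cluster(List<float[]> points, double threshold) {
        List<List<float[]>> clusters = new ArrayList<>();
        List<float[]> centroids = new ArrayList<>();
        for (float[] point : points) {
            int nearest = -1;
            double nearestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.size(); c++) {
                double distance = squaredEuclidean(point, centroids.get(c));
                if (distance < nearestDistance) {
                    nearestDistance = distance;
                    nearest = c;
                }
            }
            if (nearest >= 0 && nearestDistance <= threshold) {
                List<float[]> members = clusters.get(nearest);
                members.add(point);
                // Assumption: the centroid is the running mean of its members.
                float[] centroid = centroids.get(nearest);
                for (int i = 0; i < centroid.length; i++) {
                    centroid[i] += (point[i] - centroid[i]) / members.size();
                }
            } else {
                List<float[]> members = new ArrayList<>();
                members.add(point);
                clusters.add(members);
                centroids.add(point.clone());
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<float[]> points = List.of(
                new float[] {0f, 0f}, new float[] {0.1f, 0f},
                new float[] {5f, 5f}, new float[] {5.1f, 5f});
        // The threshold 1.0 is a squared distance here.
        System.out.println(cluster(points, 1.0).size()); // prints 2
    }
}
```

Because the pass is greedy, the result depends on the order in which items are visited. A threshold that is too small produces many singleton clusters.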
    • newGreedy

      public static FeatureBasedClusterer newGreedy(double threshold)
      Returns a new greedy, single-pass clusterer using squared Euclidean distance and the given threshold.
      Parameters:
      threshold - the maximum allowed squared Euclidean distance to join an existing cluster
      Returns:
      a new greedy clusterer
    • newKMeans

      public static FeatureBasedClusterer newKMeans(DistanceMeasure measure, int k)
      Returns a new k-means–style clusterer using the supplied distance measure and number of clusters.
      Parameters:
      measure - the distance function
      k - the number of clusters (k >= 1)
      Returns:
      a new k-means clusterer
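
For readers unfamiliar with k-means, the textbook iteration (Lloyd's algorithm) alternates an assignment step and a centroid update step until assignments stop changing. The sketch below is a generic illustration under stated assumptions, not the library's implementation: the class KMeansSketch is hypothetical, centroids are seeded with the first k points, and the distance is fixed to squared Euclidean.

```java
import java.util.Arrays;

// Illustrative sketch of Lloyd's algorithm; not the ojAlgo implementation.
final class KMeansSketch {

    // Squared Euclidean distance between two equally long feature vectors.
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns, for each point, the index of the cluster it ends up in.
    static int[] cluster(float[][] points, int k, int maxIterations) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be in [1, number of points]");
        }
        int dim = points[0].length;
        float[][] centroids = new float[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // assumption: seed with the first k points
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment step: each point goes to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDistance = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double distance = squaredEuclidean(points[i], centroids[c]);
                    if (distance < bestDistance) {
                        bestDistance = distance;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = (float) (sums[c][j] / counts[c]);
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        float[][] points = {{0f, 0f}, {0.2f, 0f}, {5f, 5f}, {5.2f, 5f}};
        System.out.println(Arrays.toString(cluster(points, 2, 10))); // prints [0, 0, 1, 1]
    }
}
```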
    • newKMeans

      public static FeatureBasedClusterer newKMeans(int k)
      Returns a new k-means–style clusterer using squared Euclidean distance and the given number of clusters.
      Parameters:
      k - the number of clusters (k >= 1)
      Returns:
      a new k-means clusterer
    • newSpectral

      public static FeatureBasedClusterer newSpectral(DistanceMeasure measure, int k)
      Returns a new spectral clusterer using the supplied distance measure and number of clusters.

      Uses a Gaussian kernel and the symmetric normalised Laplacian.

      Parameters:
      measure - the distance measure for the kernel
      k - the number of clusters (k >= 1)
      Returns:
      a new spectral clusterer
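
To make the two ingredients named above concrete, the sketch below builds a Gaussian-kernel affinity matrix W and the symmetric normalised Laplacian L_sym = I - D^(-1/2) W D^(-1/2), where D is the diagonal degree matrix of W. It is an illustration under assumptions, not the library's code: the class SpectralSketch, the bandwidth parameter sigma and the zero diagonal of W are all choices made for this example, and the later eigenvector and k-means stages of spectral clustering are omitted.

```java
// Illustrative sketch only; this is not the ojAlgo implementation.
final class SpectralSketch {

    // Squared Euclidean distance between two equally long feature vectors.
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Gaussian affinity: w_ij = exp(-d_ij / (2 * sigma^2)), d_ij squared Euclidean.
    // Assumption: no self-affinity, so the diagonal stays at zero.
    static double[][] affinity(float[][] points, double sigma) {
        int n = points.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double value = Math.exp(
                        -squaredEuclidean(points[i], points[j]) / (2.0 * sigma * sigma));
                w[i][j] = value;
                w[j][i] = value;
            }
        }
        return w;
    }

    // L_sym = I - D^(-1/2) W D^(-1/2), with D the diagonal degree matrix of W.
    static double[][] normalisedLaplacian(double[][] w) {
        int n = w.length;
        double[] invSqrtDegree = new double[n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += w[i][j];
            }
            // Isolated nodes (degree 0) contribute nothing off the diagonal.
            invSqrtDegree[i] = degree > 0.0 ? 1.0 / Math.sqrt(degree) : 0.0;
        }
        double[][] laplacian = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double identity = (i == j) ? 1.0 : 0.0;
                laplacian[i][j] = identity - invSqrtDegree[i] * w[i][j] * invSqrtDegree[j];
            }
        }
        return laplacian;
    }

    public static void main(String[] args) {
        float[][] points = {{0f, 0f}, {1f, 0f}, {0f, 2f}};
        double[][] laplacian = normalisedLaplacian(affinity(points, 1.0));
        for (double[] row : laplacian) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```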
    • newSpectral

      public static FeatureBasedClusterer newSpectral(int k)
      Returns a new spectral clusterer using squared Euclidean distance and the given number of clusters.
      Parameters:
      k - the number of clusters (k >= 1)
      Returns:
      a new spectral clusterer
    • cluster

      public final <T> List<Map<T,float[]>> cluster(Collection<T> input, Function<T,float[]> extractor)
      Clusters arbitrary items by first extracting their float feature representation.

      Each item is wrapped as a Point using the extractor output. Clustering is then performed by ClusteringAlgorithm.cluster(Collection). The result mirrors the internal clusters but maps back to the original items along with their feature vectors.

      Type Parameters:
      T - the item type
      Parameters:
      input - the items to cluster (not null)
      extractor - a function that returns a non-null float[] feature vector for an item
      Returns:
      a list of clusters, each as a map from the original item to its feature vector, sorted by decreasing size
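
The extract, cluster and map-back flow described for this method can be illustrated without the library. In the sketch below, the class MapBackSketch and the method labelOf are hypothetical: labelOf is a trivial stand-in for a real Point-level algorithm, used only so the example is self-contained. The parts that mirror the documented behaviour are the shape of the result (one Map<T, float[]> per cluster) and the ordering by decreasing size.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only; this is not the ojAlgo implementation.
final class MapBackSketch {

    // Hypothetical stand-in for a real clustering algorithm:
    // label 0 for a negative first coordinate, label 1 otherwise.
    static int labelOf(float[] features) {
        return features[0] < 0f ? 0 : 1;
    }

    static <T> List<Map<T, float[]>> cluster(Collection<T> input, Function<T, float[]> extractor) {
        Map<Integer, Map<T, float[]>> byLabel = new HashMap<>();
        for (T item : input) {
            // Extract the feature vector once and keep it next to the item.
            float[] features = extractor.apply(item);
            byLabel.computeIfAbsent(labelOf(features), key -> new LinkedHashMap<>())
                    .put(item, features);
        }
        // Largest clusters first, matching the documented ordering.
        List<Map<T, float[]>> result = new ArrayList<>(byLabel.values());
        result.sort((a, b) -> Integer.compare(b.size(), a.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "ccc", "dddd", "eeeee");
        // One-dimensional feature: word length shifted so that "a" is negative.
        Function<String, float[]> extractor = s -> new float[] {s.length() - 1.5f};
        for (Map<String, float[]> group : cluster(words, extractor)) {
            System.out.println(group.keySet());
        }
    }
}
```

With the real class, the same call shape would apply to an instance obtained from one of the factory methods above, passing the items and an extractor function.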