dire_rapids.metrics module
Performance metrics for dimensionality reduction evaluation.
This module provides GPU-accelerated metrics using RAPIDS cuML for evaluating the quality of dimensionality reduction embeddings, including:
Distortion metrics (stress)
Context preservation metrics (SVM, kNN classification)
Topological metrics (persistent homology, Betti curves)
The module supports multiple backends for persistence computation:
giotto-ph (fastest CPU, multi-threaded)
ripser++ (GPU-accelerated)
ripser (CPU fallback)
- dire_rapids.metrics.welford_update_gpu(count, mean, M2, new_value, finite_threshold=1e12)
GPU-accelerated Welford’s algorithm update step.
- Parameters:
count (cupy.ndarray) – Running count of valid values
mean (cupy.ndarray) – Running mean
M2 (cupy.ndarray) – Running sum of squared differences
new_value (cupy.ndarray) – New values to incorporate
finite_threshold (float) – Maximum magnitude for inclusion
- Returns:
Updated (count, mean, M2)
- Return type:
tuple
- dire_rapids.metrics.welford_finalize_gpu(count, mean, M2)
Finalize Welford’s algorithm to compute mean and std.
- Parameters:
count (cupy.ndarray) – Total count of valid values
mean (cupy.ndarray) – Computed mean
M2 (cupy.ndarray) – Sum of squared differences
- Returns:
(mean, std)
- Return type:
tuple
- dire_rapids.metrics.welford_gpu(data)
GPU-accelerated computation of mean and std.
- Parameters:
data (cupy.ndarray or numpy.ndarray) – Input data
- Returns:
(mean, std)
- Return type:
tuple
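The Welford update and finalize steps can be sketched in plain Python. This is a scalar, CPU-only illustration, not the library's CuPy code: the names `welford_update` and `welford_finalize` are hypothetical, treating `finite_threshold` as a cut-off on absolute value is an assumption based on the parameter description, and so is the use of the population standard deviation (dividing by `count`).

```python
import math

def welford_update(count, mean, m2, new_value, finite_threshold=1e12):
    """Fold one value into the running (count, mean, M2) state."""
    # Assumed semantics: skip NaN/inf and values whose magnitude exceeds the threshold.
    if not math.isfinite(new_value) or abs(new_value) > finite_threshold:
        return count, mean, m2
    count += 1
    delta = new_value - mean          # distance to the old mean
    mean += delta / count             # updated mean
    m2 += delta * (new_value - mean)  # uses both old and new mean
    return count, mean, m2

def welford_finalize(count, mean, m2):
    """Return (mean, std); population std is an assumption of this sketch."""
    if count < 1:
        return float("nan"), float("nan")
    return mean, math.sqrt(m2 / count)

state = (0, 0.0, 0.0)
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, float("inf")]:
    state = welford_update(*state, x)
mean, std = welford_finalize(*state)
```

The single pass avoids storing all values and avoids the cancellation error of the naive sum-of-squares formula.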
- dire_rapids.metrics.threshold_subsample_gpu(data, layout, labels=None, threshold=0.5, random_state=42)
GPU-accelerated Bernoulli subsampling of data.
- Parameters:
- Returns:
Subsampled arrays
- Return type:
tuple
- Raises:
ValueError – If threshold is not between 0.0 and 1.0
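Bernoulli subsampling keeps each row independently with probability `threshold`, so the subsample size is random with expected value `threshold * n_samples`. A minimal CPU sketch, assuming list inputs and a hypothetical function name (the shape of the returned tuple when `labels` is omitted is also an assumption):

```python
import random

def threshold_subsample(data, layout, labels=None, threshold=0.5, random_state=42):
    """Keep each row independently with probability `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    rng = random.Random(random_state)
    # One Bernoulli draw per row; the same mask is reused for every array
    # so data, layout, and labels stay aligned.
    keep = [rng.random() < threshold for _ in range(len(data))]

    def pick(rows):
        return [row for row, kept in zip(rows, keep) if kept]

    if labels is None:
        return pick(data), pick(layout)
    return pick(data), pick(layout), pick(labels)

rows = list(range(1000))
sub_data, sub_layout, sub_labels = threshold_subsample(rows, rows, rows, threshold=0.5)
```

With `threshold=1.0` every row is kept, which matches the "no subsampling" reading used elsewhere in this module.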
- dire_rapids.metrics.make_knn_graph_gpu(data, n_neighbors, batch_size=50000)
GPU-accelerated kNN graph construction using cuML.
- Parameters:
- Returns:
(distances, indices) arrays of shape (n_samples, n_neighbors+1)
- Return type:
tuple
- dire_rapids.metrics.make_knn_graph_cpu(data, n_neighbors, batch_size=10000)
CPU fallback for kNN graph construction.
- Parameters:
- Returns:
(distances, indices) arrays
- Return type:
tuple
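The `(n_samples, n_neighbors+1)` output shape suggests that each point's own entry is included alongside its neighbors. The brute-force sketch below illustrates that layout in plain Python; it is not the cuML or CPU implementation, the name `make_knn_graph` is hypothetical, and placing the query point in column 0 is an assumption.

```python
import math

def make_knn_graph(data, n_neighbors):
    """Brute-force kNN: returns (distances, indices), each row of length n_neighbors + 1."""
    distances, indices = [], []
    for point in data:
        # Distance from this point to every point, itself included (distance 0).
        row = sorted((math.dist(point, other), j) for j, other in enumerate(data))
        row = row[: n_neighbors + 1]
        distances.append([d for d, _ in row])
        indices.append([j for _, j in row])
    return distances, indices

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
distances, indices = make_knn_graph(points, n_neighbors=2)
```

This costs O(n²) distance evaluations, which is why the documented functions process the data in batches.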
- dire_rapids.metrics.compute_stress(data, layout, n_neighbors, eps=1e-06, use_gpu=True)
Compute normalized stress (distortion) of an embedding.
This metric measures how well distances are preserved between the high-dimensional data and low-dimensional layout.
- Parameters:
data (array-like) – High-dimensional data (n_samples, n_features)
layout (array-like) – Low-dimensional embedding (n_samples, n_components)
n_neighbors (int) – Number of nearest neighbors to consider
eps (float) – Small constant to prevent division by zero
use_gpu (bool) – Whether to use GPU acceleration
- Returns:
Normalized stress value
- Return type:
float
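To make the idea of distance distortion concrete, the sketch below computes one common stress formulation over kNN edges: the root of the summed squared distance differences, normalized by the summed squared high-dimensional distances. This is an illustrative assumption, not necessarily the formula `compute_stress` uses; the function name and the `neighbors` argument are hypothetical.

```python
import math

def normalized_stress(data, layout, neighbors, eps=1e-6):
    """Kruskal-style stress restricted to the given neighbor pairs."""
    num = 0.0
    den = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d_hd = math.dist(data[i], data[j])      # distance in the original space
            d_ld = math.dist(layout[i], layout[j])  # distance in the embedding
            num += (d_hd - d_ld) ** 2
            den += d_hd ** 2
    # eps guards against division by zero when all points coincide.
    return math.sqrt(num / (den + eps))

data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]
perfect = [(x, y) for x, y, _ in data]          # drops the constant axis
stretched = [(2 * x, 2 * y) for x, y, _ in data]
stress_perfect = normalized_stress(data, perfect, neighbors)
stress_stretched = normalized_stress(data, stretched, neighbors)
```

A distance-preserving layout gives a stress of zero, and the value grows as neighbor distances are distorted.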
- dire_rapids.metrics.compute_neighbor_score(data, layout, n_neighbors, use_gpu=True)
Compute neighborhood preservation score.
Measures how well k-nearest neighbor relationships are preserved from high-dimensional to low-dimensional space.
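One simple way to quantify neighborhood preservation is the average fraction of each point's high-dimensional neighbors that are also neighbors in the layout. The sketch below assumes precomputed neighbor index lists (without the point itself) and a hypothetical function name; the return format of `compute_neighbor_score` is not documented here and may differ.

```python
def neighbor_score(hd_indices, ld_indices):
    """Mean per-point overlap between high- and low-dimensional neighbor sets."""
    overlaps = []
    for hd_row, ld_row in zip(hd_indices, ld_indices):
        shared = len(set(hd_row) & set(ld_row))
        overlaps.append(shared / len(hd_row))
    return sum(overlaps) / len(overlaps)

hd = [[1, 2], [0, 2], [0, 1]]  # neighbors in the original space
ld = [[1, 2], [0, 3], [3, 4]]  # neighbors in the embedding
score = neighbor_score(hd, ld)  # per-point overlaps: 1.0, 0.5, 0.0
```

A score of 1.0 means every neighborhood is preserved exactly; 0.0 means none of the original neighbors survive in the embedding.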
- dire_rapids.metrics.compute_local_metrics(data, layout, n_neighbors, subsample_threshold=1.0, random_state=42, use_gpu=True)
Compute local quality metrics (stress and neighborhood preservation).
- Parameters:
data (array-like) – High-dimensional data
layout (array-like) – Low-dimensional embedding
n_neighbors (int) – Number of neighbors for kNN graph
subsample_threshold (float) – Subsampling probability (must be between 0.0 and 1.0, default 1.0 = no subsampling)
random_state (int) – Random seed for subsampling
use_gpu (bool) – Whether to use GPU acceleration
- Returns:
Dictionary containing ‘stress’ and ‘neighbor’ metrics
- Return type:
dict
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0
- dire_rapids.metrics.compute_svm_accuracy(X, y, test_size=0.3, reg_param=1.0, max_iter=1000, random_state=42, use_gpu=True)
Compute SVM classification accuracy.
- Parameters:
- Returns:
Classification accuracy
- Return type:
float
- dire_rapids.metrics.compute_knn_accuracy(X, y, n_neighbors=16, test_size=0.3, random_state=42, use_gpu=True)
Compute kNN classification accuracy.
- dire_rapids.metrics.compute_svm_score(data, layout, labels, subsample_threshold=0.5, random_state=42, use_gpu=True, **kwargs)
Compute SVM context preservation score.
Compares SVM classification accuracy on high-dimensional data vs low-dimensional embedding.
- Parameters:
data (array-like) – High-dimensional data
layout (array-like) – Low-dimensional embedding
labels (array-like) – Class labels
subsample_threshold (float) – Subsampling probability (must be between 0.0 and 1.0)
random_state (int) – Random seed
use_gpu (bool) – Whether to use GPU acceleration
**kwargs (dict) – Additional parameters for SVM
- Returns:
[acc_hd, acc_ld, log_ratio]
- Return type:
ndarray
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0
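The `[acc_hd, acc_ld, log_ratio]` triple can be read as two accuracies plus a single number that summarizes their ratio. The sketch below assumes `log_ratio = log(acc_ld / acc_hd)`; the direction of the ratio, the `eps` guard, and the function name are all assumptions, so check the source before relying on the sign.

```python
import math

def context_score(acc_hd, acc_ld, eps=1e-12):
    """Return [acc_hd, acc_ld, log_ratio] under the assumed ratio direction."""
    # eps keeps the logarithm finite if either accuracy is exactly zero.
    log_ratio = math.log((acc_ld + eps) / (acc_hd + eps))
    return [acc_hd, acc_ld, log_ratio]

same = context_score(0.9, 0.9)   # embedding classifies as well as the raw data
worse = context_score(0.8, 0.4)  # embedding halves the accuracy
```

Under this convention a value near zero means the embedding retains the class structure, and a negative value means accuracy dropped after reduction.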
- dire_rapids.metrics.compute_knn_score(data, layout, labels, n_neighbors=16, subsample_threshold=0.5, random_state=42, use_gpu=True, **kwargs)
Compute kNN context preservation score.
Compares kNN classification accuracy on high-dimensional data vs low-dimensional embedding.
- Parameters:
data (array-like) – High-dimensional data
layout (array-like) – Low-dimensional embedding
labels (array-like) – Class labels
n_neighbors (int) – Number of neighbors for kNN
subsample_threshold (float) – Subsampling probability (must be between 0.0 and 1.0)
random_state (int) – Random seed
use_gpu (bool) – Whether to use GPU acceleration
**kwargs (dict) – Additional parameters
- Returns:
[acc_hd, acc_ld, log_ratio]
- Return type:
ndarray
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0
- dire_rapids.metrics.compute_context_measures(data, layout, labels, subsample_threshold=0.5, n_neighbors=16, random_state=42, use_gpu=True, **kwargs)
Compute context preservation measures (SVM and kNN).
- Parameters:
data (array-like) – High-dimensional data
layout (array-like) – Low-dimensional embedding
labels (array-like) – Class labels
subsample_threshold (float) – Subsampling probability (must be between 0.0 and 1.0)
n_neighbors (int) – Number of neighbors for kNN
random_state (int) – Random seed
use_gpu (bool) – Whether to use GPU acceleration
**kwargs (dict) – Additional parameters
- Returns:
Dictionary with ‘svm’ and ‘knn’ scores
- Return type:
dict
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0
- dire_rapids.metrics.compute_h0_h1_knn(data, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, use_gpu=True)
Compute H0/H1 Betti numbers using local kNN atlas approach.
Builds dense local triangulations around each point, then merges them consistently. This avoids the “holes” problem of global sparse kNN graphs.
Automatically selects between GPU and CPU implementation based on availability and use_gpu parameter.
- Parameters:
data (array-like) – Point cloud data (n_samples, n_features)
k_neighbors (int) – Size of local neighborhood (default 20, recommended 15-20 for noisy data)
density_threshold (float) – Percentile threshold for edge inclusion (0-1). Lower = denser triangulation. Default 0.8 means edges up to 80th percentile of local distances are included.
overlap_factor (float) – Factor for expanding local neighborhoods to ensure overlap (default 1.5). Higher values create more dense, overlapping patches.
use_gpu (bool) – Whether to use GPU acceleration (if available)
- Returns:
(beta_0, beta_1) – Betti numbers: β₀ (connected components), β₁ (loops)
- Return type:
tuple
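For intuition about what β₀ and β₁ count, the sketch below computes them for a plain graph: β₀ via union-find, and β₁ as the cycle rank E − V + β₀. This is a deliberate simplification. It ignores the filled triangles that the atlas approach builds, so it is not the algorithm used by `compute_h0_h1_knn`, and the function name is hypothetical.

```python
def betti_numbers_of_graph(n_vertices, edges):
    """Return (beta_0, beta_1) of an undirected graph given as vertex pairs."""
    parent = list(range(n_vertices))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Drop self-loops and duplicate edges before counting.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    for a, b in unique_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    beta_0 = len({find(v) for v in range(n_vertices)})  # connected components
    beta_1 = len(unique_edges) - n_vertices + beta_0    # independent cycles
    return beta_0, beta_1

# A 4-cycle on vertices 0..3 plus one isolated vertex 4.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
beta_0, beta_1 = betti_numbers_of_graph(5, square)
```

In a triangulated complex, each filled triangle can close a loop, so the library's β₁ is generally smaller than this graph-only count.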
- dire_rapids.metrics.compute_global_metrics(data, layout, subsample_threshold=0.5, random_state=42, n_steps=100, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, use_gpu=False, metrics_only=True)
Compute global topological metrics based on Betti curve comparison.
Computes Betti curves for high-dimensional data and low-dimensional embedding using the atlas approach, then compares them using fastDTW distance.
- Parameters:
data (array-like) – High-dimensional data
layout (array-like) – Low-dimensional embedding
subsample_threshold (float) – Subsampling probability (must be between 0.0 and 1.0)
random_state (int) – Random seed
n_steps (int) – Number of points for Betti curves
k_neighbors (int) – Size of local neighborhood for atlas approach (default 20)
density_threshold (float) – Percentile threshold for edge inclusion (0-1, default 0.8)
overlap_factor (float) – Factor for expanding local neighborhoods (default 1.5)
use_gpu (bool) – Whether to use GPU acceleration
metrics_only (bool) – If True, return only metrics; otherwise include betti curves
- Returns:
Dictionary containing DTW distances for β₀ and β₁ curves
- Return type:
dict
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0, or if fastdtw is not available
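Dynamic time warping compares two curves while allowing them to stretch along the sampling axis, which makes the comparison tolerant of features that appear at different scales. The sketch below is the exact O(n·m) dynamic program in plain Python, used here as a stand-in for fastDTW (which approximates the same quantity); the function name and the sample curves are illustrative.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two 1-D sequences with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

warped = dtw_distance([1, 1, 2, 3], [1, 2, 3])  # same shape, different pacing
offset = dtw_distance([0, 0, 0], [1, 1, 1])     # constant gap of 1 at every step
```

Curves with the same shape but different pacing align at zero cost, whereas a constant offset accumulates along the whole path.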
- dire_rapids.metrics.evaluate_embedding(data, layout, labels=None, n_neighbors=16, subsample_threshold=0.5, random_state=42, use_gpu=True, compute_distortion=True, compute_context=True, compute_topology=True, **kwargs)
Comprehensive evaluation of a dimensionality reduction embedding.
Computes distortion, context preservation, and topological metrics.
- Parameters:
data (array-like) – High-dimensional data (n_samples, n_features)
layout (array-like) – Low-dimensional embedding (n_samples, n_components)
labels (array-like, optional) – Class labels for context metrics
n_neighbors (int) – Number of neighbors for kNN metrics
subsample_threshold (float) – Subsampling probability for all metrics (must be between 0.0 and 1.0, default 0.5)
random_state (int) – Random seed
use_gpu (bool) – Whether to use GPU acceleration
compute_distortion (bool) – Whether to compute distortion metrics (default True)
compute_context (bool) – Whether to compute context metrics (default True)
compute_topology (bool) – Whether to compute topological metrics (default True)
**kwargs (dict) – Additional parameters for specific metrics
- Returns:
Dictionary with all computed metrics
- Return type:
dict
- Raises:
ValueError – If subsample_threshold is not between 0.0 and 1.0