dire_rapids.utils module

Utility classes and functions for the dire-rapids package.

This module provides:

  • ReducerConfig: configuration dataclass for dimensionality reduction algorithms

  • ReducerRunner: general-purpose runner for dimensionality reduction benchmarking

  • Dataset loading utilities for sklearn, CyTOF, DiRe geometric datasets, and more

dire_rapids.utils.rand_point_disk(n_features, n_samples=1, rng=None)

Generate uniformly distributed points in the n-dimensional unit disk.
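
A common way to draw such points is sketched below with NumPy, assuming "disk" means the solid unit ball: pick a uniformly random direction, then scale it by a radius distributed as U^(1/n). The name sample_unit_ball is hypothetical, and this is not necessarily how rand_point_disk is implemented.

```python
import numpy as np


def sample_unit_ball(n_features, n_samples=1, rng=None):
    """Draw points uniformly from the solid unit ball (hypothetical helper)."""
    # default_rng accepts None, an integer seed, or an existing Generator.
    rng = np.random.default_rng(rng)
    # Isotropic Gaussian vectors point in uniformly random directions.
    directions = rng.standard_normal((n_samples, n_features))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Volume grows like r**n, so a uniform density needs radius U**(1/n).
    radii = rng.random((n_samples, 1)) ** (1.0 / n_features)
    return directions * radii


points = sample_unit_ball(n_features=3, n_samples=5, rng=0)
print(points.shape)  # (5, 3)
```

Drawing the radius as plain U instead of U^(1/n) would over-sample the region near the origin, which is why the exponent matters in higher dimensions.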

dire_rapids.utils.rand_point_sphere(n_features, n_samples=1, rng=None)

Generate uniformly distributed points on the n-dimensional unit sphere.
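
As an illustration of the idea, normalizing standard Gaussian vectors yields points that are uniform on the unit sphere. The sketch below uses NumPy; sample_unit_sphere is a hypothetical name, and the library's own implementation may differ.

```python
import numpy as np


def sample_unit_sphere(n_features, n_samples=1, rng=None):
    """Draw points uniformly from the unit sphere surface (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    # The standard Gaussian is rotation invariant, so directions are uniform.
    vectors = rng.standard_normal((n_samples, n_features))
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


sphere_points = sample_unit_sphere(n_features=3, n_samples=4, rng=0)
print(np.linalg.norm(sphere_points, axis=1))  # each norm is 1 up to rounding
```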

class dire_rapids.utils.elgen(a)

Bases: object

Ellipsoid generator - transforms sphere points to ellipsoid.

__init__(a)

dire_rapids.utils.rand_point_ell(semi_axes, n_features, n_samples=1, rng=None)

Generate uniformly distributed points on an n-dimensional ellipsoid with the given semi-axes.
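
The sphere-to-ellipsoid transform described for elgen can be sketched as a per-axis scaling of unit-sphere points. This is an illustration only: sample_on_ellipsoid is a hypothetical name, it infers the dimension from semi_axes rather than taking n_features, and it may not match the library's code. Note that axis scaling alone places points exactly on the ellipsoid but does not make them uniform with respect to surface area unless all semi-axes are equal; this page does not say whether the library applies a correction.

```python
import numpy as np


def sample_on_ellipsoid(semi_axes, n_samples=1, rng=None):
    """Scale uniform unit-sphere points onto an ellipsoid (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    semi_axes = np.asarray(semi_axes, dtype=float)
    vectors = rng.standard_normal((n_samples, semi_axes.size))
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Scaling coordinate i by a_i gives points with sum((x_i / a_i)**2) == 1.
    return unit * semi_axes


axes = [3.0, 2.0, 1.0]
ell = sample_on_ellipsoid(axes, n_samples=4, rng=0)
print(np.sum((ell / axes) ** 2, axis=1))  # each value is 1 up to rounding
```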

class dire_rapids.utils.ReducerConfig(name: str, reducer_class: type, reducer_kwargs: dict, visualize: bool = False, categorical_labels: bool = True, max_points: int = 10000)

Bases: object

Configuration for a dimensionality reduction algorithm.

All fields are mutable and can be changed after creation:

    config.visualize = True
    config.categorical_labels = False
    config.max_points = 20000

name: str
reducer_class: type
reducer_kwargs: dict
visualize: bool = False
categorical_labels: bool = True
max_points: int = 10000
__init__(name: str, reducer_class: type, reducer_kwargs: dict, visualize: bool = False, categorical_labels: bool = True, max_points: int = 10000) -> None
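
To show how these fields fit together without requiring dire-rapids to be installed, the sketch below mirrors the documented fields in a local stand-in dataclass. ReducerConfigSketch and IdentityReducer are hypothetical names, and building the reducer as reducer_class(**reducer_kwargs) is an assumption about how the configuration is consumed, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ReducerConfigSketch:
    """Hypothetical stand-in mirroring the documented ReducerConfig fields."""
    name: str
    reducer_class: type
    reducer_kwargs: dict
    visualize: bool = False
    categorical_labels: bool = True
    max_points: int = 10000


class IdentityReducer:
    """Placeholder reducer that keeps the first n_components columns."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit_transform(self, X):
        return [row[: self.n_components] for row in X]


config = ReducerConfigSketch(
    name="identity",
    reducer_class=IdentityReducer,
    reducer_kwargs={"n_components": 2},
)
config.max_points = 20000  # fields stay mutable after creation

# Assumed consumption pattern: instantiate the class with its kwargs.
reducer = config.reducer_class(**config.reducer_kwargs)
print(reducer.fit_transform([[1, 2, 3], [4, 5, 6]]))  # [[1, 2], [4, 5]]
```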

class dire_rapids.utils.ReducerRunner(config: ReducerConfig)

Bases: object

General-purpose runner for dimensionality reduction algorithms.

Supports:

  • DiRe (create_dire, DiRePyTorch, DiRePyTorchMemoryEfficient, DiReCuVS)

  • cuML (UMAP, TSNE)

  • scikit-learn (any TransformerMixin-compatible class)

Parameters:

config (ReducerConfig) – Configuration object containing reducer_class, reducer_kwargs, name, and visualize flag.

config: ReducerConfig

__post_init__()

Validate that config is provided.

run(dataset, *, dataset_kwargs=None, transform=None)

Run dimensionality reduction on specified dataset.

Parameters:
  • dataset (str) – Dataset selector (sklearn:name, openml:name, cytof:name, dire:name, file:path)

  • dataset_kwargs (dict, optional) – Arguments for dataset loader

  • transform (callable, optional) – Custom transform function (X, y) -> (X', y')
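
The selector format and the transform contract listed above can be illustrated with a short NumPy sketch. Both split_selector and make_subsampler are hypothetical helpers, the file path is made up, and the handling of y being None is an assumption; the library's own parsing is not documented on this page.

```python
import numpy as np


def split_selector(selector):
    """Split 'scheme:name' at the first colon (hypothetical helper)."""
    scheme, sep, name = selector.partition(":")
    if not sep or not scheme or not name:
        raise ValueError(f"expected 'scheme:name', got {selector!r}")
    return scheme, name


def make_subsampler(max_rows, seed=0):
    """Build a transform (X, y) -> (X', y') keeping at most max_rows rows."""

    def transform(X, y):
        X = np.asarray(X)
        if len(X) <= max_rows:
            return X, y
        rng = np.random.default_rng(seed)
        # Sorted indices without replacement preserve the original row order.
        keep = np.sort(rng.choice(len(X), size=max_rows, replace=False))
        return X[keep], (None if y is None else np.asarray(y)[keep])

    return transform


# Splitting at the first colon leaves colons inside a file path untouched.
print(split_selector("file:C:/data/points.csv"))  # ('file', 'C:/data/points.csv')

X_small, y_small = make_subsampler(100)(np.zeros((500, 8)), np.arange(500))
print(X_small.shape, y_small.shape)  # (100, 8) (100,)
```

A callable such as make_subsampler(100) has the (X, y) -> (X', y') shape expected by the transform parameter.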

Returns:

Results containing:

  • embedding: reduced data

  • labels: data labels

  • reducer: fitted reducer instance

  • fit_time_sec: time taken for fit_transform

  • dataset_info: dataset metadata

Return type:

dict
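
The shape of the returned dictionary can be illustrated with a stand-in that times fit_transform and collects the documented keys. This is a sketch, not ReducerRunner's implementation: run_sketch and FirstTwoColumns are hypothetical, and the contents of dataset_info shown here are invented for the example.

```python
import time


class FirstTwoColumns:
    """Stand-in reducer with a scikit-learn-style fit_transform."""

    def fit_transform(self, X):
        return [row[:2] for row in X]


def run_sketch(reducer, X, y, dataset_info):
    """Time fit_transform and package results under the documented keys."""
    start = time.perf_counter()
    embedding = reducer.fit_transform(X)
    fit_time_sec = time.perf_counter() - start
    return {
        "embedding": embedding,
        "labels": y,
        "reducer": reducer,
        "fit_time_sec": fit_time_sec,
        "dataset_info": dataset_info,  # hypothetical metadata for illustration
    }


result = run_sketch(FirstTwoColumns(), [[1, 2, 3], [4, 5, 6]], [0, 1], {"n_samples": 2})
print(sorted(result))
# ['dataset_info', 'embedding', 'fit_time_sec', 'labels', 'reducer']
```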

static available_sklearn()

Return available sklearn dataset loaders, fetchers, and generators.

static available_cytof()

Return available CyTOF datasets.

__init__(config: ReducerConfig) -> None