dire_rapids.utils module

Utility classes and functions for the dire-rapids package.

This module provides:

- ReducerConfig: Configuration dataclass for dimensionality reduction algorithms
- ReducerRunner: General-purpose runner for dimensionality reduction benchmarking
- Dataset loading utilities for sklearn, cytof, DiRe geometric datasets, and more

dire_rapids.utils.rand_point_disk(n_features, n_samples=1, rng=None): Generate uniformly distributed points in the n-dimensional unit disk.
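
One common way to draw such points is to pick a uniformly random direction and scale it by a radius drawn as U^(1/n). The following is a minimal NumPy sketch of that idea, not the library's actual implementation. The name `sample_unit_disk` and the use of `numpy.random.default_rng` for the `rng` argument are assumptions made for illustration.

```python
import numpy as np

def sample_unit_disk(n_features, n_samples=1, rng=None):
    """Hypothetical stand-in: points uniform in the n-dimensional unit ball."""
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    # Normalized Gaussian vectors give uniformly distributed directions.
    g = rng.standard_normal((n_samples, n_features))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Volume grows like r**n, so r = U**(1/n) makes the density uniform.
    radii = rng.random((n_samples, 1)) ** (1.0 / n_features)
    return directions * radii

points = sample_unit_disk(n_features=3, n_samples=1000, rng=0)
```

Taking the n-th root of the uniform variate matters: using the raw uniform value as the radius would concentrate points near the origin.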

dire_rapids.utils.rand_point_sphere(n_features, n_samples=1, rng=None): Generate uniformly distributed points on the n-dimensional unit sphere.
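
A standard construction for this, sketched below under the assumption that NumPy is available, normalizes isotropic Gaussian vectors: the Gaussian distribution is rotation-invariant, so the normalized vectors are uniform on the sphere. The function name `sample_unit_sphere` is hypothetical, and the library may use a different method.

```python
import numpy as np

def sample_unit_sphere(n_features, n_samples=1, rng=None):
    """Hypothetical stand-in: points uniform on the unit sphere."""
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    g = rng.standard_normal((n_samples, n_features))
    # Dividing each row by its Euclidean norm projects it onto the sphere.
    return g / np.linalg.norm(g, axis=1, keepdims=True)

points = sample_unit_sphere(n_features=4, n_samples=500, rng=0)
norms = np.linalg.norm(points, axis=1)
```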

class dire_rapids.utils.elgen(a)

Bases: object

Ellipsoid generator: transforms points on a sphere into points on an ellipsoid.

__init__(a)

dire_rapids.utils.rand_point_ell(semi_axes, n_features, n_samples=1, rng=None): Generate uniformly distributed points on an n-dimensional ellipsoid with the given semi-axes.
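
The sphere-to-ellipsoid step can be illustrated by scaling each coordinate by its semi-axis, which places every point on the surface where the sum of (x_i / a_i)^2 equals 1. This is a sketch under assumptions: `scale_sphere_to_ellipsoid` is a hypothetical helper, not the library's `elgen`, and the internals of `rand_point_ell` may differ.

```python
import numpy as np

def scale_sphere_to_ellipsoid(sphere_points, semi_axes):
    """Hypothetical helper: stretch unit-sphere points along each axis."""
    semi_axes = np.asarray(semi_axes, dtype=float)
    return sphere_points * semi_axes  # broadcasts across rows

# Build unit-sphere points by normalizing Gaussian vectors.
rng = np.random.default_rng(0)
g = rng.standard_normal((200, 3))
sphere_points = g / np.linalg.norm(g, axis=1, keepdims=True)

semi_axes = [1.0, 2.0, 0.5]
ell_points = scale_sphere_to_ellipsoid(sphere_points, semi_axes)

# Each point should satisfy the ellipsoid equation, so the residual is ~0.
residual = np.sum((ell_points / np.asarray(semi_axes)) ** 2, axis=1) - 1.0
```

Axis scaling alone does not in general preserve uniformity with respect to surface area, so a sampler that needs that property has to add a correction step such as rejection sampling.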

class dire_rapids.utils.ReducerConfig(name: str, reducer_class: type, reducer_kwargs: dict, visualize: bool = False, categorical_labels: bool = True, max_points: int = 10000)

Bases: object

Configuration for a dimensionality reduction algorithm.

All fields are mutable and can be changed after creation:

    config.visualize = True
    config.categorical_labels = False
    config.max_points = 20000

name: str

reducer_class: type

reducer_kwargs: dict

visualize: bool = False

categorical_labels: bool = True

max_points: int = 10000

__init__(name: str, reducer_class: type, reducer_kwargs: dict, visualize: bool = False, categorical_labels: bool = True, max_points: int = 10000) → None
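
To show how the fields fit together, here is a minimal sketch that uses a local stand-in dataclass mirroring the documented signature, so it runs without the package installed. With dire-rapids available, the class would instead come from `dire_rapids.utils`. `DummyReducer` is a hypothetical placeholder for a real reducer class.

```python
from dataclasses import dataclass

@dataclass
class ReducerConfig:  # local stand-in mirroring the documented fields
    name: str
    reducer_class: type
    reducer_kwargs: dict
    visualize: bool = False
    categorical_labels: bool = True
    max_points: int = 10000

class DummyReducer:
    """Hypothetical placeholder for a real reducer class."""
    def __init__(self, n_components=2):
        self.n_components = n_components

config = ReducerConfig(
    name="dummy",
    reducer_class=DummyReducer,
    reducer_kwargs={"n_components": 2},
)

# Fields can be reassigned after creation because the dataclass is not frozen.
config.visualize = True
config.max_points = 20000

# The class and its keyword arguments are stored separately, so the
# reducer can be instantiated later from the configuration.
reducer = config.reducer_class(**config.reducer_kwargs)
```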

class dire_rapids.utils.ReducerRunner(config: ReducerConfig)

Bases: object

General-purpose runner for dimensionality reduction algorithms.

Supports:

- DiRe (create_dire, DiRePyTorch, DiRePyTorchMemoryEfficient, DiReCuVS)
- cuML (UMAP, TSNE)
- scikit-learn (any TransformerMixin-compatible class)

Parameters:

config (ReducerConfig) – Configuration object containing reducer_class, reducer_kwargs, name, and visualize flag.

config: ReducerConfig

__post_init__(): Validate that config is provided.

run(dataset, *, dataset_kwargs=None, transform=None)

Run dimensionality reduction on the specified dataset.

Parameters:

dataset (str) – Dataset selector (sklearn:name, openml:name, cytof:name, dire:name, file:path)
dataset_kwargs (dict, optional) – Arguments for dataset loader
transform (callable, optional) – Custom transform function (X, y) -> (X', y')

Returns:

Results containing:

- embedding: reduced data
- labels: data labels
- reducer: fitted reducer instance
- fit_time_sec: time taken for fit_transform
- dataset_info: dataset metadata

Return type:

dict
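
The flow described above (selector, optional transform, timed fit_transform, result dict) can be sketched in plain Python. Everything here is illustrative rather than the library's code: `split_selector`, `run_sketch`, `IdentityReducer`, and `keep_even_labels` are hypothetical names, the data is passed in directly instead of being loaded from the selector, and the contents of `dataset_info` are an assumption.

```python
import time

def split_selector(dataset):
    """Hypothetical helper: split 'scheme:name' at the first colon."""
    scheme, sep, name = dataset.partition(":")
    if not sep or not scheme or not name:
        raise ValueError(f"expected 'scheme:name', got {dataset!r}")
    return scheme, name

class IdentityReducer:
    """Hypothetical reducer exposing a fit_transform method."""
    def fit_transform(self, X):
        return [row[:2] for row in X]  # keep the first two coordinates

def run_sketch(reducer, X, y, dataset, transform=None):
    scheme, name = split_selector(dataset)
    if transform is not None:
        X, y = transform(X, y)  # (X, y) -> (X', y')
    start = time.perf_counter()
    embedding = reducer.fit_transform(X)
    fit_time_sec = time.perf_counter() - start
    # Keys follow the documented result dict; dataset_info fields are assumed.
    return {
        "embedding": embedding,
        "labels": y,
        "reducer": reducer,
        "fit_time_sec": fit_time_sec,
        "dataset_info": {"scheme": scheme, "name": name},
    }

def keep_even_labels(X, y):
    """Example transform: keep only the samples whose label is even."""
    kept = [(row, label) for row, label in zip(X, y) if label % 2 == 0]
    return [row for row, _ in kept], [label for _, label in kept]

X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
y = [0, 1, 2]
result = run_sketch(IdentityReducer(), X, y, "file:example.csv",
                    transform=keep_even_labels)
```

Splitting at the first colon only keeps selectors such as file:path intact when the path itself contains a colon.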

static available_sklearn(): Return available sklearn dataset loaders, fetchers, and generators.

static available_cytof(): Return available CyTOF datasets.

__init__(config: ReducerConfig) → None