dire_rapids.dire_pytorch_memory_efficient module
Memory-efficient PyTorch/PyKeOps backend for DiRe.
This implementation inherits from DiRePyTorch and overrides specific methods for:

- FP16 support for memory-efficient k-NN computation
- Point-by-point attraction force computation to avoid large tensor materialization
- More aggressive memory management and cache clearing
- Optional PyKeOps LazyTensors for repulsion when available
- class dire_rapids.dire_pytorch_memory_efficient.DiRePyTorchMemoryEfficient(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]
Bases: DiRePyTorch
Memory-optimized PyTorch implementation of DiRe for large-scale datasets.
This class extends DiRePyTorch with enhanced memory management capabilities, making it suitable for processing very large datasets that would otherwise cause out-of-memory errors with the standard implementation.
Key Improvements over DiRePyTorch
FP16 Support: Uses half-precision by default for 2x memory reduction
Dynamic Chunking: Automatically adjusts chunk sizes based on available memory
Aggressive Cleanup: More frequent garbage collection and cache clearing
PyKeOps Integration: Optional LazyTensors for memory-efficient exact repulsion
Memory Monitoring: Real-time memory usage tracking and warnings
Point-wise Processing: Falls back to point-by-point computation when needed
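The FP16 and chunking points above can be illustrated with a minimal sketch. Note that `chunked_knn` is a hypothetical helper, not the class's actual internals, and brute-force `torch.cdist` stands in for the real k-NN backend; the point is that only one (chunk_size x n) distance block is ever materialized at a time:

```python
import torch

def chunked_knn(X, k=16, chunk_size=4096, use_fp16=True):
    # Hypothetical sketch of chunked k-NN: process query rows in chunks
    # so only a (chunk_size x n) distance block exists at any moment.
    if use_fp16 and X.is_cuda:
        X = X.half()  # FP16 halves the memory of each distance block
    idx_chunks = []
    for start in range(0, X.shape[0], chunk_size):
        block = torch.cdist(X[start:start + chunk_size], X)  # (chunk, n)
        _, idx = block.topk(k + 1, largest=False)  # k+1 smallest distances
        idx_chunks.append(idx[:, 1:])  # drop the self-match in column 0
        del block  # release the block before computing the next one
    return torch.cat(idx_chunks)

X = torch.randn(1000, 32)
neighbors = chunked_knn(X, k=8, chunk_size=256)
print(neighbors.shape)  # torch.Size([1000, 8])
```

Peak memory is governed by `chunk_size * n` rather than `n * n`, which is the same trade-off the class makes when it falls back to point-wise processing.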
Best Use Cases
Datasets with >100K points
High-dimensional data (>500 features)
Memory-constrained environments
Production systems requiring reliable memory usage
- param *args:
Positional arguments passed to DiRePyTorch parent class.
- param use_fp16:
Enable FP16 precision for memory efficiency (recommended). Provides 2x memory reduction and significant speed improvements.
- type use_fp16:
bool, default=True
- param use_pykeops_repulsion:
Use PyKeOps LazyTensors for repulsion when beneficial. Automatically disabled if PyKeOps unavailable or dataset too large.
- type use_pykeops_repulsion:
bool, default=True
- param pykeops_threshold:
Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.
- type pykeops_threshold:
int, default=50000
- param memory_fraction:
Fraction of available memory to use for computations. Lower values are more conservative but may be slower.
- type memory_fraction:
float, default=0.25
- param **kwargs:
Additional keyword arguments passed to DiRePyTorch parent class. Includes: n_components, n_neighbors, init, max_iter_layout, min_dist, spread, cutoff, neg_ratio, verbose, random_state, use_exact_repulsion, metric (custom distance function for k-NN computation).
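As a rough illustration of how a memory_fraction budget can translate into a working chunk size, consider the following sketch. This is a hypothetical helper under simplified assumptions (FP16 storage, a fixed fallback budget on CPU); `pick_chunk_size` is not part of the library's API:

```python
import torch

def pick_chunk_size(n_points, memory_fraction=0.25, bytes_per_el=2,
                    fallback_bytes=4 * 1024**3):
    # Hypothetical sketch: size the (chunk x n_points) distance block so
    # it fits inside memory_fraction of the free device memory.
    # bytes_per_el=2 corresponds to FP16 storage; use 4 for FP32.
    if torch.cuda.is_available():
        free_bytes, _ = torch.cuda.mem_get_info()
    else:
        free_bytes = fallback_bytes  # assume a fixed budget on CPU
    budget = free_bytes * memory_fraction
    row_bytes = n_points * bytes_per_el  # one row of the distance block
    return max(1, min(n_points, int(budget // row_bytes)))

print(pick_chunk_size(500_000, memory_fraction=0.25))
```

Lower memory_fraction values shrink the chunk, trading more loop iterations (slower) for a smaller peak allocation (safer).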
Examples
Memory-efficient processing of large dataset:
from dire_rapids import DiRePyTorchMemoryEfficient
import numpy as np

# Large dataset
X = np.random.randn(500000, 512)

# Memory-efficient reducer
reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)
embedding = reducer.fit_transform(X)
Custom memory settings:
reducer = DiRePyTorchMemoryEfficient(
    use_pykeops_repulsion=False,  # Disable PyKeOps
    memory_fraction=0.15,         # Use less memory
    pykeops_threshold=20000       # Lower PyKeOps threshold
)
With custom distance metric:
# L1 metric for k-NN with memory efficiency
reducer = DiRePyTorchMemoryEfficient(
    metric='(x - y).abs().sum(-1)',
    n_neighbors=32,
    use_fp16=True,
    memory_fraction=0.2
)
embedding = reducer.fit_transform(X)
- __init__(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]
Initialize memory-efficient DiRe reducer.
- Parameters:
*args – Positional arguments passed to DiRePyTorch parent class.
use_fp16 (bool, default=True) – Enable FP16 precision for memory efficiency. Provides 2x memory reduction and significant speed improvements on modern GPUs.
use_pykeops_repulsion (bool, default=True) – Use PyKeOps LazyTensors for memory-efficient repulsion computation when dataset size is below pykeops_threshold.
pykeops_threshold (int, default=50000) – Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.
memory_fraction (float, default=0.25) – Fraction of available memory to use for computations. Lower values are more conservative but may be slower.
**kwargs –
Additional keyword arguments passed to DiRePyTorch parent class. See the DiRePyTorch documentation for available parameters, including:

- n_components, n_neighbors, init, max_iter_layout, min_dist, spread
- cutoff, neg_ratio, verbose, random_state, use_exact_repulsion
- metric: custom distance metric for k-NN (str, callable, or None)
- fit_transform(X, y=None)[source]
Fit the model and transform data with memory-efficient processing.
This method extends the parent implementation with memory-optimized data handling, enhanced logging, and aggressive cleanup procedures.
- Parameters:
X (array-like of shape (n_samples, n_features)) – High-dimensional input data to transform.
y (array-like of shape (n_samples,), optional) – Ignored. Present for scikit-learn API compatibility.
- Returns:
Low-dimensional embedding of the input data.
- Return type:
numpy.ndarray of shape (n_samples, n_components)
Notes
Memory Optimizations:

- Automatic FP16 conversion for large datasets on GPU
- Strategic backend selection based on dataset characteristics
- Aggressive memory cleanup after processing
- Real-time memory monitoring and reporting
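The aggressive-cleanup pattern described above can be sketched as follows. This is a minimal illustration only; `report_and_clear` is a hypothetical helper, not a method of this class:

```python
import gc
import torch

def report_and_clear(tag=""):
    # Hypothetical sketch of the cleanup pattern: force Python garbage
    # collection, release cached CUDA blocks, then report current usage.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        used = torch.cuda.memory_allocated() / 1024**2
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print(f"[{tag}] GPU memory: {used:.1f} MiB in use, {peak:.1f} MiB peak")

report_and_clear("after fit_transform")
```

Calling such a routine between pipeline stages keeps the CUDA caching allocator from holding on to large intermediate blocks.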
Examples
Process large dataset with memory monitoring:
import numpy as np
from dire_rapids import DiRePyTorchMemoryEfficient

# Large high-dimensional dataset
X = np.random.randn(200000, 1000)

reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)
embedding = reducer.fit_transform(X)