dire_rapids.dire_pytorch_memory_efficient module
Memory-efficient PyTorch/PyKeOps backend for DiRe.
This implementation inherits from DiRePyTorch and overrides specific methods for:

- FP16 support for memory-efficient k-NN computation
- Point-by-point attraction force computation to avoid large tensor materialization
- More aggressive memory management and cache clearing
- Optional PyKeOps LazyTensors for repulsion when available
- class dire_rapids.dire_pytorch_memory_efficient.DiRePyTorchMemoryEfficient(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]
Bases: DiRePyTorch
Memory-optimized PyTorch implementation of DiRe for large-scale datasets.
This class extends DiRePyTorch with enhanced memory management capabilities, making it suitable for processing very large datasets that would otherwise cause out-of-memory errors with the standard implementation.
Key Improvements over DiRePyTorch
FP16 Support: Uses half-precision by default for 2x memory reduction
Dynamic Chunking: Automatically adjusts chunk sizes based on available memory
Aggressive Cleanup: More frequent garbage collection and cache clearing
PyKeOps Integration: Optional LazyTensors for memory-efficient exact repulsion
Memory Monitoring: Real-time memory usage tracking and warnings
Point-wise Processing: Falls back to point-by-point computation when needed
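The FP16 and chunking points above can be illustrated with a minimal sketch. Note that `chunked_knn` is a hypothetical helper, not the class's actual internals, and brute-force `torch.cdist` stands in for the real k-NN backend; the point is that only one (chunk_size x n) distance block is ever materialized at a time:

```python
import torch

def chunked_knn(X, k=16, chunk_size=4096, use_fp16=True):
    # Hypothetical sketch of chunked k-NN: process query rows in chunks
    # so only a (chunk_size x n) distance block exists at any moment.
    if use_fp16 and X.is_cuda:
        X = X.half()  # FP16 halves the memory of each distance block
    idx_chunks = []
    for start in range(0, X.shape[0], chunk_size):
        block = torch.cdist(X[start:start + chunk_size], X)  # (chunk, n)
        _, idx = block.topk(k + 1, largest=False)  # k+1 smallest distances
        idx_chunks.append(idx[:, 1:])  # drop the self-match in column 0
        del block  # release the block before computing the next one
    return torch.cat(idx_chunks)

X = torch.randn(1000, 32)
neighbors = chunked_knn(X, k=8, chunk_size=256)
print(neighbors.shape)  # torch.Size([1000, 8])
```

Peak memory is governed by `chunk_size * n` rather than `n * n`, which is the same trade-off the class makes when it falls back to point-wise processing.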
Best Use Cases
Datasets with >100K points
High-dimensional data (>500 features)
Memory-constrained environments
Production systems requiring reliable memory usage
- param *args:
Positional arguments passed to DiRePyTorch parent class.
- param use_fp16:
Enable FP16 precision for memory efficiency (recommended). Provides 2x memory reduction and significant speed improvements.
- type use_fp16:
bool, default=True
- param use_pykeops_repulsion:
Use PyKeOps LazyTensors for repulsion when beneficial. Automatically disabled if PyKeOps unavailable or dataset too large.
- type use_pykeops_repulsion:
bool, default=True
- param pykeops_threshold:
Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.
- type pykeops_threshold:
int, default=50000
- param memory_fraction:
Fraction of available memory to use for computations. Lower values are more conservative but may be slower.
- type memory_fraction:
float, default=0.25
- param **kwargs:
Additional keyword arguments passed to DiRePyTorch parent class. Includes: n_components, n_neighbors, init, max_iter_layout, min_dist, spread, cutoff, neg_ratio, verbose, random_state, use_exact_repulsion, metric (custom distance function for k-NN computation).
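As a rough illustration of how a memory_fraction budget can translate into a working chunk size, consider the following sketch. This is a hypothetical helper under simplified assumptions (FP16 storage, a fixed fallback budget on CPU); `pick_chunk_size` is not part of the library's API:

```python
import torch

def pick_chunk_size(n_points, memory_fraction=0.25, bytes_per_el=2,
                    fallback_bytes=4 * 1024**3):
    # Hypothetical sketch: size the (chunk x n_points) distance block so
    # it fits inside memory_fraction of the free device memory.
    # bytes_per_el=2 corresponds to FP16 storage; use 4 for FP32.
    if torch.cuda.is_available():
        free_bytes, _ = torch.cuda.mem_get_info()
    else:
        free_bytes = fallback_bytes  # assume a fixed budget on CPU
    budget = free_bytes * memory_fraction
    row_bytes = n_points * bytes_per_el  # one row of the distance block
    return max(1, min(n_points, int(budget // row_bytes)))

print(pick_chunk_size(500_000, memory_fraction=0.25))
```

Lower memory_fraction values shrink the chunk, trading more loop iterations (slower) for a smaller peak allocation (safer).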
Examples
Memory-efficient processing of large dataset:
from dire_rapids import DiRePyTorchMemoryEfficient
import numpy as np

# Large dataset
X = np.random.randn(500000, 512)

# Memory-efficient reducer
reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)
embedding = reducer.fit_transform(X)
Custom memory settings:
reducer = DiRePyTorchMemoryEfficient(
    use_pykeops_repulsion=False,  # Disable PyKeOps
    memory_fraction=0.15,         # Use less memory
    pykeops_threshold=20000       # Lower PyKeOps threshold
)
With custom distance metric:
# L1 metric for k-NN with memory efficiency
reducer = DiRePyTorchMemoryEfficient(
    metric='(x - y).abs().sum(-1)',
    n_neighbors=32,
    use_fp16=True,
    memory_fraction=0.2
)
embedding = reducer.fit_transform(X)
- __init__(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]
Initialize memory-efficient DiRe reducer.
- Parameters:
*args – Positional arguments passed to DiRePyTorch parent class.
use_fp16 (bool, default=True) – Enable FP16 precision for memory efficiency. Provides 2x memory reduction and significant speed improvements on modern GPUs.
use_pykeops_repulsion (bool, default=True) – Use PyKeOps LazyTensors for memory-efficient repulsion computation when dataset size is below pykeops_threshold.
pykeops_threshold (int, default=50000) – Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.
memory_fraction (float, default=0.25) – Fraction of available memory to use for computations. Lower values are more conservative but may be slower.
**kwargs –
Additional keyword arguments passed to DiRePyTorch parent class. See the DiRePyTorch documentation for available parameters, including:

- n_components, n_neighbors, init, max_iter_layout, min_dist, spread
- cutoff, neg_ratio, verbose, random_state, use_exact_repulsion
- metric: custom distance metric for k-NN (str, callable, or None)
- fit_transform(X, y=None)[source]
Fit the model and transform data with memory-efficient processing.
This method extends the parent implementation with memory-optimized data handling, enhanced logging, and aggressive cleanup procedures.
- Parameters:
X (array-like of shape (n_samples, n_features)) – High-dimensional input data to transform.
y (array-like of shape (n_samples,), optional) – Ignored. Present for scikit-learn API compatibility.
- Returns:
Low-dimensional embedding of the input data.
- Return type:
numpy.ndarray of shape (n_samples, n_components)
Notes
Memory Optimizations:

- Automatic FP16 conversion for large datasets on GPU
- Strategic backend selection based on dataset characteristics
- Aggressive memory cleanup after processing
- Real-time memory monitoring and reporting
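The aggressive-cleanup pattern described above can be sketched as follows. This is a minimal illustration only; `report_and_clear` is a hypothetical helper, not a method of this class:

```python
import gc
import torch

def report_and_clear(tag=""):
    # Hypothetical sketch of the cleanup pattern: force Python garbage
    # collection, release cached CUDA blocks, then report current usage.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        used = torch.cuda.memory_allocated() / 1024**2
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print(f"[{tag}] GPU memory: {used:.1f} MiB in use, {peak:.1f} MiB peak")

report_and_clear("after fit_transform")
```

Calling such a routine between pipeline stages keeps the CUDA caching allocator from holding on to large intermediate blocks.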
Examples
Process large dataset with memory monitoring:
import numpy as np
from dire_rapids import DiRePyTorchMemoryEfficient

# Large high-dimensional dataset
X = np.random.randn(200000, 1000)

reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)
embedding = reducer.fit_transform(X)