dire_rapids.dire_pytorch_memory_efficient module

Memory-efficient PyTorch/PyKeOps backend for DiRe.

This implementation inherits from DiRePyTorch and overrides specific methods for:

  • FP16 support for memory-efficient k-NN computation

  • Point-by-point attraction force computation to avoid large tensor materialization

  • More aggressive memory management and cache clearing

  • Optional PyKeOps LazyTensors for repulsion when available

class dire_rapids.dire_pytorch_memory_efficient.DiRePyTorchMemoryEfficient(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]

Bases: DiRePyTorch

Memory-optimized PyTorch implementation of DiRe for large-scale datasets.

This class extends DiRePyTorch with enhanced memory management capabilities, making it suitable for processing very large datasets that would otherwise cause out-of-memory errors with the standard implementation.

Key Improvements over DiRePyTorch

  • FP16 Support: Uses half-precision by default for 2x memory reduction

  • Dynamic Chunking: Automatically adjusts chunk sizes based on available memory

  • Aggressive Cleanup: More frequent garbage collection and cache clearing

  • PyKeOps Integration: Optional LazyTensors for memory-efficient exact repulsion

  • Memory Monitoring: Real-time memory usage tracking and warnings

  • Point-wise Processing: Falls back to point-by-point computation when needed
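
As a rough illustration of the FP16 and chunking ideas listed above, the following sketch computes k-NN in chunks with plain PyTorch. It is an independent, hypothetical example (the helper name chunked_knn and the chunk size are made up here), not the class's internal code:

import torch

def chunked_knn(X, k=16, chunk_size=4096, use_fp16=True):
    # Illustrative only: process query rows in chunks, optionally in half
    # precision, so the full n x n distance matrix is never materialized.
    if use_fp16 and X.is_cuda:
        X = X.half()
    n = X.shape[0]
    indices = torch.empty(n, k, dtype=torch.long, device=X.device)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        d = torch.cdist(X[start:stop], X)             # (chunk, n) distances
        _, idx = torch.topk(d, k + 1, largest=False)  # +1 to skip self-match
        indices[start:stop] = idx[:, 1:]
        del d, idx                                    # free chunk memory early
    return indices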

Best Use Cases

  • Datasets with >100K points

  • High-dimensional data (>500 features)

  • Memory-constrained environments

  • Production systems requiring reliable memory usage

Parameters:
  • *args – Positional arguments passed to DiRePyTorch parent class.

  • use_fp16 (bool, default=True) – Enable FP16 precision for memory efficiency (recommended). Provides 2x memory reduction and significant speed improvements.

  • use_pykeops_repulsion (bool, default=True) – Use PyKeOps LazyTensors for repulsion when beneficial. Automatically disabled if PyKeOps is unavailable or the dataset is too large.

  • pykeops_threshold (int, default=50000) – Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.

  • memory_fraction (float, default=0.25) – Fraction of available memory to use for computations. Lower values are more conservative but may be slower.

  • **kwargs – Additional keyword arguments passed to DiRePyTorch parent class, including: n_components, n_neighbors, init, max_iter_layout, min_dist, spread, cutoff, neg_ratio, verbose, random_state, use_exact_repulsion, and metric (custom distance function for k-NN computation).
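
To make the roles of pykeops_threshold and memory_fraction concrete, the hypothetical helper below mirrors the behaviour described above (all-pairs repulsion below the threshold, random sampling above it, and a budget taken as a fraction of free memory). It is a sketch of the documented semantics, not the actual implementation:

import torch

def plan_repulsion(n_points, pykeops_threshold=50000, memory_fraction=0.25):
    # Sketch: choose the repulsion strategy and a memory budget the way the
    # parameters are documented, not the way the class necessarily does it.
    strategy = "pykeops_all_pairs" if n_points <= pykeops_threshold else "random_sampling"
    if torch.cuda.is_available():
        free_bytes, _ = torch.cuda.mem_get_info()
    else:
        import psutil
        free_bytes = psutil.virtual_memory().available
    return strategy, int(memory_fraction * free_bytes)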

Examples

Memory-efficient processing of large dataset:

from dire_rapids import DiRePyTorchMemoryEfficient
import numpy as np

# Large dataset
X = np.random.randn(500000, 512)

# Memory-efficient reducer
reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)

embedding = reducer.fit_transform(X)

Custom memory settings:

reducer = DiRePyTorchMemoryEfficient(
    use_pykeops_repulsion=False,  # Disable PyKeOps
    memory_fraction=0.15,         # Use less memory
    pykeops_threshold=20000       # Lower PyKeOps threshold
)

With custom distance metric:

# L1 metric for k-NN with memory efficiency
reducer = DiRePyTorchMemoryEfficient(
    metric='(x - y).abs().sum(-1)',
    n_neighbors=32,
    use_fp16=True,
    memory_fraction=0.2
)

embedding = reducer.fit_transform(X)
__init__(*args, use_fp16=True, use_pykeops_repulsion=True, pykeops_threshold=50000, memory_fraction=0.25, **kwargs)[source]

Initialize memory-efficient DiRe reducer.

Parameters:
  • *args – Positional arguments passed to DiRePyTorch parent class.

  • use_fp16 (bool, default=True) – Enable FP16 precision for memory efficiency. Provides 2x memory reduction and significant speed improvements on modern GPUs.

  • use_pykeops_repulsion (bool, default=True) – Use PyKeOps LazyTensors for memory-efficient repulsion computation when dataset size is below pykeops_threshold.

  • pykeops_threshold (int, default=50000) – Maximum dataset size for PyKeOps all-pairs computation. Above this threshold, random sampling is used instead.

  • memory_fraction (float, default=0.25) – Fraction of available memory to use for computations. Lower values are more conservative but may be slower.

  • **kwargs

    Additional keyword arguments passed to DiRePyTorch parent class. See DiRePyTorch documentation for available parameters including:

    • n_components, n_neighbors, init, max_iter_layout, min_dist, spread

    • cutoff, neg_ratio, verbose, random_state, use_exact_repulsion

    • metric: Custom distance metric for k-NN (str, callable, or None)
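
The parent-class keyword arguments listed above can be mixed freely with the memory-specific options. A minimal sketch (parameter values chosen arbitrarily for illustration):

from dire_rapids import DiRePyTorchMemoryEfficient

reducer = DiRePyTorchMemoryEfficient(
    n_components=2,        # forwarded to DiRePyTorch
    n_neighbors=16,        # forwarded to DiRePyTorch
    random_state=42,       # forwarded to DiRePyTorch
    use_fp16=True,         # memory-specific option
    memory_fraction=0.25   # memory-specific option
)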

fit_transform(X, y=None)[source]

Fit the model and transform data with memory-efficient processing.

This method extends the parent implementation with memory-optimized data handling, enhanced logging, and aggressive cleanup procedures.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – High-dimensional input data to transform.

  • y (array-like of shape (n_samples,), optional) – Ignored. Present for scikit-learn API compatibility.

Returns:

Low-dimensional embedding of the input data.

Return type:

numpy.ndarray of shape (n_samples, n_components)

Notes

Memory Optimizations:

  • Automatic FP16 conversion for large datasets on GPU

  • Strategic backend selection based on dataset characteristics

  • Aggressive memory cleanup after processing

  • Real-time memory monitoring and reporting

Examples

Process large dataset with memory monitoring:

import numpy as np
from dire_rapids import DiRePyTorchMemoryEfficient

# Large high-dimensional dataset
X = np.random.randn(200000, 1000)

reducer = DiRePyTorchMemoryEfficient(
    use_fp16=True,
    memory_fraction=0.3,
    verbose=True
)

embedding = reducer.fit_transform(X)
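
To observe the memory behaviour described in the notes above from the calling side, standard PyTorch utilities can be wrapped around the call. This is an external sketch (not part of the class API), shown on a smaller dataset:

import gc
import numpy as np
import torch
from dire_rapids import DiRePyTorchMemoryEfficient

X = np.random.randn(20000, 128)
reducer = DiRePyTorchMemoryEfficient(use_fp16=True, verbose=True)

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()  # start peak tracking for this run

embedding = reducer.fit_transform(X)

if torch.cuda.is_available():
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak GPU memory: {peak_gib:.2f} GiB")
    torch.cuda.empty_cache()  # return cached blocks to the allocator
gc.collect()  # drop any lingering Python references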