dire_rapids.betti_curve module

Betti curve computation using filtered edge addition.

Computes Betti numbers (β₀, β₁) at multiple filtration thresholds by: 1. Building full atlas complex once with kNN graph 2. Recording edge and triangle filtration values 3. Progressively adding edges and triangles by threshold 4. Updating β₀ with union-find and rank(B₂) over GF(2) incrementally

Contains both CPU and GPU implementations with automatic backend selection.

class dire_rapids.betti_curve.IncrementalF2Rank[source]

Bases: object

Incremental sparse rank over GF(2), represented as Python integer bitsets.

__init__()[source]
add(column)[source]

Add one boundary column. Return True if it increases rank.

class dire_rapids.betti_curve.UnionFind(n)[source]

Bases: object

Disjoint-set with path compression and union by rank.

__init__(n)[source]
find(x)[source]
union(x, y)[source]
dire_rapids.betti_curve.compute_betti_curve_cpu(data, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, n_steps=50)[source]

CPU implementation of filtered Betti curve computation.

Parameters:
  • data (array-like) – Point cloud data (n_samples, n_features)

  • k_neighbors (int) – Size of local neighborhood

  • density_threshold (float) – Percentile threshold for edge inclusion (0-1)

  • overlap_factor (float) – Factor for expanding local neighborhoods

  • n_steps (int) – Number of filtration steps

Returns:

  • dict ({) – ‘filtration_values’: array of filtration thresholds, ‘beta_0’: array of H0 Betti numbers, ‘beta_1’: array of H1 Betti numbers, ‘n_edges_active’: array of active edge counts, ‘n_triangles_active’: array of active triangle counts

  • }

dire_rapids.betti_curve.compute_betti_curve_fast(data, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, n_steps=50)[source]

Fast Betti curve computation using an incremental atlas filtration.

Adds each edge and triangle once in filtration order:

β₀ = connected components via union-find, O(E·α(E)) β₁ = E - rank(B1) - rank(B2) rank(B1) = V - β₀ rank(B2) is maintained incrementally over GF(2) using sparse bitsets

Parameters:
  • data (array-like) – Point cloud data (n_samples, n_features)

  • k_neighbors (int) – Size of local neighborhood

  • density_threshold (float) – Percentile threshold for edge inclusion (0-1)

  • overlap_factor (float) – Factor for expanding local neighborhoods

  • n_steps (int) – Number of filtration steps

Returns:

dict

Return type:

Same structure as compute_betti_curve_cpu

dire_rapids.betti_curve.compute_betti_curve_gpu(data, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, n_steps=50)[source]

GPU implementation of filtered Betti curve computation.

Uses GPU kNN (cuVS/cuML) for graph construction, then the shared incremental atlas computation (union-find for β₀ and GF(2) bitset elimination for rank(B₂)). Atlas building runs on CPU because the set-heavy merge step is faster and simpler there.

Parameters:
  • data (array-like) – Point cloud data (n_samples, n_features)

  • k_neighbors (int) – Size of local neighborhood

  • density_threshold (float) – Percentile threshold for edge inclusion (0-1)

  • overlap_factor (float) – Factor for expanding local neighborhoods

  • n_steps (int) – Number of filtration steps

Returns:

dict

Return type:

Same structure as compute_betti_curve_cpu

dire_rapids.betti_curve.compute_betti_curve_ripser(data, n_steps=50, maxdim=1, thresh=None)[source]

Betti curves via ripser’s persistent-homology reduction.

Much faster than the eigsh and dense-SVD paths at every scale: empirically ~4000× on MNIST n=3000 (eigsh path hangs for hours; ripser: ~6 s). The cost grows roughly O(N²) for maxdim=1 and O(N³) for maxdim=2, but the constants are small thanks to cohomology + apparent-pairs sparsification.

Persistent-homology view: ripser returns, for each dimension, a list of (birth, death) intervals. The Betti curve is β_d(t) = #{intervals with birth ≤ t < death} — “number of d-cycles alive at filtration t”.

Parameters:
  • data (array-like of shape (n_samples, n_features)) – Point cloud.

  • n_steps (int, default=50) – Number of filtration samples at which to evaluate each Betti curve.

  • maxdim (int, default=1) – Highest homology dimension to compute. maxdim=1 returns β_0 and β_1; maxdim=2 additionally computes β_2 at O(N³) cost.

  • thresh (float or None, default=None) – Maximum filtration radius. None lets ripser run to its default (≈ the enclosing radius of the data). Cap this on large data if you only care about small-scale topology.

Returns:

dict

‘filtration_values’n_steps evenly spaced values from 0 to the

longest finite death time seen (plus a 5% margin).

’beta_0’, ‘beta_1’ : length-n_steps arrays. ‘beta_2’ : present only if maxdim >= 2. ‘n_edges_active’, ‘n_triangles_active’ : filled with NaN — ripser

does not expose these. Callers depending on the counts should use compute_betti_curve_fast instead.

Return type:

same shape as the other compute_betti_curve_* backends.

Notes

Requires ripser: install via pip install dire-rapids[persistent].

dire_rapids.betti_curve.compute_betti_curve(data, k_neighbors=20, density_threshold=0.8, overlap_factor=1.5, n_steps=50, use_gpu=True, prefer_ripser=True)[source]

Compute filtered Betti curve (backend selector).

Preference order (if available):

ripser (if prefer_ripser and ripser is installed) → GPU kNN + incremental atlas path (compute_betti_curve_gpu) if

use_gpu

→ fast CPU incremental atlas path (compute_betti_curve_fast) as

fallback.

Ripser is the preferred default when available; the atlas paths provide a dependency-light fallback for environments without ripser.

Parameters:
  • data (array-like) – Point cloud data (n_samples, n_features).

  • k_neighbors (int) – Size of local neighborhood (ignored by the ripser path, which uses Vietoris-Rips filtration on the full distance matrix up to thresh).

  • density_threshold (float) – Percentile threshold for edge inclusion, 0-1 (ignored by ripser).

  • overlap_factor (float) – Factor for expanding local neighborhoods (ignored by ripser).

  • n_steps (int) – Number of filtration steps.

  • use_gpu (bool) – Whether to try the GPU kNN atlas path if ripser is not picked.

  • prefer_ripser (bool) – Prefer ripser when available. Set to False to skip ripser and use the atlas backends only (e.g. for correctness comparison).

Returns:

  • dict ({) – ‘filtration_values’: array of filtration thresholds, ‘beta_0’: array of H0 Betti numbers, ‘beta_1’: array of H1 Betti numbers, ‘n_edges_active’: array of active edge counts (NaN under ripser), ‘n_triangles_active’: array of active triangle counts (NaN under ripser)

  • }