API Reference
Core Functions
- graphem_rapids.create_graphem(adjacency, n_components=2, backend=None, **kwargs)[source]
Create a GraphEmbedder with automatic backend selection.
- Parameters:
adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise.
n_components (int, default=2) – Number of components (dimensions) in the embedding.
backend (str, optional) – Force a specific backend (‘pytorch’, ‘cuvs’, ‘auto’). If None, the optimal backend is selected automatically.
**kwargs – Additional arguments passed to the embedder constructor.
- Returns:
embedder – Graph embedder instance with optimal backend.
- Return type:
GraphEmbedder
Examples
>>> import graphem_rapids as gr
>>> # Generate sparse adjacency matrix
>>> adjacency = gr.generate_er(n=500, p=0.01)
>>> embedder = gr.create_graphem(adjacency, n_components=3)
>>> embedder.run_layout(num_iterations=50)
>>> embedder.display_layout()
Embedders
PyTorch Backend
- class graphem_rapids.GraphEmbedderPyTorch(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)[source]
Bases: object
PyTorch-based graph embedder with CUDA acceleration.
This class provides graph embedding using Laplacian initialization followed by force-directed layout optimization, implemented with PyTorch for GPU acceleration and memory efficiency.
- adjacency
Sparse adjacency matrix (n_vertices × n_vertices).
- Type:
- edges
Edge list extracted from adjacency matrix as (n_edges, 2) tensor.
- Type:
- n
Number of vertices in the graph.
- Type:
int
- n_components
Number of components (dimensions) in the embedding space.
- Type:
int
- device
Computing device (CPU or CUDA).
- Type:
- positions
Current vertex positions as (n_vertices, n_components) tensor.
- Type:
- __init__(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)[source]
Initialize the PyTorch GraphEmbedder.
- Parameters:
adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise. For weighted graphs, contains edge weights (future support).
n_components (int, default=2) – Number of components (dimensions) in the embedding.
device (str or torch.device, optional) – Computing device. If None, automatically selects GPU if available.
dtype (torch.dtype, default=torch.float32) – Data type for computations.
L_min (float, default=1.0) – Minimum spring length.
k_attr (float, default=0.2) – Attraction force constant.
k_inter (float, default=0.5) – Intersection repulsion force constant.
n_neighbors (int, default=10) – Number of nearest neighbors for intersection detection.
sample_size (int, default=256) – Sample size for kNN computation.
batch_size (int, optional) – Batch size for processing. If None, automatically selects based on available memory. Can be manually set (e.g., batch_size=1024) for custom memory management.
memory_efficient (bool, default=True) – Use memory-efficient algorithms for large graphs.
verbose (bool, default=True) – Enable verbose logging.
logger_instance (logging.Logger, optional) – Custom logger instance.
seed (int, optional) – Random seed for reproducibility. If provided, sets both numpy and torch seeds.
- property positions
Get positions as numpy array for API consistency.
- update_positions()[source]
Update vertex positions based on computed forces.
- run_layout(num_iterations=100)[source]
Run the force-directed layout algorithm.
- Parameters:
num_iterations (int, default=100) – Number of iterations to run.
- Returns:
Final vertex positions.
- Return type:
- get_positions()[source]
Get vertex positions as numpy array.
- Returns:
Vertex positions.
- Return type:
np.ndarray
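The force law behind run_layout is not spelled out in this reference. As a rough intuition for the L_min and k_attr parameters only, the standard-library sketch below assumes a Hooke-style spring with rest length L_min and stiffness k_attr, and leaves out the intersection-repulsion term entirely. The helper name spring_step is hypothetical and is not part of graphem_rapids.

```python
import math

def spring_step(positions, edges, L_min=1.0, k_attr=0.2):
    """Apply one assumed spring-attraction update and return new positions."""
    forces = [[0.0] * len(p) for p in positions]
    for u, v in edges:
        delta = [b - a for a, b in zip(positions[u], positions[v])]
        dist = math.hypot(*delta)
        if dist == 0.0:
            continue  # coincident vertices: direction undefined, skip this edge
        # Pull endpoints together when the edge is longer than L_min,
        # push them apart when it is shorter (assumed force law).
        scale = k_attr * (dist - L_min) / dist
        for d, comp in enumerate(delta):
            forces[u][d] += scale * comp
            forces[v][d] -= scale * comp
    return [[x + f for x, f in zip(p, frc)] for p, frc in zip(positions, forces)]

# Two vertices joined by one edge of length 3 move towards each other.
new_positions = spring_step([[0.0, 0.0], [3.0, 0.0]], [(0, 1)])
```

With these defaults the edge shortens towards L_min by a fraction set by k_attr on each step.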
RAPIDS cuVS Backend
Graph Generators
Random Graphs
- graphem_rapids.generate_er(n, p, seed=0)[source]
Generate a random undirected graph using the Erdős–Rényi G(n, p) model.
- Parameters:
n (int) – Number of vertices.
p (float) – Probability that an edge exists between any pair of vertices.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
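To make the G(n, p) model concrete, here is a minimal standard-library sketch. It is not the library's implementation; the helper name er_edges is hypothetical. It visits each unordered vertex pair once and keeps it with probability p.

```python
import random
from itertools import combinations

def er_edges(n, p, seed=0):
    """Sample the undirected edges of a G(n, p) graph (hypothetical helper)."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    # Each unordered pair (u, v) with u < v is kept independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = er_edges(n=6, p=0.5, seed=0)
```

The expected number of edges is p · n(n − 1)/2, so small p yields the sparse matrices the embedders are designed for.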
- graphem_rapids.generate_random_regular(n=100, d=3, seed=0)[source]
Generate a random regular graph where each node has degree d.
- Parameters:
n (int) – Number of vertices.
d (int) – Degree of each vertex.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
Scale-Free and Small-World
- graphem_rapids.generate_ba(n=300, m=3, seed=0)[source]
Generate a Barabási-Albert preferential attachment graph.
- Parameters:
n (int) – Number of vertices.
m (int) – Number of edges to attach from a new vertex to existing vertices.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
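Preferential attachment can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not the code behind generate_ba. The helper name ba_edges and the choice to start from m isolated vertices are assumptions.

```python
import random

def ba_edges(n, m, seed=0):
    """Preferential-attachment edge list (hypothetical helper, needs 1 <= m < n)."""
    rng = random.Random(seed)
    targets = list(range(m))  # the first new vertex attaches to all m initial vertices
    repeated = []             # each vertex appears here once per incident edge
    edges = []
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Drawing uniformly from `repeated` picks a vertex with probability
        # proportional to its degree; the set keeps the m targets distinct.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges

edges = ba_edges(n=50, m=3, seed=0)
```

Every vertex added after the initial m contributes exactly m edges, so the sketch produces (n − m) · m edges in total.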
- graphem_rapids.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]
Generate a Watts-Strogatz small-world graph.
- Parameters:
n (int) – Number of vertices.
k (int) – Each vertex is connected to k nearest neighbors in ring topology.
p (float) – Probability of rewiring each edge.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
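The roles of k and p are easiest to see in a small sketch of the Watts-Strogatz construction: build a ring lattice, then rewire each lattice edge with probability p. The helper ws_edges is hypothetical, assumes an even k smaller than n, and may differ in detail from what generate_ws does.

```python
import random

def ws_edges(n, k, p, seed=0):
    """Watts-Strogatz edge set (hypothetical helper; assumes even k and k < n)."""
    rng = random.Random(seed)
    half = k // 2
    # Ring lattice: connect every vertex to its k/2 nearest neighbours on each side.
    edges = {frozenset((u, (u + j) % n)) for u in range(n) for j in range(1, half + 1)}
    for j in range(1, half + 1):
        for u in range(n):
            old = frozenset((u, (u + j) % n))
            if old in edges and rng.random() < p:
                # New endpoint: any vertex that is not u and not already adjacent to u.
                candidates = [w for w in range(n)
                              if w != u and frozenset((u, w)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(candidates))))
    return edges

lattice = ws_edges(n=20, k=4, p=0.0)   # pure ring lattice
rewired = ws_edges(n=20, k=4, p=0.3)   # same edge count, some long-range shortcuts
```

Rewiring replaces one edge with another, so the edge count stays at n · k / 2 for any p.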
- graphem_rapids.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]
Generate a scale-free graph using Holme and Kim algorithm.
- Parameters:
n (int) – Number of vertices.
alpha (float) – Parameter for the scale-free graph generation.
beta (float) – Parameter for the scale-free graph generation.
gamma (float) – Parameter for the scale-free graph generation.
delta_in (float) – Parameter for the scale-free graph generation.
delta_out (float) – Parameter for the scale-free graph generation.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
- graphem_rapids.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]
Generate a powerlaw cluster graph.
- Parameters:
n (int) – Number of vertices.
m (int) – Number of random edges to add per new vertex.
p (float) – Probability of adding a triangle after adding a random edge.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
Community Structures
- graphem_rapids.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]
Generate a stochastic block model graph.
- Parameters:
n_per_block (int) – Number of vertices per block.
num_blocks (int) – Number of blocks.
p_in (float) – Probability of edge within a block.
p_out (float) – Probability of edge between blocks.
labels (bool) – If True, return vertex labels.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency (scipy.sparse.csr_matrix) – Sparse adjacency matrix (n × n).
labels (np.ndarray of shape (n,)) – Block labels for each vertex. Returned only if labels=True.
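A stochastic block model differs from G(n, p) only in that the edge probability depends on whether two vertices share a block. The sketch below illustrates this with the standard library. The helper sbm_edges is hypothetical, and the contiguous block assignment is an assumption rather than documented behaviour.

```python
import random
from itertools import combinations

def sbm_edges(n_per_block, num_blocks, p_in, p_out, seed=0):
    """Sample stochastic-block-model edges and block labels (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumed layout: vertex i belongs to block i // n_per_block.
    labels = [i // n_per_block for i in range(n_per_block * num_blocks)]
    edges = []
    for u, v in combinations(range(len(labels)), 2):
        # Pairs within a block use p_in, pairs across blocks use p_out.
        prob = p_in if labels[u] == labels[v] else p_out
        if rng.random() < prob:
            edges.append((u, v))
    return edges, labels

edges, labels = sbm_edges(n_per_block=3, num_blocks=2, p_in=1.0, p_out=0.0)
```

With p_in=1.0 and p_out=0.0 the result degenerates to disjoint cliques, a handy sanity check when tuning the two probabilities.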
- graphem_rapids.generate_caveman(l=10, k=10)[source]
Generate a caveman graph with l cliques of size k.
- Parameters:
l (int) – Number of cliques.
k (int) – Size of each clique.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
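For reference, the edge structure of a caveman graph can be written down directly. The sketch assumes the classic definition, in which the l cliques are not connected to one another. The helper name caveman_edges is hypothetical.

```python
from itertools import combinations

def caveman_edges(l, k):
    """Edges of l disjoint cliques of size k (hypothetical helper)."""
    edges = []
    for c in range(l):
        members = range(c * k, (c + 1) * k)  # vertices of the c-th clique
        edges.extend(combinations(members, 2))  # connect every pair inside it
    return edges

edges = caveman_edges(l=10, k=10)
```

Each clique contributes k(k − 1)/2 edges, so the defaults give 10 · 45 = 450 edges on 100 vertices.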
- graphem_rapids.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]
Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.
- Parameters:
l (int) – Number of cliques.
k (int) – Size of each clique.
p (float) – Rewiring probability.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
Bipartite Graphs
- graphem_rapids.generate_bipartite_graph(n_top=50, n_bottom=100, p=0.1, seed=0)[source]
Generate a random bipartite graph.
- Parameters:
n_top (int) – Number of vertices in the top set.
n_bottom (int) – Number of vertices in the bottom set.
p (float) – Probability of edge between any vertex in top set and any vertex in bottom set.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
- graphem_rapids.generate_complete_bipartite_graph(n_top=50, n_bottom=100)[source]
Generate a complete bipartite graph.
In a complete bipartite graph, every vertex in the top set is connected to every vertex in the bottom set, resulting in n_top * n_bottom edges.
- Parameters:
n_top (int) – Number of vertices in the top set.
n_bottom (int) – Number of vertices in the bottom set.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
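The block structure of a bipartite adjacency matrix is easy to check on a small dense example. The sketch below assumes that n = n_top + n_bottom and that the top set occupies the first n_top indices. Neither is stated above, and the helper name is hypothetical.

```python
def complete_bipartite_adjacency(n_top, n_bottom):
    """Dense 0/1 adjacency matrix as nested lists (hypothetical helper)."""
    n = n_top + n_bottom
    # Assumed ordering: vertices 0..n_top-1 are the top set, the rest the bottom set.
    # An entry is 1 exactly when the two vertices lie in different sets.
    return [[int((i < n_top) != (j < n_top)) for j in range(n)] for i in range(n)]

A = complete_bipartite_adjacency(2, 3)
num_edges = sum(map(sum, A)) // 2  # each undirected edge is stored twice
```

The two diagonal blocks are all zeros and the off-diagonal blocks are all ones, giving the n_top * n_bottom edges mentioned above.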
Geometric Graphs
- graphem_rapids.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]
Generate a random geometric graph in a unit cube.
- Parameters:
n (int) – Number of vertices.
radius (float) – Distance threshold for connecting vertices.
dim (int) – Dimension of the space.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
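A random geometric graph follows directly from the description: place points uniformly in the unit cube and join pairs that lie within radius of each other. The standard-library sketch below is illustrative only. The helper name geometric_edges and the use of Euclidean distance with an inclusive threshold are assumptions.

```python
import math
import random
from itertools import combinations

def geometric_edges(n, radius, dim=2, seed=0):
    """Random geometric graph in the unit cube (hypothetical helper)."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    # Connect two vertices when their Euclidean distance is at most `radius`.
    edges = [(u, v) for u, v in combinations(range(n), 2)
             if math.dist(points[u], points[v]) <= radius]
    return points, edges

points, edges = geometric_edges(n=100, radius=0.2)
```

Because the cube has diameter sqrt(dim), any radius at or above that value yields a complete graph.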
- graphem_rapids.generate_delaunay_triangulation(n=100, seed=0)[source]
Generate a Delaunay triangulation graph.
Vertices are randomly placed in a 2D unit square, and edges are created based on the Delaunay triangulation of these points. The resulting graph has planar structure with triangular faces.
- Parameters:
n (int) – Number of vertices.
seed (int) – Random seed for reproducibility.
- Returns:
adjacency – Sparse adjacency matrix (n × n).
- Return type:
scipy.sparse.csr_matrix
Tree Structures
Influence Maximization
- graphem_rapids.graphem_seed_selection(embedder, k, num_iterations=20)[source]
Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.
- Parameters:
embedder (GraphEmbedder) – The initialized graph embedder object.
k (int) – Number of seed nodes to select.
num_iterations (int) – Number of layout iterations to run.
- Returns:
seeds – List of k vertices selected as seeds.
- Return type:
list
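The selection rule itself, picking the k vertices farthest from the origin, can be shown without running a layout. The sketch below works on a plain list of coordinates. The helper name top_k_by_radius is hypothetical, and how ties are broken in the library is not documented here.

```python
import math

def top_k_by_radius(positions, k):
    """Indices of the k points with the largest distance from the origin."""
    radii = [math.hypot(*p) for p in positions]
    # Sort vertex indices by radius, largest first; ties keep their original order.
    order = sorted(range(len(positions)), key=lambda i: radii[i], reverse=True)
    return order[:k]

positions = [(0.1, 0.0), (3.0, 4.0), (-1.0, 1.0), (0.0, -2.0)]
seeds = top_k_by_radius(positions, k=2)
```

Here vertices 1 and 3 have the two largest radii (5.0 and 2.0), so they are the ones returned.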
- graphem_rapids.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]
Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.
- Parameters:
G (networkx.Graph) – The graph to run influence propagation on.
seeds (list) – The list of seed nodes.
p (float) – Propagation probability.
iterations_count (int) – Maximum number of simulation iterations.
- Returns:
influence (float) – The estimated influence (average number of influenced nodes).
iterations (int) – The number of iterations run.
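For readers unfamiliar with the Independent Cascade model, the following standard-library sketch runs a single cascade on an adjacency dictionary. It does not use NDlib and is not how the function above is implemented. It counts every vertex activated so far, whereas NDlib's state bookkeeping may differ, and the helper name is hypothetical.

```python
import random

def independent_cascade(neighbors, seeds, p=0.1, max_iterations=200, seed=0):
    """One Independent Cascade run on an adjacency dict (hypothetical helper)."""
    rng = random.Random(seed)
    active = set(seeds)     # every vertex activated so far
    frontier = list(seeds)  # vertices activated in the previous round
    iterations = 0
    while frontier and iterations < max_iterations:
        iterations += 1
        newly_active = []
        for u in frontier:
            for v in neighbors[u]:
                # A newly active vertex gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active), iterations

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
influence, iterations = independent_cascade(path, seeds=[0], p=1.0)
```

Averaging the first return value over many runs with different random seeds gives a Monte Carlo estimate of the influence of a seed set.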
- graphem_rapids.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]
Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.
- Returns:
seeds – The selected seed set (list of nodes).
total_iters – The total number of NDlib iterations run during selection.
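The greedy strategy can be sketched independently of NDlib by passing in any function that scores a seed set. In the sketch below, the helper greedy_select and the toy coverage function are hypothetical. A real run would plug in a simulation-based influence estimate instead.

```python
def greedy_select(candidates, k, influence):
    """Greedily pick k seeds; `influence` maps a seed list to a score.

    Hypothetical helper; assumes k <= len(candidates).
    """
    seeds = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in seeds]
        # Add the candidate whose inclusion gives the highest estimated influence.
        best = max(remaining, key=lambda c: influence(seeds + [c]))
        seeds.append(best)
    return seeds

# Toy deterministic influence: number of vertices covered by the seeds' reach sets.
reach = {0: {0, 1, 2}, 1: {1, 2}, 2: {2, 3, 4, 5}, 3: {3}}

def coverage(seed_list):
    return len(set().union(*(reach[v] for v in seed_list)))

picked = greedy_select(list(reach), k=2, influence=coverage)
```

Each of the k rounds evaluates every remaining candidate once, which is why the simulation cost accumulated across evaluations is worth tracking.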
Benchmarking
- graphem_rapids.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)[source]
Run a benchmark to calculate correlations between embedding radii and centrality measures.
- Parameters:
graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
dim (int, default=2) – Number of embedding components (dimensions)
L_min (float, default=10.0) – Minimum spring length parameter
k_attr (float, default=0.5) – Attraction force constant
k_inter (float, default=0.1) – Intersection repulsion force constant
n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection
sample_size (int, default=512) – Sample size for kNN computation
num_iterations (int, default=40) – Number of layout iterations
backend (str, default='pytorch') – Backend to use
**kwargs – Additional embedder parameters
- Returns:
Benchmark results with correlation coefficients for centrality measures
- Return type:
- graphem_rapids.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)[source]
Run a benchmark on the given graph using GraphEm Rapids.
- Parameters:
graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
dim (int, default=3) – Number of embedding components (dimensions)
L_min (float, default=10.0) – Minimum spring length parameter
k_attr (float, default=0.5) – Attraction force constant
k_inter (float, default=0.1) – Intersection repulsion force constant
n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection
sample_size (int, default=512) – Sample size for kNN computation
num_iterations (int, default=40) – Number of layout iterations
backend (str, default='pytorch') – Backend to use (‘pytorch’, ‘cuvs’, or ‘auto’)
**kwargs – Additional parameters passed to the embedder
- Returns:
Benchmark results including timings and graph metrics
- Return type:
- graphem_rapids.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None, backend='pytorch')[source]
Run a benchmark comparing influence maximization methods.
- Parameters:
graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
k (int, default=10) – Number of seed nodes to select for influence maximization
p (float, default=0.1) – Propagation probability for influence diffusion
iterations (int, default=200) – Number of Monte Carlo iterations for influence estimation
dim (int, default=3) – Number of embedding components (dimensions)
num_layout_iterations (int, default=20) – Number of force-directed layout iterations
layout_params (dict, optional) – Additional parameters for embedder initialization (L_min, k_attr, k_inter, etc.)
backend (str, default='pytorch') – Backend to use for embedding (‘pytorch’, ‘cuvs’, or ‘auto’)
- Returns:
Benchmark results comparing GraphEm vs Greedy seed selection with influence metrics
- Return type:
Visualization
- graphem_rapids.report_corr(name, radii, centrality, alpha=0.025)[source]
Calculate and report the correlation between radial distances and a centrality measure.
- Parameters:
name (str) – Name of the centrality measure.
radii (array-like) – Radial distances from origin.
centrality (array-like) – Centrality values.
alpha (float) – Alpha level for confidence interval.
- Returns:
(correlation coefficient, p-value)
- Return type:
- graphem_rapids.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]
Calculate and report correlations between radial distances and various centrality measures.
- Parameters:
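radii (array-like) – Radial distances from origin.
deg (array-like) – One of the centrality measures to correlate with the radii.
btw (array-like) – One of the centrality measures to correlate with the radii.
eig (array-like) – One of the centrality measures to correlate with the radii.
pr (array-like) – One of the centrality measures to correlate with the radii.
clo (array-like) – One of the centrality measures to correlate with the radii.
nload (array-like) – One of the centrality measures to correlate with the radii.
alpha (float) – Alpha level for confidence interval.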
- Returns:
Correlation matrix
- Return type:
pandas.DataFrame
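The reference does not say which correlation coefficient is reported. As an illustration of correlating radii with centrality values, the sketch below assumes the Pearson coefficient and omits p-values and confidence intervals. The helper pearson and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed coefficient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy data: radii of four vertices and two made-up centrality vectors.
radii = [0.5, 1.0, 1.5, 2.0]
centralities = {"degree": [4, 3, 2, 1], "pagerank": [0.1, 0.2, 0.3, 0.4]}
correlations = {name: pearson(radii, values) for name, values in centralities.items()}
```

A coefficient near −1 or +1 indicates that the radial distance orders vertices almost exactly like the centrality measure, in reverse or in the same direction.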
- graphem_rapids.plot_radial_vs_centrality(radii, centralities, names)[source]
Plot scatter plots of radial distances vs. various centrality measures.
- Parameters:
radii (array-like) – Radial distances from origin.
centralities (list of array-like) – List of centrality measures.
names (list of str) – Names of the centrality measures.
Datasets
Utilities
Backend Selection
- class graphem_rapids.utils.backend_selection.BackendConfig(n_vertices, n_components=2, force_backend=None, prefer_gpu=True, memory_limit=None, verbose=True)[source]
Configuration for backend selection.
- graphem_rapids.utils.backend_selection.get_optimal_backend(config)[source]
Select optimal backend based on configuration and hardware.
- Parameters:
config (BackendConfig) – Backend configuration.
- Returns:
Optimal backend name (‘pytorch’, ‘cuvs’, ‘cpu’).
- Return type:
str
Memory Management
- class graphem_rapids.utils.memory_management.MemoryManager(cleanup_on_exit=True)[source]
Context manager for memory management.
- graphem_rapids.utils.memory_management.get_gpu_memory_info()[source]
Get GPU memory information.
- Returns:
GPU memory info with keys ‘total’, ‘allocated’, ‘cached’, ‘free’ in GB.
- Return type:
dict
- graphem_rapids.utils.memory_management.get_optimal_chunk_size(n_vertices, n_components, available_memory_gb=None, safety_factor=0.7, backend='torch')[source]
Calculate optimal chunk size for memory-efficient processing.
- Parameters:
n_vertices (int) – Number of vertices in the graph.
n_components (int) – Number of components (dimensions) in the embedding.
available_memory_gb (float, optional) – Available GPU memory in GB. If None, automatically detected.
safety_factor (float, default=0.7) – Safety factor to avoid OOM (0-1).
backend (str, default='torch') – Backend type (‘torch’, ‘pykeops’, ‘cuvs’).
- Returns:
Optimal chunk size.
- Return type:
int