API Reference

Core Functions

graphem_rapids.create_graphem(adjacency, n_components=2, backend=None, **kwargs)[source]

Create a GraphEmbedder with automatic backend selection.

Parameters:
  • adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise.

  • n_components (int, default=2) – Number of components (dimensions) in the embedding.

  • backend (str, optional) – Force a specific backend (‘pytorch’, ‘cuvs’, ‘auto’). If None, the optimal backend is selected automatically.

  • **kwargs – Additional arguments passed to the embedder constructor.

Returns:

embedder – Graph embedder instance with optimal backend.

Return type:

GraphEmbedder

Examples

>>> import graphem_rapids as gr
>>> # Generate sparse adjacency matrix
>>> adjacency = gr.generate_er(n=500, p=0.01)
>>> embedder = gr.create_graphem(adjacency, n_components=3)
>>> embedder.run_layout(num_iterations=50)
>>> embedder.display_layout()

graphem_rapids.get_backend_info()[source]

Get information about available backends.

Returns:

Dictionary with backend availability and hardware info.

Return type:

dict
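
Examples

A minimal sketch; the exact keys of the returned dictionary are not enumerated here, so the loop simply prints whatever the function reports:

>>> import graphem_rapids as gr
>>> info = gr.get_backend_info()
>>> for key, value in info.items():
...     print(key, value)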

Embedders

PyTorch Backend

class graphem_rapids.GraphEmbedderPyTorch(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)[source]

Bases: object

PyTorch-based graph embedder with CUDA acceleration.

This class provides graph embedding using Laplacian initialization followed by force-directed layout optimization, implemented with PyTorch for GPU acceleration and memory efficiency.

adjacency

Sparse adjacency matrix (n_vertices × n_vertices).

Type:

scipy.sparse.csr_matrix

edges

Edge list extracted from adjacency matrix as (n_edges, 2) tensor.

Type:

torch.Tensor

n

Number of vertices in the graph.

Type:

int

n_components

Number of components (dimensions) in the embedding space.

Type:

int

device

Computing device (CPU or CUDA).

Type:

torch.device

positions

Current vertex positions as (n_vertices, n_components) tensor.

Type:

torch.Tensor

__init__(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)[source]

Initialize the PyTorch GraphEmbedder.

Parameters:
  • adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise. For weighted graphs, contains edge weights (future support).

  • n_components (int, default=2) – Number of components (dimensions) in the embedding.

  • device (str or torch.device, optional) – Computing device. If None, automatically selects GPU if available.

  • dtype (torch.dtype, default=torch.float32) – Data type for computations.

  • L_min (float, default=1.0) – Minimum spring length.

  • k_attr (float, default=0.2) – Attraction force constant.

  • k_inter (float, default=0.5) – Intersection repulsion force constant.

  • n_neighbors (int, default=10) – Number of nearest neighbors for intersection detection.

  • sample_size (int, default=256) – Sample size for kNN computation.

  • batch_size (int, optional) – Batch size for processing. If None, automatically selects based on available memory. Can be manually set (e.g., batch_size=1024) for custom memory management.

  • memory_efficient (bool, default=True) – Use memory-efficient algorithms for large graphs.

  • verbose (bool, default=True) – Enable verbose logging.

  • logger_instance (logging.Logger, optional) – Custom logger instance.

  • seed (int, optional) – Random seed for reproducibility. If provided, sets both numpy and torch seeds.

property positions

Get positions as numpy array for API consistency.

update_positions()[source]

Update vertex positions based on computed forces.

run_layout(num_iterations=100)[source]

Run the force-directed layout algorithm.

Parameters:

num_iterations (int, default=100) – Number of iterations to run.

Returns:

Final vertex positions.

Return type:

torch.Tensor

get_positions()[source]

Get vertex positions as numpy array.

Returns:

Vertex positions.

Return type:

np.ndarray

display_layout(edge_width=1, node_size=3, node_colors=None)[source]

Display the graph embedding using Plotly.

Parameters:
  • edge_width (float, default=1) – Width of the edges.

  • node_size (float, default=3) – Size of the nodes.

  • node_colors (array-like, optional) – Colors for each vertex.
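
Examples

A usage sketch based on the constructor and methods documented above; the parameter values are illustrative, not recommendations:

>>> import graphem_rapids as gr
>>> adjacency = gr.generate_ws(n=1000, k=6, p=0.3, seed=0)
>>> embedder = gr.GraphEmbedderPyTorch(
...     adjacency,
...     n_components=3,
...     sample_size=512,
...     batch_size=1024,   # set manually; omit to auto-select from available memory
...     verbose=False,
...     seed=42,
... )
>>> positions = embedder.run_layout(num_iterations=100)   # torch.Tensor
>>> embedder.get_positions().shape                        # numpy copy of the positions
(1000, 3)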

RAPIDS cuVS Backend

graphem_rapids.GraphEmbedderCuVS

Alias of the cuVS-based embedder class; it resolves to None when the RAPIDS cuVS libraries are not available (as in this documentation build).

Graph Generators

Random Graphs

graphem_rapids.generate_er(n, p, seed=0)[source]

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:
  • n (int) – Number of vertices.

  • p (float) – Probability that an edge exists between any pair of vertices.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_random_regular(n=100, d=3, seed=0)[source]

Generate a random regular graph where each node has degree d.

Parameters:
  • n (int, default=100) – Number of vertices.

  • d (int, default=3) – Degree of each vertex.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
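
Examples

A short sketch for the generators above; both return sparse adjacency matrices that can be passed directly to create_graphem:

>>> import graphem_rapids as gr
>>> A_er = gr.generate_er(n=500, p=0.02, seed=1)
>>> A_reg = gr.generate_random_regular(n=500, d=4, seed=1)
>>> A_reg.shape
(500, 500)
>>> A_reg.nnz // 2   # assuming a symmetric 0/1 matrix: 500 * 4 / 2 = 1000 edges
1000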

Scale-Free and Small-World

graphem_rapids.generate_ba(n=300, m=3, seed=0)[source]

Generate a Barabási-Albert preferential attachment graph.

Parameters:
  • n (int, default=300) – Number of vertices.

  • m (int, default=3) – Number of edges to attach from a new vertex to existing vertices.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]

Generate a Watts-Strogatz small-world graph.

Parameters:
  • n (int, default=1000) – Number of vertices.

  • k (int, default=6) – Each vertex is connected to k nearest neighbors in ring topology.

  • p (float, default=0.3) – Probability of rewiring each edge.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]

Generate a scale-free graph via preferential attachment.

Parameters:
  • n (int, default=100) – Number of vertices.

  • alpha (float, default=0.41) – Probability of adding a new vertex connected to an existing vertex chosen according to the in-degree distribution.

  • beta (float, default=0.54) – Probability of adding an edge between two existing vertices. alpha + beta + gamma must sum to 1.

  • gamma (float, default=0.05) – Probability of adding a new vertex connected to an existing vertex chosen according to the out-degree distribution.

  • delta_in (float, default=0.2) – Bias added to the in-degree when choosing vertices.

  • delta_out (float, default=0) – Bias added to the out-degree when choosing vertices.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]

Generate a powerlaw cluster graph.

Parameters:
  • n (int, default=1000) – Number of vertices.

  • m (int, default=3) – Number of random edges to add per new vertex.

  • p (float, default=0.5) – Probability of adding a triangle after adding a random edge.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
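
Examples

A sketch combining these generators with create_graphem; the layout settings are illustrative, and the node_size keyword is taken from the PyTorch backend's display_layout documented above:

>>> import graphem_rapids as gr
>>> A_ba = gr.generate_ba(n=300, m=3, seed=0)
>>> embedder = gr.create_graphem(A_ba, n_components=2)
>>> embedder.run_layout(num_iterations=40)
>>> embedder.display_layout(node_size=2)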

Community Structures

graphem_rapids.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]

Generate a stochastic block model graph.

Parameters:
  • n_per_block (int, default=75) – Number of vertices per block.

  • num_blocks (int, default=4) – Number of blocks.

  • p_in (float, default=0.15) – Probability of an edge within a block.

  • p_out (float, default=0.01) – Probability of an edge between blocks.

  • labels (bool, default=False) – If True, also return the block label of each vertex.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n). If labels=True, the block labels are also returned as an np.ndarray of shape (n,).

Return type:

scipy.sparse.csr_matrix (plus np.ndarray of labels when labels=True)
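
Examples

When labels=True, the block assignment of each vertex is returned alongside the adjacency matrix and is convenient for coloring the layout. A sketch, assuming the two values come back as a tuple and that the backend's display_layout accepts node_colors as documented above:

>>> import graphem_rapids as gr
>>> adjacency, labels = gr.generate_sbm(n_per_block=75, num_blocks=4,
...                                     p_in=0.15, p_out=0.01,
...                                     labels=True, seed=0)
>>> labels.shape
(300,)
>>> embedder = gr.create_graphem(adjacency, n_components=2)
>>> embedder.run_layout(num_iterations=40)
>>> embedder.display_layout(node_colors=labels)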

graphem_rapids.generate_caveman(l=10, k=10)[source]

Generate a caveman graph with l cliques of size k.

Parameters:
  • l (int, default=10) – Number of cliques.

  • k (int, default=10) – Size of each clique.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]

Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.

Parameters:
  • l (int, default=10) – Number of cliques.

  • k (int, default=10) – Size of each clique.

  • p (float, default=0.1) – Rewiring probability.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

Bipartite Graphs

graphem_rapids.generate_bipartite_graph(n_top=50, n_bottom=100, p=0.1, seed=0)[source]

Generate a random bipartite graph.

Parameters:
  • n_top (int, default=50) – Number of vertices in the top set.

  • n_bottom (int, default=100) – Number of vertices in the bottom set.

  • p (float, default=0.1) – Probability of edge between any vertex in top set and any vertex in bottom set.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_complete_bipartite_graph(n_top=50, n_bottom=100)[source]

Generate a complete bipartite graph.

In a complete bipartite graph, every vertex in the top set is connected to every vertex in the bottom set, resulting in n_top * n_bottom edges.

Parameters:
  • n_top (int, default=50) – Number of vertices in the top set.

  • n_bottom (int, default=100) – Number of vertices in the bottom set.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
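
Examples

A small sketch; the vertex count assumes the adjacency covers the n_top + n_bottom vertices of both sets:

>>> import graphem_rapids as gr
>>> A = gr.generate_complete_bipartite_graph(n_top=10, n_bottom=20)
>>> A.shape
(30, 30)
>>> A.nnz // 2   # assuming a symmetric 0/1 matrix: 10 * 20 = 200 edges
200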

Geometric Graphs

graphem_rapids.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]

Generate a random geometric graph in a unit cube.

Parameters:
  • n (int, default=100) – Number of vertices.

  • radius (float, default=0.2) – Distance threshold for connecting vertices.

  • dim (int, default=2) – Dimension of the space.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_delaunay_triangulation(n=100, seed=0)[source]

Generate a Delaunay triangulation graph.

Vertices are randomly placed in the 2D unit square, and edges are created from the Delaunay triangulation of these points. The resulting graph is planar with triangular faces.

Parameters:
  • n (int, default=100) – Number of vertices.

  • seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_road_network(width=30, height=30)[source]

Generate a 2D grid graph representing a road network.

Parameters:
  • width (int, default=30) – Width of the grid.

  • height (int, default=30) – Height of the grid.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
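
Examples

A sketch for the geometric generators; the grid shape assumes one vertex per grid cell:

>>> import graphem_rapids as gr
>>> A_geo = gr.generate_geometric(n=200, radius=0.15, dim=2, seed=0)
>>> A_tri = gr.generate_delaunay_triangulation(n=200, seed=0)
>>> A_grid = gr.generate_road_network(width=10, height=10)
>>> A_grid.shape
(100, 100)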

Tree Structures

graphem_rapids.generate_balanced_tree(r=2, h=10)[source]

Generate a balanced r-ary tree of height h.

Parameters:
  • r (int, default=2) – Branching factor of the tree.

  • h (int, default=10) – Height of the tree.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
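
Examples

A balanced r-ary tree of height h has (r**(h + 1) - 1) // (r - 1) vertices (for r > 1); a sketch:

>>> import graphem_rapids as gr
>>> A = gr.generate_balanced_tree(r=2, h=5)
>>> A.shape   # 2**6 - 1 = 63 vertices
(63, 63)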

Influence Maximization

graphem_rapids.graphem_seed_selection(embedder, k, num_iterations=20)[source]

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:
  • embedder (GraphEmbedder) – The initialized graph embedder object.

  • k (int) – Number of seed nodes to select.

  • num_iterations (int, default=20) – Number of layout iterations to run.

Returns:

seeds – List of k vertices selected as seeds.

Return type:

list

graphem_rapids.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:
  • G (networkx.Graph) – The graph to run influence propagation on.

  • seeds (list) – The list of seed nodes.

  • p (float, default=0.1) – Propagation probability.

  • iterations_count (int, default=200) – Maximum number of simulation iterations.

Returns:

(influence, iterations) – influence (float): the estimated influence (average number of influenced nodes); iterations (int): the number of iterations run.

Return type:

tuple

graphem_rapids.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]

Greedy seed selection using NDlib influence estimation. For each candidate node, NDlib's Independent Cascades simulation is run, and the total number of simulation iterations used across all evaluations is accumulated.

Parameters:
  • G (networkx.Graph) – The graph to select seeds from.

  • k (int) – Number of seed nodes to select.

  • p (float, default=0.1) – Propagation probability.

  • iterations_count (int, default=200) – Maximum number of simulation iterations per evaluation.

Returns:

(seeds, total_iters) – seeds: the selected seed set (list of nodes); total_iters: the total number of NDlib iterations run during selection.

Return type:

tuple
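
Examples

An end-to-end sketch comparing embedding-based and greedy seed selection; the conversion to a networkx graph uses nx.from_scipy_sparse_array (networkx >= 2.7):

>>> import networkx as nx
>>> import graphem_rapids as gr
>>> adjacency = gr.generate_ba(n=300, m=3, seed=0)
>>> G = nx.from_scipy_sparse_array(adjacency)
>>> embedder = gr.create_graphem(adjacency, n_components=3)
>>> seeds_graphem = gr.graphem_seed_selection(embedder, k=10, num_iterations=20)
>>> influence, iters = gr.ndlib_estimated_influence(G, seeds_graphem, p=0.1,
...                                                 iterations_count=200)
>>> seeds_greedy, total_iters = gr.greedy_seed_selection(G, k=10, p=0.1,
...                                                      iterations_count=200)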

Benchmarking

graphem_rapids.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)[source]

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:
  • graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)

  • graph_params (dict) – Parameters for the graph generator

  • dim (int, default=2) – Number of embedding components (dimensions)

  • L_min (float, default=10.0) – Minimum spring length.

  • k_attr (float, default=0.5) – Attraction force constant.

  • k_inter (float, default=0.1) – Intersection repulsion force constant.

  • n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection

  • sample_size (int, default=512) – Sample size for kNN computation

  • num_iterations (int, default=40) – Number of layout iterations

  • backend (str, default='pytorch') – Backend to use

  • **kwargs – Additional embedder parameters

Returns:

Benchmark results with correlation coefficients for centrality measures

Return type:

dict

graphem_rapids.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)[source]

Run a benchmark on the given graph using GraphEm Rapids.

Parameters:
  • graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)

  • graph_params (dict) – Parameters for the graph generator

  • dim (int, default=3) – Number of embedding components (dimensions)

  • L_min (float, default=10.0) – Minimum spring length parameter

  • k_attr (float, default=0.5) – Attraction force constant

  • k_inter (float, default=0.1) – Intersection repulsion force constant

  • n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection

  • sample_size (int, default=512) – Sample size for kNN computation

  • num_iterations (int, default=40) – Number of layout iterations

  • backend (str, default='pytorch') – Backend to use (‘pytorch’, ‘cuvs’, or ‘auto’)

  • **kwargs – Additional parameters passed to the embedder

Returns:

Benchmark results including timings and graph metrics

Return type:

dict
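
Examples

A sketch passing a generator and its parameters; the keys of the returned dictionary are not enumerated here, and the result is wrapped in a list for display_benchmark_results (documented below):

>>> import graphem_rapids as gr
>>> results = gr.run_benchmark(
...     graph_generator=gr.generate_ws,
...     graph_params={'n': 1000, 'k': 6, 'p': 0.3, 'seed': 0},
...     dim=3,
...     num_iterations=40,
...     backend='pytorch',
... )
>>> gr.display_benchmark_results([results])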

graphem_rapids.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None, backend='pytorch')[source]

Run a benchmark comparing influence maximization methods.

Parameters:
  • graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)

  • graph_params (dict) – Parameters for the graph generator

  • k (int, default=10) – Number of seed nodes to select for influence maximization

  • p (float, default=0.1) – Propagation probability for influence diffusion

  • iterations (int, default=200) – Number of Monte Carlo iterations for influence estimation

  • dim (int, default=3) – Number of embedding components (dimensions)

  • num_layout_iterations (int, default=20) – Number of force-directed layout iterations

  • layout_params (dict, optional) – Additional parameters for embedder initialization (L_min, k_attr, k_inter, etc.)

  • backend (str, default='pytorch') – Backend to use for embedding (‘pytorch’, ‘cuvs’, or ‘auto’)

Returns:

Benchmark results comparing GraphEm vs Greedy seed selection with influence metrics

Return type:

dict

Visualization

graphem_rapids.report_corr(name, radii, centrality, alpha=0.025)[source]

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:
  • name (str) – Name of the centrality measure.

  • radii (array-like) – Radial distances from origin.

  • centrality (array-like) – Centrality values.

  • alpha (float, default=0.025) – Alpha level for confidence interval.

Returns:

(correlation coefficient, p-value)

Return type:

tuple
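
Examples

A sketch correlating embedding radii with degree centrality; it assumes the embedder returned by create_graphem exposes get_positions() as the PyTorch backend does, and uses nx.from_scipy_sparse_array (networkx >= 2.7):

>>> import numpy as np
>>> import networkx as nx
>>> import graphem_rapids as gr
>>> adjacency = gr.generate_ba(n=300, m=3, seed=0)
>>> embedder = gr.create_graphem(adjacency, n_components=2)
>>> embedder.run_layout(num_iterations=40)
>>> radii = np.linalg.norm(embedder.get_positions(), axis=1)
>>> G = nx.from_scipy_sparse_array(adjacency)
>>> dc = nx.degree_centrality(G)
>>> deg = np.array([dc[v] for v in G.nodes()])
>>> corr, p_value = gr.report_corr("degree", radii, deg)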

graphem_rapids.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]

Calculate and report correlations between radial distances and various centrality measures.

Parameters:
  • radii (array-like) – Radial distances from origin.

  • deg (array-like) – Degree centrality values.

  • btw (array-like) – Betweenness centrality values.

  • eig (array-like) – Eigenvector centrality values.

  • pr (array-like) – PageRank values.

  • clo (array-like) – Closeness centrality values.

  • nload (array-like) – Load centrality values.

  • alpha (float, default=0.025) – Alpha level for confidence interval.

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem_rapids.plot_radial_vs_centrality(radii, centralities, names)[source]

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:
  • radii (array-like) – Radial distances from origin.

  • centralities (list of array-like) – List of centrality measures.

  • names (list of str) – Names of the centrality measures.

graphem_rapids.display_benchmark_results(benchmark_results)[source]

Display benchmark results in a nicely formatted table.

Parameters:

benchmark_results (list of dict) – List of benchmark result dictionaries.

Datasets

graphem_rapids.load_dataset(dataset_name)[source]

Load a dataset by name.

Parameters:

dataset_name (str) – Name of the dataset to load.

Returns:

(vertices, edges) – vertices: np.ndarray of shape (num_vertices,); edges: np.ndarray of shape (num_edges, 2).

Return type:

tuple

Utilities

Backend Selection

class graphem_rapids.utils.backend_selection.BackendConfig(n_vertices, n_components=2, force_backend=None, prefer_gpu=True, memory_limit=None, verbose=True)[source]

Configuration for backend selection.

Parameters:
  • n_vertices (int)

  • n_components (int)

  • force_backend (str)

  • prefer_gpu (bool)

  • memory_limit (float)

  • verbose (bool)

graphem_rapids.utils.backend_selection.get_optimal_backend(config)[source]

Select optimal backend based on configuration and hardware.

Parameters:

config (BackendConfig) – Backend configuration.

Returns:

Optimal backend name (‘pytorch’, ‘cuvs’, ‘cpu’).

Return type:

str
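
Examples

A sketch; the returned name depends on the installed backends and available hardware:

>>> from graphem_rapids.utils.backend_selection import BackendConfig, get_optimal_backend
>>> config = BackendConfig(n_vertices=100_000, n_components=3, prefer_gpu=True)
>>> backend = get_optimal_backend(config)
>>> backend in ('pytorch', 'cuvs', 'cpu')
True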

Memory Management

class graphem_rapids.utils.memory_management.MemoryManager(cleanup_on_exit=True)[source]

Context manager for memory management.

__init__(cleanup_on_exit=True)[source]

Initialize memory manager.

Parameters:

cleanup_on_exit (bool, default=True) – Whether to clean up memory on exit.

get_memory_info()[source]

Get current memory information.

cleanup()[source]

Manually trigger memory cleanup.
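
Examples

A sketch wrapping an embedding run in the context manager; it assumes __enter__ returns the manager instance, and the structure of get_memory_info()'s result is not specified here:

>>> import graphem_rapids as gr
>>> from graphem_rapids.utils.memory_management import MemoryManager
>>> adjacency = gr.generate_er(n=2000, p=0.005)
>>> with MemoryManager(cleanup_on_exit=True) as mm:
...     embedder = gr.create_graphem(adjacency, n_components=2)
...     embedder.run_layout(num_iterations=50)
...     print(mm.get_memory_info())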

graphem_rapids.utils.memory_management.get_gpu_memory_info()[source]

Get GPU memory information.

Returns:

GPU memory info with keys ‘total’, ‘allocated’, ‘cached’, ‘free’ in GB.

Return type:

dict

graphem_rapids.utils.memory_management.get_optimal_chunk_size(n_vertices, n_components, available_memory_gb=None, safety_factor=0.7, backend='torch')[source]

Calculate optimal chunk size for memory-efficient processing.

Parameters:
  • n_vertices (int) – Number of vertices in the graph.

  • n_components (int) – Number of embedding components (dimensions).

  • available_memory_gb (float, optional) – Available GPU memory in GB. If None, automatically detected.

  • safety_factor (float, default=0.7) – Safety factor to avoid OOM (0-1).

  • backend (str, default='torch') – Backend type (‘torch’, ‘pykeops’, ‘cuvs’).

Returns:

Optimal chunk size.

Return type:

int
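
Examples

A sketch; when available_memory_gb is None the available memory is detected automatically:

>>> from graphem_rapids.utils.memory_management import get_optimal_chunk_size
>>> chunk = get_optimal_chunk_size(n_vertices=1_000_000, n_components=3,
...                                safety_factor=0.7, backend='torch')
>>> isinstance(chunk, int)
True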