API Reference

Core Functions

graphem_rapids.create_graphem(adjacency, n_components=2, backend=None, **kwargs)

Create a GraphEmbedder with automatic backend selection.

Parameters:

adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise.
n_components (int, default=2) – Number of components (dimensions) in the embedding.
backend (str, optional) – Force specific backend (‘pytorch’, ‘cuvs’, ‘auto’). If None, automatically selects optimal backend.
**kwargs – Additional arguments passed to the embedder constructor.

Returns:

embedder – Graph embedder instance with optimal backend.

Return type:

GraphEmbedder

Examples

>>> import graphem_rapids as gr
>>> # Generate sparse adjacency matrix
>>> adjacency = gr.generate_er(n=500, p=0.01)
>>> embedder = gr.create_graphem(adjacency, n_components=3)
>>> embedder.run_layout(num_iterations=50)
>>> embedder.display_layout()
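
As a hedged illustration of the adjacency format described above, the sketch below builds a dense, symmetric 0/1 matrix from an edge list using plain Python lists. The helper name edges_to_adjacency is hypothetical and not part of graphem_rapids; it assumes an undirected, unweighted graph whose vertices are numbered from 0.

```python
def edges_to_adjacency(n_vertices, edges):
    """Build a dense 0/1 adjacency matrix (list of lists) for an undirected graph."""
    adjacency = [[0] * n_vertices for _ in range(n_vertices)]
    for u, v in edges:
        adjacency[u][v] = 1  # entry for the edge u-v
        adjacency[v][u] = 1  # mirror entry keeps the matrix symmetric
    return adjacency


# A path on four vertices: 0 - 1 - 2 - 3
path_adjacency = edges_to_adjacency(4, [(0, 1), (1, 2), (2, 3)])
for row in path_adjacency:
    print(row)
```

A nested list of this shape is the dense, array-like form of the input; for large graphs a scipy.sparse matrix is likely the more economical choice.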

graphem_rapids.get_backend_info()

Get information about available backends.

Returns: Dictionary with backend availability and hardware info.
Return type: dict
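
A minimal usage sketch follows. The keys of the returned dictionary are not listed in this reference, so the example simply prints whatever is reported. The wrapper describe_backends is a hypothetical helper that returns None when graphem_rapids cannot be imported.

```python
def describe_backends():
    """Return the backend info dict, or None when graphem_rapids is not installed."""
    try:
        import graphem_rapids as gr
    except ImportError:
        return None
    return gr.get_backend_info()


info = describe_backends()
if info is None:
    print("graphem_rapids is not installed")
else:
    for key, value in info.items():
        print(f"{key}: {value}")
```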

Embedders

PyTorch Backend

class graphem_rapids.GraphEmbedderPyTorch(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)

Bases: object

PyTorch-based graph embedder with CUDA acceleration.

This class provides graph embedding using Laplacian initialization followed by force-directed layout optimization, implemented with PyTorch for GPU acceleration and memory efficiency.

adjacency

Sparse adjacency matrix (n_vertices × n_vertices).

Type: scipy.sparse.csr_matrix

edges

Edge list extracted from adjacency matrix as (n_edges, 2) tensor.

Type: torch.Tensor

n

Number of vertices in the graph.

Type: int

n_components

Number of components (dimensions) in the embedding space.

Type: int

device

Computing device (CPU or CUDA).

Type: torch.device

positions

Current vertex positions as (n_vertices, n_components) tensor.

Type: torch.Tensor

__init__(adjacency, n_components=2, device=None, dtype=torch.float32, L_min=1.0, k_attr=0.2, k_inter=0.5, n_neighbors=10, sample_size=256, batch_size=None, memory_efficient=True, verbose=True, logger_instance=None, seed=None)

Initialize the PyTorch GraphEmbedder.

Parameters:

adjacency (array-like or scipy.sparse matrix) – Adjacency matrix (n_vertices × n_vertices). Can be sparse or dense. For unweighted graphs, should contain 1s for edges, 0s otherwise. For weighted graphs, contains edge weights (future support).
n_components (int, default=2) – Number of components (dimensions) in the embedding.
device (str or torch.device, optional) – Computing device. If None, automatically selects GPU if available.
dtype (torch.dtype, default=torch.float32) – Data type for computations.
L_min (float, default=1.0) – Minimum spring length.
k_attr (float, default=0.2) – Attraction force constant.
k_inter (float, default=0.5) – Intersection repulsion force constant.
n_neighbors (int, default=10) – Number of nearest neighbors for intersection detection.
sample_size (int, default=256) – Sample size for kNN computation.
batch_size (int, optional) – Batch size for processing. If None, automatically selects based on available memory. Can be manually set (e.g., batch_size=1024) for custom memory management.
memory_efficient (bool, default=True) – Use memory-efficient algorithms for large graphs.
verbose (bool, default=True) – Enable verbose logging.
logger_instance (logging.Logger, optional) – Custom logger instance.
seed (int, optional) – Random seed for reproducibility. If provided, sets both numpy and torch seeds.

property positions: Get positions as numpy array for API consistency.

update_positions(): Update vertex positions based on computed forces.

run_layout(num_iterations=100)

Run the force-directed layout algorithm.

Parameters: num_iterations (int, default=100) – Number of iterations to run.
Returns: Final vertex positions.
Return type: torch.Tensor

get_positions()

Get vertex positions as a NumPy array.

Returns: Vertex positions.
Return type: np.ndarray

display_layout(edge_width=1, node_size=3, node_colors=None)

Display the graph embedding using Plotly.

Parameters:

edge_width (float, default=1) – Width of the edges.
node_size (float, default=3) – Size of the nodes.
node_colors (array-like, optional) – Colors for each vertex.
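
The sketch below strings the documented methods together: construct the embedder, run the layout, and read the positions back. It uses only names that appear in this reference (generate_er, GraphEmbedderPyTorch, run_layout, get_positions). The wrapper embed_with_pytorch and the parameter values are illustrative assumptions, and the function returns None when graphem_rapids is not importable.

```python
def embed_with_pytorch(n=200, p=0.05, n_components=2, num_iterations=10):
    """Embed an Erdős–Rényi graph; return None if graphem_rapids is missing."""
    try:
        import graphem_rapids as gr
    except ImportError:
        return None
    adjacency = gr.generate_er(n=n, p=p, seed=0)
    embedder = gr.GraphEmbedderPyTorch(
        adjacency,
        n_components=n_components,
        device="cpu",   # pass None to let the embedder pick a GPU if one is available
        verbose=False,
        seed=0,
    )
    embedder.run_layout(num_iterations=num_iterations)
    return embedder.get_positions()  # documented to return an np.ndarray


positions = embed_with_pytorch()
if positions is None:
    print("graphem_rapids is not installed")
else:
    print(positions.shape)
```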

RAPIDS cuVS Backend

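graphem_rapids.GraphEmbedderCuVS: alias of None. The name resolved to None when this reference was generated, which usually indicates that the optional cuVS dependencies were not installed in the documentation build environment.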

Graph Generators

Random Graphs

graphem_rapids.generate_er(n, p, seed=0)

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:

n (int) – Number of vertices.
p (float) – Probability that an edge exists between any pair of vertices.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
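
To make the G(n, p) model concrete, here is a plain-Python sketch that samples each unordered vertex pair independently with probability p. It is not the library's implementation: sample_gnp_edges is a hypothetical helper, and it returns an edge list rather than a sparse matrix.

```python
import random


def sample_gnp_edges(n, p, seed=0):
    """Return the edge list of one G(n, p) sample on vertices 0..n-1."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the result
    edges = []
    for u in range(n):
        for v in range(u + 1, n):   # each unordered pair is considered once
            if rng.random() < p:    # keep the edge with probability p
                edges.append((u, v))
    return edges


edges = sample_gnp_edges(n=50, p=0.1, seed=0)
print(len(edges), "edges sampled; the expected count is p * n * (n - 1) / 2")
```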

graphem_rapids.generate_random_regular(n=100, d=3, seed=0)

Generate a random regular graph where each node has degree d.

Parameters:

n (int, default=100) – Number of vertices.
d (int, default=3) – Degree of each vertex.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

Scale-Free and Small-World

graphem_rapids.generate_ba(n=300, m=3, seed=0)

Generate a Barabási–Albert preferential attachment graph.

Parameters:

n (int, default=300) – Number of vertices.
m (int, default=3) – Number of edges to attach from a new vertex to existing vertices.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_ws(n=1000, k=6, p=0.3, seed=0)

Generate a Watts–Strogatz small-world graph.

Parameters:

n (int, default=1000) – Number of vertices.
k (int, default=6) – Number of nearest neighbors each vertex is connected to in the ring topology.
p (float, default=0.3) – Probability of rewiring each edge.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)

Generate a scale-free graph using the Holme and Kim algorithm.

Parameters:

n (int, default=100) – Number of vertices.
alpha (float, default=0.41) – Parameter for the scale-free graph generation.
beta (float, default=0.54) – Parameter for the scale-free graph generation.
gamma (float, default=0.05) – Parameter for the scale-free graph generation.
delta_in (float, default=0.2) – Parameter for the scale-free graph generation.
delta_out (float, default=0) – Parameter for the scale-free graph generation.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)

Generate a power-law cluster graph.

Parameters:

n (int, default=1000) – Number of vertices.
m (int, default=3) – Number of random edges to add per new vertex.
p (float, default=0.5) – Probability of adding a triangle after adding a random edge.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

Community Structures

graphem_rapids.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)

Generate a stochastic block model graph.

Parameters:

n_per_block (int, default=75) – Number of vertices per block.
num_blocks (int, default=4) – Number of blocks.
p_in (float, default=0.15) – Probability of an edge within a block.
p_out (float, default=0.01) – Probability of an edge between blocks.
labels (bool, default=False) – If True, also return vertex labels.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).
labels – Block labels for each vertex, as an np.ndarray of shape (n,). Returned only if labels=True.

Return type:

scipy.sparse.csr_matrix, or (scipy.sparse.csr_matrix, np.ndarray) if labels=True
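
The block structure can be sketched in plain Python as follows: vertices in the same block are joined with probability p_in, vertices in different blocks with probability p_out. The helper sample_sbm is hypothetical, and the assumption that consecutive vertex indices share a block is made for illustration only; the library's labelling may differ.

```python
import random


def sample_sbm(n_per_block, num_blocks, p_in, p_out, seed=0):
    """Return (edges, labels) for one stochastic block model sample."""
    rng = random.Random(seed)
    n = n_per_block * num_blocks
    labels = [v // n_per_block for v in range(n)]  # assumption: consecutive vertices share a block
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, labels


# With p_in=1 and p_out=0 the sample is three disjoint 5-cliques.
edges, labels = sample_sbm(n_per_block=5, num_blocks=3, p_in=1.0, p_out=0.0)
print(len(edges))  # 30
```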

graphem_rapids.generate_caveman(l=10, k=10)

Generate a caveman graph with l cliques of size k.

Parameters:

l (int, default=10) – Number of cliques.
k (int, default=10) – Size of each clique.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)

Generate a relaxed caveman graph with l cliques of size k and rewiring probability p.

Parameters:

l (int, default=10) – Number of cliques.
k (int, default=10) – Size of each clique.
p (float, default=0.1) – Rewiring probability.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

Bipartite Graphs

graphem_rapids.generate_bipartite_graph(n_top=50, n_bottom=100, p=0.1, seed=0)

Generate a random bipartite graph.

Parameters:

n_top (int, default=50) – Number of vertices in the top set.
n_bottom (int, default=100) – Number of vertices in the bottom set.
p (float, default=0.1) – Probability of an edge between any vertex in the top set and any vertex in the bottom set.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_complete_bipartite_graph(n_top=50, n_bottom=100)

Generate a complete bipartite graph.

In a complete bipartite graph, every vertex in the top set is connected to every vertex in the bottom set, resulting in n_top * n_bottom edges.

Parameters:

n_top (int, default=50) – Number of vertices in the top set.
n_bottom (int, default=100) – Number of vertices in the bottom set.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
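
The edge count n_top * n_bottom can be checked with a short sketch. The helper complete_bipartite_edges is hypothetical, and numbering the top vertices before the bottom vertices is an assumption made for illustration.

```python
def complete_bipartite_edges(n_top, n_bottom):
    """Edges of the complete bipartite graph; top vertices are 0..n_top-1, bottom vertices follow."""
    return [(i, n_top + j) for i in range(n_top) for j in range(n_bottom)]


edges = complete_bipartite_edges(3, 4)
print(len(edges))  # 12, i.e. 3 * 4
```

With the default arguments (n_top=50, n_bottom=100) the same count gives 5000 edges.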

Geometric Graphs

graphem_rapids.generate_geometric(n=100, radius=0.2, dim=2, seed=0)

Generate a random geometric graph in a unit cube.

Parameters:

n (int, default=100) – Number of vertices.
radius (float, default=0.2) – Distance threshold for connecting vertices.
dim (int, default=2) – Dimension of the space.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_delaunay_triangulation(n=100, seed=0)

Generate a Delaunay triangulation graph.

Vertices are randomly placed in a 2D unit square, and edges are created based on the Delaunay triangulation of these points. The resulting graph is planar with triangular faces.

Parameters:

n (int, default=100) – Number of vertices.
seed (int, default=0) – Random seed for reproducibility.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

graphem_rapids.generate_road_network(width=30, height=30)

Generate a 2D grid graph representing a road network.

Parameters:

width (int, default=30) – Width of the grid.
height (int, default=30) – Height of the grid.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix

Tree Structures

graphem_rapids.generate_balanced_tree(r=2, h=10)

Generate a balanced r-ary tree of height h.

Parameters:

r (int, default=2) – Branching factor of the tree.
h (int, default=10) – Height of the tree.

Returns:

adjacency – Sparse adjacency matrix (n × n).

Return type:

scipy.sparse.csr_matrix
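
The size of the resulting matrix follows from r and h. The sketch below counts vertices level by level, assuming the usual convention that the root is at depth 0 and every leaf is at depth h; whether generate_balanced_tree follows exactly this convention is an assumption. The helper balanced_tree_size is hypothetical.

```python
def balanced_tree_size(r, h):
    """Number of vertices in a full r-ary tree whose leaves are all at depth h."""
    return sum(r ** level for level in range(h + 1))  # r**level vertices at each depth


print(balanced_tree_size(2, 10))  # 2047
```

Under that convention the default arguments give a 2047 × 2047 adjacency matrix with 2046 edges, since a tree has one fewer edge than it has vertices.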

Influence Maximization

graphem_rapids.graphem_seed_selection(embedder, k, num_iterations=20)

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:

embedder (GraphEmbedder) – The initialized graph embedder object.
k (int) – Number of seed nodes to select.
num_iterations (int, default=20) – Number of layout iterations to run.

Returns:

seeds – List of k vertices selected as seeds.

Return type:

list
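
The selection step can be sketched independently of the embedder: given vertex positions, compute each vertex's distance from the origin and keep the k largest. This is an illustration in plain Python, not the library's code; top_k_by_radius is a hypothetical helper and the positions are made-up values.

```python
import math


def top_k_by_radius(positions, k):
    """Return the indices of the k points farthest from the origin."""
    radii = [math.sqrt(sum(x * x for x in point)) for point in positions]
    order = sorted(range(len(positions)), key=lambda i: radii[i], reverse=True)
    return order[:k]


positions = [(0.0, 0.1), (3.0, 4.0), (-1.0, 0.0), (0.0, -2.0)]
print(top_k_by_radius(positions, k=2))  # [1, 3]
```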

graphem_rapids.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:

G (networkx.Graph) – The graph to run influence propagation on.
seeds (list) – The list of seed nodes.
p (float, default=0.1) – Propagation probability.
iterations_count (int, default=200) – Maximum number of simulation iterations.

Returns:

influence (float) – The estimated influence (average number of influenced nodes).
iterations (int) – The number of iterations run.

Return type:

tuple
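
For readers unfamiliar with the independent cascade process, here is a single-run sketch in plain Python. It is not NDlib's implementation and does no averaging: each newly activated node gets one chance to activate each inactive neighbor with probability p, and the run stops when no new node is activated. The function name and the dictionary-of-lists graph format are assumptions made for illustration.

```python
import random


def independent_cascade(neighbors, seeds, p=0.1, seed=0):
    """Run one independent-cascade simulation.

    neighbors maps each node to a list of adjacent nodes.
    Returns (number of activated nodes, number of rounds executed).
    """
    rng = random.Random(seed)
    activated = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous round
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier = []
        for node in frontier:
            for other in neighbors[node]:
                # one activation attempt per (newly active node, inactive neighbor) pair
                if other not in activated and rng.random() < p:
                    activated.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return len(activated), rounds


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(independent_cascade(path, seeds=[0], p=1.0))  # (4, 4)
```

With p=1.0 the cascade reaches every node connected to the seeds; with p=0.0 only the seeds themselves are counted.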

graphem_rapids.greedy_seed_selection(G, k, p=0.1, iterations_count=200)

Greedy seed selection using NDlib influence estimation. Each candidate-node evaluation calls NDlib’s simulation, and the total number of iterations used across all evaluations is accumulated.

Returns: seeds – The selected seed set (list of nodes); total_iters – The total number of NDlib iterations run during selection.
Return type: tuple
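
The greedy loop can be sketched with any influence estimator. In the example below, the estimator is a toy coverage function rather than an NDlib simulation, and the sketch counts estimator calls where the library accumulates NDlib iterations. The names greedy_select, reach, and coverage are hypothetical.

```python
def greedy_select(candidates, k, estimate):
    """Pick k seeds, each time adding the candidate with the largest estimated influence.

    estimate(seeds) must return a number; larger means more influence.
    Returns (seeds, number of estimator calls).
    """
    seeds = []
    evaluations = 0
    for _ in range(k):
        best_node, best_value = None, float("-inf")
        for node in candidates:
            if node in seeds:
                continue
            value = estimate(seeds + [node])
            evaluations += 1
            if value > best_value:
                best_node, best_value = node, value
        if best_node is None:  # fewer than k candidates available
            break
        seeds.append(best_node)
    return seeds, evaluations


# Toy estimator: influence = number of distinct items reached by the seed set.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(seeds):
    return len(set().union(*(reach[s] for s in seeds)))


print(greedy_select(list(reach), k=2, estimate=coverage))  # (['a', 'b'], 7)
```

Every round evaluates each remaining candidate once, which is why the cost of greedy selection grows with both k and the number of nodes.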

Benchmarking

graphem_rapids.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:

graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
dim (int, default=2) – Number of embedding components (dimensions)
L_min (float, default=10.0) – Minimum spring length.
k_attr (float, default=0.5) – Attraction force constant.
k_inter (float, default=0.1) – Intersection repulsion force constant.
n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection
sample_size (int, default=512) – Sample size for kNN computation
num_iterations (int, default=40) – Number of layout iterations
backend (str, default='pytorch') – Backend to use
**kwargs – Additional embedder parameters

Returns:

Benchmark results with correlation coefficients for centrality measures

Return type:

dict

graphem_rapids.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, n_neighbors=15, sample_size=512, num_iterations=40, backend='pytorch', **kwargs)

Run a benchmark on the given graph using GraphEm Rapids.

Parameters:

graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
dim (int, default=3) – Number of embedding components (dimensions)
L_min (float, default=10.0) – Minimum spring length parameter
k_attr (float, default=0.5) – Attraction force constant
k_inter (float, default=0.1) – Intersection repulsion force constant
n_neighbors (int, default=15) – Number of nearest neighbors for intersection detection
sample_size (int, default=512) – Sample size for kNN computation
num_iterations (int, default=40) – Number of layout iterations
backend (str, default='pytorch') – Backend to use (‘pytorch’, ‘cuvs’, or ‘auto’)
**kwargs – Additional parameters passed to the embedder

Returns:

Benchmark results including timings and graph metrics

Return type:

dict
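
A hedged usage sketch: the generator is passed as a callable together with a dictionary of its parameters. That graph_params is forwarded to the generator as keyword arguments is an assumption based on the parameter descriptions, and the keys of the result dictionary are not listed in this reference, so the example only prints their names. The wrapper benchmark_er is hypothetical and returns None when graphem_rapids is not importable.

```python
def benchmark_er(n=200, p=0.05):
    """Benchmark an Erdős–Rényi graph; return None if graphem_rapids is missing."""
    try:
        import graphem_rapids as gr
    except ImportError:
        return None
    return gr.run_benchmark(
        gr.generate_er,     # graph_generator: callable returning a sparse adjacency matrix
        {"n": n, "p": p},   # graph_params: assumed to be forwarded as keyword arguments
        dim=2,
        num_iterations=20,
        backend="pytorch",
    )


results = benchmark_er()
if results is None:
    print("graphem_rapids is not installed")
else:
    print(sorted(results))  # names of the reported timings and metrics
```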

graphem_rapids.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None, backend='pytorch')

Run a benchmark comparing influence maximization methods.

Parameters:

graph_generator (callable) – Function to generate a graph (returns sparse adjacency matrix)
graph_params (dict) – Parameters for the graph generator
k (int, default=10) – Number of seed nodes to select for influence maximization
p (float, default=0.1) – Propagation probability for influence diffusion
iterations (int, default=200) – Number of Monte Carlo iterations for influence estimation
dim (int, default=3) – Number of embedding components (dimensions)
num_layout_iterations (int, default=20) – Number of force-directed layout iterations
layout_params (dict, optional) – Additional parameters for embedder initialization (L_min, k_attr, k_inter, etc.)
backend (str, default='pytorch') – Backend to use for embedding (‘pytorch’, ‘cuvs’, or ‘auto’)

Returns:

Benchmark results comparing GraphEm vs Greedy seed selection with influence metrics

Return type:

dict

Visualization

graphem_rapids.report_corr(name, radii, centrality, alpha=0.025)

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:

name (str) – Name of the centrality measure.
radii (array-like) – Radial distances from the origin.
centrality (array-like) – Centrality values.
alpha (float, default=0.025) – Alpha level for the confidence interval.

Returns:

(correlation coefficient, p-value)

Return type:

tuple
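
This reference does not say which correlation coefficient report_corr computes. As an illustration only, the sketch below computes a Pearson coefficient between radii and a centrality vector in plain Python; the choice of Pearson, the helper name pearson_r, and the sample values are all assumptions, and no p-value is computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


radii = [1.0, 2.0, 3.0, 4.0]
degree = [8.0, 6.0, 4.0, 2.0]  # hypothetical centrality values
print(round(pearson_r(radii, degree), 6))  # -1.0
```

A strongly negative value like this one would mean that high-centrality vertices sit close to the origin of the embedding.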

graphem_rapids.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)

Calculate and report correlations between radial distances and various centrality measures.

Parameters:

radii (array-like) – Radial distances from the origin.
deg (array-like) – Centrality measure.
btw (array-like) – Centrality measure.
eig (array-like) – Centrality measure.
pr (array-like) – Centrality measure.
clo (array-like) – Centrality measure.
nload (array-like) – Centrality measure.
alpha (float, default=0.025) – Alpha level for the confidence interval.

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem_rapids.plot_radial_vs_centrality(radii, centralities, names)

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:

radii (array-like) – Radial distances from the origin.
centralities (list of array-like) – List of centrality measures.
names (list of str) – Names of the centrality measures.

graphem_rapids.display_benchmark_results(benchmark_results)

Display benchmark results in a formatted table.

Parameters: benchmark_results (list of dict) – List of benchmark result dictionaries.

Datasets

graphem_rapids.load_dataset(dataset_name)

Load a dataset by name.

Parameters:

dataset_name (str) – Name of the dataset to load.

Returns:

(vertices, edges), where vertices is an np.ndarray of shape (num_vertices,) and edges is an np.ndarray of shape (num_edges, 2).

Return type:

tuple

Utilities

Backend Selection

class graphem_rapids.utils.backend_selection.BackendConfig(n_vertices, n_components=2, force_backend=None, prefer_gpu=True, memory_limit=None, verbose=True)

Configuration for backend selection.

Parameters:

n_vertices (int)
n_components (int)
force_backend (str)
prefer_gpu (bool)
memory_limit (float)
verbose (bool)

graphem_rapids.utils.backend_selection.get_optimal_backend(config)

Select optimal backend based on configuration and hardware.

Parameters: config (BackendConfig) – Backend configuration.
Returns: Optimal backend name (‘pytorch’, ‘cuvs’, ‘cpu’).
Return type: str

Memory Management

class graphem_rapids.utils.memory_management.MemoryManager(cleanup_on_exit=True)

Context manager for memory management.

__init__(cleanup_on_exit=True)

Initialize memory manager.

Parameters: cleanup_on_exit (bool, default=True) – Whether to clean up memory on exit.

get_memory_info(): Get current memory information.

cleanup(): Manually trigger memory cleanup.
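
Since MemoryManager is described as a context manager, a typical use is to wrap a memory-heavy step in a with block. The sketch below does that through a hypothetical wrapper, run_with_cleanup, which falls back to a plain call when graphem_rapids is not importable. What the with statement binds on entry is not documented here, so the example does not use an as clause.

```python
def run_with_cleanup(work):
    """Call work() inside a MemoryManager block; fall back to a plain call without the library."""
    try:
        from graphem_rapids.utils.memory_management import MemoryManager
    except ImportError:
        return work()
    with MemoryManager(cleanup_on_exit=True):
        return work()  # cleanup is triggered when the block exits


total = run_with_cleanup(lambda: sum(range(10)))
print(total)  # 45
```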

graphem_rapids.utils.memory_management.get_gpu_memory_info()

Get GPU memory information.

Returns: GPU memory info with keys ‘total’, ‘allocated’, ‘cached’, ‘free’ in GB.
Return type: dict

graphem_rapids.utils.memory_management.get_optimal_chunk_size(n_vertices, n_components, available_memory_gb=None, safety_factor=0.7, backend='torch')

Calculate optimal chunk size for memory-efficient processing.

Parameters:

n_vertices (int) – Number of vertices in the graph.
n_components (int) – Number of embedding components (dimensions).
available_memory_gb (float, optional) – Available GPU memory in GB. If None, automatically detected.
safety_factor (float, default=0.7) – Safety factor to avoid OOM (0-1).
backend (str, default='torch') – Backend type (‘torch’, ‘pykeops’, ‘cuvs’).

Returns:

Optimal chunk size.

Return type:

int
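
The formula used by get_optimal_chunk_size is not given in this reference. As a back-of-the-envelope illustration of how a safety factor turns a memory budget into a chunk size, the sketch below assumes each chunk materializes a (chunk × n_vertices) block of 4-byte floats. That memory model, the helper name estimate_chunk_size, and the bytes_per_value parameter are all assumptions, not the library's method.

```python
def estimate_chunk_size(n_vertices, available_memory_gb, safety_factor=0.7, bytes_per_value=4):
    """Rows of a (chunk x n_vertices) block that fit in the scaled memory budget."""
    budget_bytes = available_memory_gb * 1024**3 * safety_factor
    rows = int(budget_bytes // (n_vertices * bytes_per_value))
    return max(1, min(rows, n_vertices))  # at least one row, at most the whole graph


print(estimate_chunk_size(n_vertices=100_000, available_memory_gb=1.0, safety_factor=0.5))  # 1342
```

Lowering safety_factor shrinks the chunk and leaves more headroom against out-of-memory errors, at the cost of more passes over the data.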