API Reference

This section contains detailed documentation for all GraphEm modules and functions.

Core Classes

GraphEmbedder

graphem.embedder.GraphEmbedder - JAX-based graph embedding using Laplacian embedding with spring forces and intersection avoidance.

Main class for creating graph embeddings. Combines Laplacian eigenvectors with force-directed layout optimization to create visually appealing and structurally meaningful embeddings.

Key methods: - __init__(edges, n_vertices, dimension=2, ...) - Initialize embedder with graph structure - run_layout(num_iterations=100) - Execute layout algorithm for specified iterations - display_layout(edge_width=1, node_size=3, ...) - Visualize embedding using Plotly

class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Bases: object

A class for embedding graphs using the Laplacian embedding.

edges

np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.

n

int Number of vertices in the graph.

dimension

int Dimension of the embedding.

L_min

float Minimum length of the spring.

k_attr

float Attraction force constant.

k_inter

float Repulsion force constant for intersections.

knn_k

int Number of nearest neighbors to consider.

sample_size

int Number of samples for kNN search.

batch_size

int Batch size for kNN search.

my_logger

loguru.logger Logger object to use for logging.

__init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Initialize the GraphEmbedder.

logger

System logger

locate_knn_midpoints(midpoints, k)[source]

Locate k nearest neighbors for each midpoint.

static compute_spring_forces(positions, edges, L_min, k_attr)[source]

Compute the spring forces between vertices.

static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]

Compute the intersection forces between vertices and their nearest neighbors.

update_positions()[source]

Update the positions of the vertices based on the spring forces and intersection forces.

run_layout(num_iterations=100)[source]

Run the layout for a given number of iterations.

display_layout(edge_width=1, node_size=3, node_colors=None)[source]

Display the graph embedding using Plotly.

Parameters:
  • edge_width (float) – The width of the edges in the graph embedding.

  • node_size (float) – The size of the nodes in the graph embedding.

  • node_colors (array-like of shape (num_vertices,)) – An array of colors for each vertex.

Returns:

Displays the graph embedding using Plotly in the appropriate dimension.

Return type:

None

HPIndex

graphem.index.HPIndex - High-performance k-nearest neighbors search with JAX acceleration.

Efficient batched k-NN implementation for large datasets with memory optimization through tiling.

Key methods: - knn_tiled(x, y, k=5, ...) - Find k nearest neighbors with batched processing

class graphem.index.HPIndex[source]

Bases: object

A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.

__init__()[source]
static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]

Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.

Parameters:
  • x – (n, d) array of database points

  • y – (m, d) array of query points

  • k – number of nearest neighbors

  • x_tile_size – size of database tiles

  • y_batch_size – size of query batches

Returns:

(m, k) array of indices of nearest neighbors

Graph Generators

graphem.generators - Generate various graph types for testing and experimentation.

Provides NetworkX-based generators for standard graph models including random graphs, scale-free networks, small-world graphs, and more.

Key functions: - erdos_renyi_graph(n, p) - Random graph with edge probability p - generate_sbm(n_per_block, num_blocks, p_in, p_out) - Stochastic block model - generate_ba(n, m) - Barabási-Albert preferential attachment - generate_ws(n, k, p) - Watts-Strogatz small-world - generate_scale_free(n, ...) - Scale-free network - generate_geometric(n, radius) - Random geometric graph

Graph generators for Graphem.

This module provides various functions to generate different types of graphs.

graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:
  • n – int Number of vertices.

  • p – float Probability that an edge exists between any pair of vertices.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.compute_vertex_degrees(n, edges)[source]

Compute the degree of each vertex from the edge list.

Parameters:
  • n – number of vertices

  • edges – array of shape (num_edges, 2)

Returns:

np.array of shape (n,) with degree of each vertex

Return type:

degrees

graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]

Generate a stochastic block model graph.

Parameters:
  • n_per_block – int Number of vertices per block.

  • num_blocks – int Number of blocks.

  • p_in – float Probability of edge within a block.

  • p_out – float Probability of edge between blocks.

  • labels – bool If True, return vertex labels.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

labels: np.ndarray of shape (n,) (only if labels=True)

Block labels for each vertex.

Return type:

edges

graphem.generators.generate_ba(n=300, m=3, seed=0)[source]

Generate a Barabási-Albert preferential attachment graph.

Parameters:
  • n – int Number of vertices.

  • m – int Number of edges to attach from a new vertex to existing vertices.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]

Generate a Watts-Strogatz small-world graph.

Parameters:
  • n – int Number of vertices.

  • k – int Each vertex is connected to k nearest neighbors in ring topology.

  • p – float Probability of rewiring each edge.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]

Generate a powerlaw cluster graph.

Parameters:
  • n – int Number of vertices.

  • m – int Number of random edges to add per new vertex.

  • p – float Probability of adding a triangle after adding a random edge.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_road_network(width=30, height=30)[source]

Generate a 2D grid graph representing a road network.

Parameters:
  • width – int Width of the grid.

  • height – int Height of the grid.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]

Generate a random bipartite graph.

Parameters:
  • n_top – int Number of vertices in the top set.

  • n_bottom – int Number of vertices in the bottom set.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_balanced_tree(r=2, h=10)[source]

Generate a balanced r-ary tree of height h.

Parameters:
  • r – int Branching factor of the tree.

  • h – int Height of the tree.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]

Generate a random regular graph where each node has degree d.

Parameters:
  • n – int Number of vertices.

  • d – int Degree of each vertex.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]

Generate a scale-free graph using Holme and Kim algorithm.

Parameters:
  • n – int Number of vertices.

  • alpha – float Parameters for the scale-free graph generation.

  • beta – float Parameters for the scale-free graph generation.

  • gamma – float Parameters for the scale-free graph generation.

  • delta_in – float Parameters for the scale-free graph generation.

  • delta_out – float Parameters for the scale-free graph generation.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]

Generate a random geometric graph in a unit cube.

Parameters:
  • n – int Number of vertices.

  • radius – float Distance threshold for connecting vertices.

  • dim – int Dimension of the space.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_caveman(l=10, k=10)[source]

Generate a caveman graph with l cliques of size k.

Parameters:
  • l – int Number of cliques.

  • k – int Size of each clique.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]

Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.

Parameters:
  • l – int Number of cliques.

  • k – int Size of each clique.

  • p – float Rewiring probability.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

Influence Maximization

graphem.influence - Seed selection algorithms for influence maximization in networks.

Implements GraphEm-based seed selection using radial distances from embedding origin, plus traditional greedy methods with NDlib simulation.

Key functions: - graphem_seed_selection(embedder, k) - Select seeds based on radial distances - greedy_seed_selection(G, k, p) - Traditional greedy algorithm - ndlib_estimated_influence(G, seeds, p) - Evaluate influence using Independent Cascades

Influence maximization functionality for Graphem.

graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:
  • embedder – GraphEmbedder The initialized graph embedder object

  • k – int Number of seed nodes to select

  • num_iterations – int Number of layout iterations to run

Returns:

list

List of k vertices selected as seeds

Return type:

seeds

graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:
  • G – networkx.Graph The graph to run influence propagation on

  • seeds – list The list of seed nodes

  • p – float Propagation probability

  • iterations_count – int Maximum number of simulation iterations

Returns:

float

The estimated influence (average number of influenced nodes)

iterations: int

The number of iterations run

Return type:

influence

graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]

Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.

Returns:

the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.

Return type:

seeds

Dataset Utilities

graphem.datasets - Load real-world network datasets from various sources.

Download and process datasets from SNAP, Network Repository, and Semantic Scholar. Handles automatic downloading, extraction, and format conversion.

Key classes: - SNAPDataset(dataset_name) - SNAP collection (Facebook, Wikipedia, arXiv, etc.) - NetworkRepositoryDataset(dataset_name) - Network Repository collection - SemanticScholarDataset(dataset_name) - Citation networks

Key functions: - load_dataset(dataset_name) - Load any dataset by name - load_dataset_as_networkx(dataset_name) - Load as NetworkX graph - list_available_datasets() - Show all available datasets

Real-world dataset loader for Graphem.

This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.

graphem.datasets.get_data_directory()[source]

Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.

Returns:

Path to the data directory

Return type:

Path

graphem.datasets.download_file(url, filepath, description=None)[source]

Download a file with progress bar.

Parameters:
  • url – str URL to download

  • filepath – Path or str Path to save the downloaded file

  • description – str, optional Description for the progress bar

graphem.datasets.extract_file(filepath, extract_dir=None)[source]

Extract a compressed file.

Parameters:
  • filepath – Path or str Path to the compressed file

  • extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.

Returns:

Path to the extraction directory

Return type:

Path

class graphem.datasets.DatasetLoader(name)[source]

Bases: object

Base class for dataset loaders.

__init__(name)[source]

Initialize the dataset loader.

Parameters:

name – str Name of the dataset

download()[source]

Download the dataset. To be implemented by subclasses.

load()[source]

Load the dataset as edges. To be implemented by subclasses.

load_as_networkx()[source]

Load the dataset as a NetworkX graph.

Returns:

The loaded graph

Return type:

networkx.Graph

info()[source]

Print information about the dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

Returns:

True if downloaded, False otherwise

Return type:

bool

class graphem.datasets.SNAPDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Stanford Network Analysis Project (SNAP).

SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/

AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}
__init__(dataset_name)[source]

Initialize the SNAP dataset loader.

Parameters:

dataset_name – str Name of the SNAP dataset to load

download()[source]

Download the SNAP dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

load()[source]

Load the SNAP dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Network Repository.

Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/

AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}
__init__(dataset_name)[source]

Initialize the Network Repository dataset loader.

Parameters:

dataset_name – str Name of the Network Repository dataset to load

download()[source]

Download the Network Repository dataset.

is_downloaded()[source]

Check if the expected dataset file exists without scanning the whole tree.

Returns:

True if the expected file exists, False otherwise

Return type:

bool

load()[source]

Load the Network Repository dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]

Bases: DatasetLoader

Loader for Semantic Scholar citation network datasets.

Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.

AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}
__init__(dataset_name='s2-CS')[source]

Initialize the Semantic Scholar dataset loader.

Parameters:

dataset_name – str Name of the Semantic Scholar dataset to load

download()[source]

Download the Semantic Scholar dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

load()[source]

Load the Semantic Scholar dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.list_available_datasets()[source]

List all available datasets from all sources.

Returns:

Dictionary with dataset information

Return type:

dict

graphem.datasets.load_dataset(dataset_name)[source]

Load a dataset by name.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.load_dataset_as_networkx(dataset_name)[source]

Load a dataset as a NetworkX graph.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

The loaded graph

Return type:

networkx.Graph

Visualization

graphem.visualization - Statistical analysis and plotting utilities for embeddings.

Calculate correlations between embedding radial distances and network centrality measures, with bootstrap confidence intervals.

Key functions: - report_corr(name, radii, centrality) - Correlation with confidence intervals - report_full_correlation_matrix(radii, ...) - Multiple centrality correlations - plot_radial_vs_centrality(radii, centralities, names) - Scatter plots with trendlines - display_benchmark_results(results) - Format benchmark output

Visualization utilities for Graphem.

graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:
  • name – str Name of the centrality measure

  • radii – array-like Radial distances from origin

  • centrality – array-like Centrality values

  • alpha – float Alpha level for confidence interval

Returns:

(correlation coefficient, p-value)

Return type:

tuple

graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]

Calculate and report correlations between radial distances and various centrality measures.

Parameters:
  • radii – array-like Radial distances from origin

  • deg – array-like Various centrality measures

  • btw – array-like Various centrality measures

  • eig – array-like Various centrality measures

  • pr – array-like Various centrality measures

  • clo – array-like Various centrality measures

  • edge_btw – array-like Various centrality measures

  • alpha – float Alpha level for confidence interval

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:
  • radii – array-like Radial distances from origin

  • centralities – list of array-like List of centrality measures

  • names – list of str Names of the centrality measures

graphem.visualization.display_benchmark_results(benchmark_results)[source]

Display benchmark results in a nicely formatted table.

Parameters:

benchmark_results – list of dict List of benchmark result dictionaries

Benchmarking

graphem.benchmark - Performance evaluation and comparative analysis tools.

Comprehensive benchmarking for embedding quality, centrality correlations, and influence maximization effectiveness.

Key functions: - run_benchmark(graph_generator, params) - Basic embedding benchmark - benchmark_correlations(graph_generator, params) - Centrality correlation analysis - run_influence_benchmark(graph_generator, params) - Compare influence methods

Benchmark functionality for Graphem.

graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark on the given graph.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • dim – int Embedding dimension

  • L_min – float GraphEmbedder parameters

  • k_attr – float GraphEmbedder parameters

  • k_inter – float GraphEmbedder parameters

  • knn_k – float GraphEmbedder parameters

  • sample_size – int Batch parameters for kNN search

  • batch_size – int Batch parameters for kNN search

  • num_iterations – int Number of layout iterations

Returns:

Benchmark results including timings and graph metrics

Return type:

dict

graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • parameters (Other) – same as run_benchmark

Returns:

Benchmark results with correlations

Return type:

dict

graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]

Run a benchmark comparing influence maximization methods.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • k – int Number of seed nodes to select

  • p – float Propagation probability

  • iterations – int Number of iterations for influence simulation

  • dim – int Embedding dimension

  • num_layout_iterations – int Number of iterations for layout algorithm

  • layout_params – dict Parameters for the layout algorithm

Returns:

Benchmark results comparing influence maximization methods

Return type:

dict

Complete Module Reference

graphem.embedder module

A JAX-based implementation of graph embedding.

class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Bases: object

A class for embedding graphs using the Laplacian embedding.

edges

np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.

n

int Number of vertices in the graph.

dimension

int Dimension of the embedding.

L_min

float Minimum length of the spring.

k_attr

float Attraction force constant.

k_inter

float Repulsion force constant for intersections.

knn_k

int Number of nearest neighbors to consider.

sample_size

int Number of samples for kNN search.

batch_size

int Batch size for kNN search.

my_logger

loguru.logger Logger object to use for logging.

__init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Initialize the GraphEmbedder.

logger

System logger

locate_knn_midpoints(midpoints, k)[source]

Locate k nearest neighbors for each midpoint.

static compute_spring_forces(positions, edges, L_min, k_attr)[source]

Compute the spring forces between vertices.

static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]

Compute the intersection forces between vertices and their nearest neighbors.

update_positions()[source]

Update the positions of the vertices based on the spring forces and intersection forces.

run_layout(num_iterations=100)[source]

Run the layout for a given number of iterations.

display_layout(edge_width=1, node_size=3, node_colors=None)[source]

Display the graph embedding using Plotly.

Parameters:
  • edge_width (float) – The width of the edges in the graph embedding.

  • node_size (float) – The size of the nodes in the graph embedding.

  • node_colors (array-like of shape (num_vertices,)) – An array of colors for each vertex.

Returns:

Displays the graph embedding using Plotly in the appropriate dimension.

Return type:

None

graphem.generators module

Graph generators for Graphem.

This module provides various functions to generate different types of graphs.

graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:
  • n – int Number of vertices.

  • p – float Probability that an edge exists between any pair of vertices.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.compute_vertex_degrees(n, edges)[source]

Compute the degree of each vertex from the edge list.

Parameters:
  • n – number of vertices

  • edges – array of shape (num_edges, 2)

Returns:

np.array of shape (n,) with degree of each vertex

Return type:

degrees

graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]

Generate a stochastic block model graph.

Parameters:
  • n_per_block – int Number of vertices per block.

  • num_blocks – int Number of blocks.

  • p_in – float Probability of edge within a block.

  • p_out – float Probability of edge between blocks.

  • labels – bool If True, return vertex labels.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

labels: np.ndarray of shape (n,) (only if labels=True)

Block labels for each vertex.

Return type:

edges

graphem.generators.generate_ba(n=300, m=3, seed=0)[source]

Generate a Barabási-Albert preferential attachment graph.

Parameters:
  • n – int Number of vertices.

  • m – int Number of edges to attach from a new vertex to existing vertices.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]

Generate a Watts-Strogatz small-world graph.

Parameters:
  • n – int Number of vertices.

  • k – int Each vertex is connected to k nearest neighbors in ring topology.

  • p – float Probability of rewiring each edge.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]

Generate a powerlaw cluster graph.

Parameters:
  • n – int Number of vertices.

  • m – int Number of random edges to add per new vertex.

  • p – float Probability of adding a triangle after adding a random edge.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_road_network(width=30, height=30)[source]

Generate a 2D grid graph representing a road network.

Parameters:
  • width – int Width of the grid.

  • height – int Height of the grid.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]

Generate a random bipartite graph.

Parameters:
  • n_top – int Number of vertices in the top set.

  • n_bottom – int Number of vertices in the bottom set.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_balanced_tree(r=2, h=10)[source]

Generate a balanced r-ary tree of height h.

Parameters:
  • r – int Branching factor of the tree.

  • h – int Height of the tree.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]

Generate a random regular graph where each node has degree d.

Parameters:
  • n – int Number of vertices.

  • d – int Degree of each vertex.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]

Generate a scale-free graph using Holme and Kim algorithm.

Parameters:
  • n – int Number of vertices.

  • alpha – float Parameters for the scale-free graph generation.

  • beta – float Parameters for the scale-free graph generation.

  • gamma – float Parameters for the scale-free graph generation.

  • delta_in – float Parameters for the scale-free graph generation.

  • delta_out – float Parameters for the scale-free graph generation.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]

Generate a random geometric graph in a unit cube.

Parameters:
  • n – int Number of vertices.

  • radius – float Distance threshold for connecting vertices.

  • dim – int Dimension of the space.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_caveman(l=10, k=10)[source]

Generate a caveman graph with l cliques of size k.

Parameters:
  • l – int Number of cliques.

  • k – int Size of each clique.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]

Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.

Parameters:
  • l – int Number of cliques.

  • k – int Size of each clique.

  • p – float Rewiring probability.

  • seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2)

Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.influence module

Influence maximization functionality for Graphem.

graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:
  • embedder – GraphEmbedder The initialized graph embedder object

  • k – int Number of seed nodes to select

  • num_iterations – int Number of layout iterations to run

Returns:

list

List of k vertices selected as seeds

Return type:

seeds

graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:
  • G – networkx.Graph The graph to run influence propagation on

  • seeds – list The list of seed nodes

  • p – float Propagation probability

  • iterations_count – int Maximum number of simulation iterations

Returns:

float

The estimated influence (average number of influenced nodes)

iterations: int

The number of iterations run

Return type:

influence

graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]

Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.

Returns:

the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.

Return type:

seeds

graphem.index module

A JAX-based implementation of efficient k-nearest neighbors.

class graphem.index.HPIndex[source]

Bases: object

A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.

__init__()[source]
static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]

Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.

Parameters:
  • x – (n, d) array of database points

  • y – (m, d) array of query points

  • k – number of nearest neighbors

  • x_tile_size – size of database tiles

  • y_batch_size – size of query batches

Returns:

(m, k) array of indices of nearest neighbors

graphem.datasets module

Real-world dataset loader for Graphem.

This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.

graphem.datasets.get_data_directory()[source]

Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.

Returns:

Path to the data directory

Return type:

Path

graphem.datasets.download_file(url, filepath, description=None)[source]

Download a file with progress bar.

Parameters:
  • url – str URL to download

  • filepath – Path or str Path to save the downloaded file

  • description – str, optional Description for the progress bar

graphem.datasets.extract_file(filepath, extract_dir=None)[source]

Extract a compressed file.

Parameters:
  • filepath – Path or str Path to the compressed file

  • extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.

Returns:

Path to the extraction directory

Return type:

Path

class graphem.datasets.DatasetLoader(name)[source]

Bases: object

Base class for dataset loaders.

__init__(name)[source]

Initialize the dataset loader.

Parameters:

name – str Name of the dataset

download()[source]

Download the dataset. To be implemented by subclasses.

load()[source]

Load the dataset as edges. To be implemented by subclasses.

load_as_networkx()[source]

Load the dataset as a NetworkX graph.

Returns:

The loaded graph

Return type:

networkx.Graph

info()[source]

Print information about the dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

Returns:

True if downloaded, False otherwise

Return type:

bool

class graphem.datasets.SNAPDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Stanford Network Analysis Project (SNAP).

SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/

AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}
__init__(dataset_name)[source]

Initialize the SNAP dataset loader.

Parameters:

dataset_name – str Name of the SNAP dataset to load

download()[source]

Download the SNAP dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

load()[source]

Load the SNAP dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Network Repository.

Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/

AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}
__init__(dataset_name)[source]

Initialize the Network Repository dataset loader.

Parameters:

dataset_name – str Name of the Network Repository dataset to load

download()[source]

Download the Network Repository dataset.

is_downloaded()[source]

Check if the expected dataset file exists without scanning the whole tree.

Returns:

True if the expected file exists, False otherwise

Return type:

bool

load()[source]

Load the Network Repository dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]

Bases: DatasetLoader

Loader for Semantic Scholar citation network datasets.

Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.

AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}
__init__(dataset_name='s2-CS')[source]

Initialize the Semantic Scholar dataset loader.

Parameters:

dataset_name – str Name of the Semantic Scholar dataset to load

download()[source]

Download the Semantic Scholar dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

load()[source]

Load the Semantic Scholar dataset as edges.

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.list_available_datasets()[source]

List all available datasets from all sources.

Returns:

Dictionary with dataset information

Return type:

dict

graphem.datasets.load_dataset(dataset_name)[source]

Load a dataset by name.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

(vertices, edges)

vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.load_dataset_as_networkx(dataset_name)[source]

Load a dataset as a NetworkX graph.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

The loaded graph

Return type:

networkx.Graph

graphem.visualization module

Visualization utilities for Graphem.

graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:
  • name – str Name of the centrality measure

  • radii – array-like Radial distances from origin

  • centrality – array-like Centrality values

  • alpha – float Alpha level for confidence interval

Returns:

(correlation coefficient, p-value)

Return type:

tuple

graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]

Calculate and report correlations between radial distances and various centrality measures.

Parameters:
  • radii – array-like Radial distances from origin

  • deg – array-like Various centrality measures

  • btw – array-like Various centrality measures

  • eig – array-like Various centrality measures

  • pr – array-like Various centrality measures

  • clo – array-like Various centrality measures

  • edge_btw – array-like Various centrality measures

  • alpha – float Alpha level for confidence interval

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:
  • radii – array-like Radial distances from origin

  • centralities – list of array-like List of centrality measures

  • names – list of str Names of the centrality measures

graphem.visualization.display_benchmark_results(benchmark_results)[source]

Display benchmark results in a nicely formatted table.

Parameters:

benchmark_results – list of dict List of benchmark result dictionaries

graphem.benchmark module

Benchmark functionality for Graphem.

graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark on the given graph.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • dim – int Embedding dimension

  • L_min – float GraphEmbedder parameters

  • k_attr – float GraphEmbedder parameters

  • k_inter – float GraphEmbedder parameters

  • knn_k – float GraphEmbedder parameters

  • sample_size – int Batch parameters for kNN search

  • batch_size – int Batch parameters for kNN search

  • num_iterations – int Number of layout iterations

Returns:

Benchmark results including timings and graph metrics

Return type:

dict

graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • parameters (Other) – same as run_benchmark

Returns:

Benchmark results with correlations

Return type:

dict

graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]

Run a benchmark comparing influence maximization methods.

Parameters:
  • graph_generator – callable Function to generate a graph

  • graph_params – dict Parameters for the graph generator

  • k – int Number of seed nodes to select

  • p – float Propagation probability

  • iterations – int Number of iterations for influence simulation

  • dim – int Embedding dimension

  • num_layout_iterations – int Number of iterations for layout algorithm

  • layout_params – dict Parameters for the layout algorithm

Returns:

Benchmark results comparing influence maximization methods

Return type:

dict