API Reference

This section contains detailed documentation for all GraphEm modules and functions.

Core Classes

GraphEmbedder

graphem.embedder.GraphEmbedder - JAX-based graph embedding using Laplacian embedding with spring forces and intersection avoidance.

Main class for creating graph embeddings. Combines Laplacian eigenvectors with force-directed layout optimization to create visually appealing and structurally meaningful embeddings.

Key methods: - __init__(edges, n_vertices, dimension=2, ...) - Initialize embedder with graph structure - run_layout(num_iterations=100) - Execute layout algorithm for specified iterations - display_layout(edge_width=1, node_size=3, ...) - Visualize embedding using Plotly

class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Bases: object

A class for embedding graphs using the Laplacian embedding.

edges: np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.

n: int Number of vertices in the graph.

dimension: int Dimension of the embedding.

L_min: float Minimum length of the spring.

k_attr: float Attraction force constant.

k_inter: float Repulsion force constant for intersections.

knn_k: int Number of nearest neighbors to consider.

sample_size: int Number of samples for kNN search.

batch_size: int Batch size for kNN search.

my_logger: loguru.logger Logger object to use for logging.

__init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]: Initialize the GraphEmbedder.

logger: System logger

locate_knn_midpoints(midpoints, k)[source]: Locate k nearest neighbors for each midpoint.

static compute_spring_forces(positions, edges, L_min, k_attr)[source]: Compute the spring forces between vertices.

static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]: Compute the intersection forces between vertices and their nearest neighbors.

update_positions()[source]: Update the positions of the vertices based on the spring forces and intersection forces.

run_layout(num_iterations=100)[source]: Run the layout for a given number of iterations.

display_layout(edge_width=1, node_size=3, node_colors=None)[source]

Display the graph embedding using Plotly.

Parameters:

edge_width (float) – The width of the edges in the graph embedding.
node_size (float) – The size of the nodes in the graph embedding.
node_colors (array-like of shape (num_vertices,)) – An array of colors for each vertex.

Returns:

Displays the graph embedding using Plotly in the appropriate dimension.

Return type:

None

HPIndex

graphem.index.HPIndex - High-performance k-nearest neighbors search with JAX acceleration.

Efficient batched k-NN implementation for large datasets with memory optimization through tiling.

Key methods: - knn_tiled(x, y, k=5, ...) - Find k nearest neighbors with batched processing

class graphem.index.HPIndex[source]

Bases: object

A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.

__init__()[source]

static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]

Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.

Parameters:

x – (n, d) array of database points
y – (m, d) array of query points
k – number of nearest neighbors
x_tile_size – size of database tiles
y_batch_size – size of query batches

Returns:

(m, k) array of indices of nearest neighbors

Graph Generators

graphem.generators - Generate various graph types for testing and experimentation.

Provides NetworkX-based generators for standard graph models including random graphs, scale-free networks, small-world graphs, and more.

Key functions: - erdos_renyi_graph(n, p) - Random graph with edge probability p - generate_sbm(n_per_block, num_blocks, p_in, p_out) - Stochastic block model - generate_ba(n, m) - Barabási-Albert preferential attachment - generate_ws(n, k, p) - Watts-Strogatz small-world - generate_scale_free(n, ...) - Scale-free network - generate_geometric(n, radius) - Random geometric graph

Graph generators for Graphem.

This module provides various functions to generate different types of graphs.

graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:

n – int Number of vertices.
p – float Probability that an edge exists between any pair of vertices.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.compute_vertex_degrees(n, edges)[source]

Compute the degree of each vertex from the edge list.

Parameters:

n – number of vertices
edges – array of shape (num_edges, 2)

Returns:

np.array of shape (n,) with degree of each vertex

Return type:

degrees

graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]

Generate a stochastic block model graph.

Parameters:

n_per_block – int Number of vertices per block.
num_blocks – int Number of blocks.
p_in – float Probability of edge within a block.
p_out – float Probability of edge between blocks.
labels – bool If True, return vertex labels.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.
labels: np.ndarray of shape (n,) (only if labels=True): Block labels for each vertex.

Return type:

edges

graphem.generators.generate_ba(n=300, m=3, seed=0)[source]

Generate a Barabási-Albert preferential attachment graph.

Parameters:

n – int Number of vertices.
m – int Number of edges to attach from a new vertex to existing vertices.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]

Generate a Watts-Strogatz small-world graph.

Parameters:

n – int Number of vertices.
k – int Each vertex is connected to k nearest neighbors in ring topology.
p – float Probability of rewiring each edge.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]

Generate a powerlaw cluster graph.

Parameters:

n – int Number of vertices.
m – int Number of random edges to add per new vertex.
p – float Probability of adding a triangle after adding a random edge.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_road_network(width=30, height=30)[source]

Generate a 2D grid graph representing a road network.

Parameters:

width – int Width of the grid.
height – int Height of the grid.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]

Generate a random bipartite graph.

Parameters:

n_top – int Number of vertices in the top set.
n_bottom – int Number of vertices in the bottom set.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_balanced_tree(r=2, h=10)[source]

Generate a balanced r-ary tree of height h.

Parameters:

r – int Branching factor of the tree.
h – int Height of the tree.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]

Generate a random regular graph where each node has degree d.

Parameters:

n – int Number of vertices.
d – int Degree of each vertex.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]

Generate a scale-free graph using Holme and Kim algorithm.

Parameters:

n – int Number of vertices.
alpha – float Parameters for the scale-free graph generation.
beta – float Parameters for the scale-free graph generation.
gamma – float Parameters for the scale-free graph generation.
delta_in – float Parameters for the scale-free graph generation.
delta_out – float Parameters for the scale-free graph generation.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]

Generate a random geometric graph in a unit cube.

Parameters:

n – int Number of vertices.
radius – float Distance threshold for connecting vertices.
dim – int Dimension of the space.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_caveman(l=10, k=10)[source]

Generate a caveman graph with l cliques of size k.

Parameters:

l – int Number of cliques.
k – int Size of each clique.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]

Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.

Parameters:

l – int Number of cliques.
k – int Size of each clique.
p – float Rewiring probability.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

Influence Maximization

graphem.influence - Seed selection algorithms for influence maximization in networks.

Implements GraphEm-based seed selection using radial distances from embedding origin, plus traditional greedy methods with NDlib simulation.

Key functions: - graphem_seed_selection(embedder, k) - Select seeds based on radial distances - greedy_seed_selection(G, k, p) - Traditional greedy algorithm - ndlib_estimated_influence(G, seeds, p) - Evaluate influence using Independent Cascades

Influence maximization functionality for Graphem.

graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:

embedder – GraphEmbedder The initialized graph embedder object
k – int Number of seed nodes to select
num_iterations – int Number of layout iterations to run

Returns:

list: List of k vertices selected as seeds

Return type:

seeds

graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:

G – networkx.Graph The graph to run influence propagation on
seeds – list The list of seed nodes
p – float Propagation probability
iterations_count – int Maximum number of simulation iterations

Returns:

float: The estimated influence (average number of influenced nodes)
iterations: int: The number of iterations run

Return type:

influence

graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]

Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.

Returns:: the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.
Return type:: seeds

Dataset Utilities

graphem.datasets - Load real-world network datasets from various sources.

Download and process datasets from SNAP, Network Repository, and Semantic Scholar. Handles automatic downloading, extraction, and format conversion.

Key classes: - SNAPDataset(dataset_name) - SNAP collection (Facebook, Wikipedia, arXiv, etc.) - NetworkRepositoryDataset(dataset_name) - Network Repository collection - SemanticScholarDataset(dataset_name) - Citation networks

Key functions: - load_dataset(dataset_name) - Load any dataset by name - load_dataset_as_networkx(dataset_name) - Load as NetworkX graph - list_available_datasets() - Show all available datasets

Real-world dataset loader for Graphem.

This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.

graphem.datasets.get_data_directory()[source]

Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.

Returns:: Path to the data directory
Return type:: Path

graphem.datasets.download_file(url, filepath, description=None)[source]

Download a file with progress bar.

Parameters:

url – str URL to download
filepath – Path or str Path to save the downloaded file
description – str, optional Description for the progress bar

graphem.datasets.extract_file(filepath, extract_dir=None)[source]

Extract a compressed file.

Parameters:

filepath – Path or str Path to the compressed file
extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.

Returns:

Path to the extraction directory

Return type:

Path

class graphem.datasets.DatasetLoader(name)[source]

Bases: object

Base class for dataset loaders.

__init__(name)[source]

Initialize the dataset loader.

Parameters:: name – str Name of the dataset

download()[source]: Download the dataset. To be implemented by subclasses.

load()[source]: Load the dataset as edges. To be implemented by subclasses.

load_as_networkx()[source]

Load the dataset as a NetworkX graph.

Returns:: The loaded graph
Return type:: networkx.Graph

info()[source]: Print information about the dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

Returns:: True if downloaded, False otherwise
Return type:: bool

class graphem.datasets.SNAPDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Stanford Network Analysis Project (SNAP).

SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/

AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}

__init__(dataset_name)[source]

Initialize the SNAP dataset loader.

Parameters:: dataset_name – str Name of the SNAP dataset to load

download()[source]: Download the SNAP dataset.

is_downloaded()[source]: Check if the dataset is already downloaded.

load()[source]

Load the SNAP dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Network Repository.

Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/

AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}

__init__(dataset_name)[source]

Initialize the Network Repository dataset loader.

Parameters:: dataset_name – str Name of the Network Repository dataset to load

download()[source]: Download the Network Repository dataset.

is_downloaded()[source]

Check if the expected dataset file exists without scanning the whole tree.

Returns:: True if the expected file exists, False otherwise
Return type:: bool

load()[source]

Load the Network Repository dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]

Bases: DatasetLoader

Loader for Semantic Scholar citation network datasets.

Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.

AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}

__init__(dataset_name='s2-CS')[source]

Initialize the Semantic Scholar dataset loader.

Parameters:: dataset_name – str Name of the Semantic Scholar dataset to load

download()[source]: Download the Semantic Scholar dataset.

is_downloaded()[source]: Check if the dataset is already downloaded.

load()[source]

Load the Semantic Scholar dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.list_available_datasets()[source]

List all available datasets from all sources.

Returns:: Dictionary with dataset information
Return type:: dict

graphem.datasets.load_dataset(dataset_name)[source]

Load a dataset by name.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.load_dataset_as_networkx(dataset_name)[source]

Load a dataset as a NetworkX graph.

Parameters:: dataset_name – str Name of the dataset to load
Returns:: The loaded graph
Return type:: networkx.Graph

Visualization

graphem.visualization - Statistical analysis and plotting utilities for embeddings.

Calculate correlations between embedding radial distances and network centrality measures, with bootstrap confidence intervals.

Key functions: - report_corr(name, radii, centrality) - Correlation with confidence intervals - report_full_correlation_matrix(radii, ...) - Multiple centrality correlations - plot_radial_vs_centrality(radii, centralities, names) - Scatter plots with trendlines - display_benchmark_results(results) - Format benchmark output

Visualization utilities for Graphem.

graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:

name – str Name of the centrality measure
radii – array-like Radial distances from origin
centrality – array-like Centrality values
alpha – float Alpha level for confidence interval

Returns:

(correlation coefficient, p-value)

Return type:

tuple

graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]

Calculate and report correlations between radial distances and various centrality measures.

Parameters:

radii – array-like Radial distances from origin
deg – array-like Various centrality measures
btw – array-like Various centrality measures
eig – array-like Various centrality measures
pr – array-like Various centrality measures
clo – array-like Various centrality measures
edge_btw – array-like Various centrality measures
alpha – float Alpha level for confidence interval

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:

radii – array-like Radial distances from origin
centralities – list of array-like List of centrality measures
names – list of str Names of the centrality measures

graphem.visualization.display_benchmark_results(benchmark_results)[source]

Display benchmark results in a nicely formatted table.

Parameters:: benchmark_results – list of dict List of benchmark result dictionaries

Benchmarking

graphem.benchmark - Performance evaluation and comparative analysis tools.

Comprehensive benchmarking for embedding quality, centrality correlations, and influence maximization effectiveness.

Key functions: - run_benchmark(graph_generator, params) - Basic embedding benchmark - benchmark_correlations(graph_generator, params) - Centrality correlation analysis - run_influence_benchmark(graph_generator, params) - Compare influence methods

Benchmark functionality for Graphem.

graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark on the given graph.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
dim – int Embedding dimension
L_min – float GraphEmbedder parameters
k_attr – float GraphEmbedder parameters
k_inter – float GraphEmbedder parameters
knn_k – float GraphEmbedder parameters
sample_size – int Batch parameters for kNN search
batch_size – int Batch parameters for kNN search
num_iterations – int Number of layout iterations

Returns:

Benchmark results including timings and graph metrics

Return type:

dict

graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
parameters (Other) – same as run_benchmark

Returns:

Benchmark results with correlations

Return type:

dict

graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]

Run a benchmark comparing influence maximization methods.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
k – int Number of seed nodes to select
p – float Propagation probability
iterations – int Number of iterations for influence simulation
dim – int Embedding dimension
num_layout_iterations – int Number of iterations for layout algorithm
layout_params – dict Parameters for the layout algorithm

Returns:

Benchmark results comparing influence maximization methods

Return type:

dict

Complete Module Reference

graphem.embedder module

A JAX-based implementation of graph embedding.

class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]

Bases: object

A class for embedding graphs using the Laplacian embedding.

edges: np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.

n: int Number of vertices in the graph.

dimension: int Dimension of the embedding.

L_min: float Minimum length of the spring.

k_attr: float Attraction force constant.

k_inter: float Repulsion force constant for intersections.

knn_k: int Number of nearest neighbors to consider.

sample_size: int Number of samples for kNN search.

batch_size: int Batch size for kNN search.

my_logger: loguru.logger Logger object to use for logging.

__init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]: Initialize the GraphEmbedder.

logger: System logger

locate_knn_midpoints(midpoints, k)[source]: Locate k nearest neighbors for each midpoint.

static compute_spring_forces(positions, edges, L_min, k_attr)[source]: Compute the spring forces between vertices.

static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]: Compute the intersection forces between vertices and their nearest neighbors.

update_positions()[source]: Update the positions of the vertices based on the spring forces and intersection forces.

run_layout(num_iterations=100)[source]: Run the layout for a given number of iterations.

display_layout(edge_width=1, node_size=3, node_colors=None)[source]

Display the graph embedding using Plotly.

Parameters:

edge_width (float) – The width of the edges in the graph embedding.
node_size (float) – The size of the nodes in the graph embedding.
node_colors (array-like of shape (num_vertices,)) – An array of colors for each vertex.

Returns:

Displays the graph embedding using Plotly in the appropriate dimension.

Return type:

None

graphem.generators module

Graph generators for Graphem.

This module provides various functions to generate different types of graphs.

graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]

Generate a random undirected graph using the Erdős–Rényi G(n, p) model.

Parameters:

n – int Number of vertices.
p – float Probability that an edge exists between any pair of vertices.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.compute_vertex_degrees(n, edges)[source]

Compute the degree of each vertex from the edge list.

Parameters:

n – number of vertices
edges – array of shape (num_edges, 2)

Returns:

np.array of shape (n,) with degree of each vertex

Return type:

degrees

graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]

Generate a stochastic block model graph.

Parameters:

n_per_block – int Number of vertices per block.
num_blocks – int Number of blocks.
p_in – float Probability of edge within a block.
p_out – float Probability of edge between blocks.
labels – bool If True, return vertex labels.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.
labels: np.ndarray of shape (n,) (only if labels=True): Block labels for each vertex.

Return type:

edges

graphem.generators.generate_ba(n=300, m=3, seed=0)[source]

Generate a Barabási-Albert preferential attachment graph.

Parameters:

n – int Number of vertices.
m – int Number of edges to attach from a new vertex to existing vertices.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]

Generate a Watts-Strogatz small-world graph.

Parameters:

n – int Number of vertices.
k – int Each vertex is connected to k nearest neighbors in ring topology.
p – float Probability of rewiring each edge.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]

Generate a powerlaw cluster graph.

Parameters:

n – int Number of vertices.
m – int Number of random edges to add per new vertex.
p – float Probability of adding a triangle after adding a random edge.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_road_network(width=30, height=30)[source]

Generate a 2D grid graph representing a road network.

Parameters:

width – int Width of the grid.
height – int Height of the grid.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]

Generate a random bipartite graph.

Parameters:

n_top – int Number of vertices in the top set.
n_bottom – int Number of vertices in the bottom set.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_balanced_tree(r=2, h=10)[source]

Generate a balanced r-ary tree of height h.

Parameters:

r – int Branching factor of the tree.
h – int Height of the tree.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]

Generate a random regular graph where each node has degree d.

Parameters:

n – int Number of vertices.
d – int Degree of each vertex.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]

Generate a scale-free graph using Holme and Kim algorithm.

Parameters:

n – int Number of vertices.
alpha – float Parameters for the scale-free graph generation.
beta – float Parameters for the scale-free graph generation.
gamma – float Parameters for the scale-free graph generation.
delta_in – float Parameters for the scale-free graph generation.
delta_out – float Parameters for the scale-free graph generation.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]

Generate a random geometric graph in a unit cube.

Parameters:

n – int Number of vertices.
radius – float Distance threshold for connecting vertices.
dim – int Dimension of the space.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_caveman(l=10, k=10)[source]

Generate a caveman graph with l cliques of size k.

Parameters:

l – int Number of cliques.
k – int Size of each clique.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]

Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.

Parameters:

l – int Number of cliques.
k – int Size of each clique.
p – float Rewiring probability.
seed – int Random seed for reproducibility.

Returns:

np.ndarray of shape (num_edges, 2): Array of edge pairs (i, j) with i < j.

Return type:

edges

graphem.influence module

Influence maximization functionality for Graphem.

graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]

Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.

Parameters:

embedder – GraphEmbedder The initialized graph embedder object
k – int Number of seed nodes to select
num_iterations – int Number of layout iterations to run

Returns:

list: List of k vertices selected as seeds

Return type:

seeds

graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]

Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.

Parameters:

G – networkx.Graph The graph to run influence propagation on
seeds – list The list of seed nodes
p – float Propagation probability
iterations_count – int Maximum number of simulation iterations

Returns:

float: The estimated influence (average number of influenced nodes)
iterations: int: The number of iterations run

Return type:

influence

graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]

Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.

Returns:: the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.
Return type:: seeds

graphem.index module

A JAX-based implementation of efficient k-nearest neighbors.

class graphem.index.HPIndex[source]

Bases: object

A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.

__init__()[source]

static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]

Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.

Parameters:

x – (n, d) array of database points
y – (m, d) array of query points
k – number of nearest neighbors
x_tile_size – size of database tiles
y_batch_size – size of query batches

Returns:

(m, k) array of indices of nearest neighbors

graphem.datasets module

Real-world dataset loader for Graphem.

This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.

graphem.datasets.get_data_directory()[source]

Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.

Returns:: Path to the data directory
Return type:: Path

graphem.datasets.download_file(url, filepath, description=None)[source]

Download a file with progress bar.

Parameters:

url – str URL to download
filepath – Path or str Path to save the downloaded file
description – str, optional Description for the progress bar

graphem.datasets.extract_file(filepath, extract_dir=None)[source]

Extract a compressed file.

Parameters:

filepath – Path or str Path to the compressed file
extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.

Returns:

Path to the extraction directory

Return type:

Path

class graphem.datasets.DatasetLoader(name)[source]

Bases: object

Base class for dataset loaders.

__init__(name)[source]

Initialize the dataset loader.

Parameters:: name – str Name of the dataset

download()[source]: Download the dataset. To be implemented by subclasses.

load()[source]: Load the dataset as edges. To be implemented by subclasses.

load_as_networkx()[source]

Load the dataset as a NetworkX graph.

Returns:: The loaded graph
Return type:: networkx.Graph

info()[source]: Print information about the dataset.

is_downloaded()[source]

Check if the dataset is already downloaded.

Returns:: True if downloaded, False otherwise
Return type:: bool

class graphem.datasets.SNAPDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Stanford Network Analysis Project (SNAP).

SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/

AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}

__init__(dataset_name)[source]

Initialize the SNAP dataset loader.

Parameters:: dataset_name – str Name of the SNAP dataset to load

download()[source]: Download the SNAP dataset.

is_downloaded()[source]: Check if the dataset is already downloaded.

load()[source]

Load the SNAP dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]

Bases: DatasetLoader

Loader for datasets from the Network Repository.

Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/

AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}

__init__(dataset_name)[source]

Initialize the Network Repository dataset loader.

Parameters:: dataset_name – str Name of the Network Repository dataset to load

download()[source]: Download the Network Repository dataset.

is_downloaded()[source]

Check if the expected dataset file exists without scanning the whole tree.

Returns:: True if the expected file exists, False otherwise
Return type:: bool

load()[source]

Load the Network Repository dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]

Bases: DatasetLoader

Loader for Semantic Scholar citation network datasets.

Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.

AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}

__init__(dataset_name='s2-CS')[source]

Initialize the Semantic Scholar dataset loader.

Parameters:: dataset_name – str Name of the Semantic Scholar dataset to load

download()[source]: Download the Semantic Scholar dataset.

is_downloaded()[source]: Check if the dataset is already downloaded.

load()[source]

Load the Semantic Scholar dataset as edges.

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.list_available_datasets()[source]

List all available datasets from all sources.

Returns:: Dictionary with dataset information
Return type:: dict

graphem.datasets.load_dataset(dataset_name)[source]

Load a dataset by name.

Parameters:

dataset_name – str Name of the dataset to load

Returns:

(vertices, edges): vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)

Return type:

tuple

graphem.datasets.load_dataset_as_networkx(dataset_name)[source]

Load a dataset as a NetworkX graph.

Parameters:: dataset_name – str Name of the dataset to load
Returns:: The loaded graph
Return type:: networkx.Graph

graphem.visualization module

Visualization utilities for Graphem.

graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]

Calculate and report the correlation between radial distances and a centrality measure.

Parameters:

name – str Name of the centrality measure
radii – array-like Radial distances from origin
centrality – array-like Centrality values
alpha – float Alpha level for confidence interval

Returns:

(correlation coefficient, p-value)

Return type:

tuple

graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]

Calculate and report correlations between radial distances and various centrality measures.

Parameters:

radii – array-like Radial distances from origin
deg – array-like Various centrality measures
btw – array-like Various centrality measures
eig – array-like Various centrality measures
pr – array-like Various centrality measures
clo – array-like Various centrality measures
edge_btw – array-like Various centrality measures
alpha – float Alpha level for confidence interval

Returns:

Correlation matrix

Return type:

pandas.DataFrame

graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]

Plot scatter plots of radial distances vs. various centrality measures.

Parameters:

radii – array-like Radial distances from origin
centralities – list of array-like List of centrality measures
names – list of str Names of the centrality measures

graphem.visualization.display_benchmark_results(benchmark_results)[source]

Display benchmark results in a nicely formatted table.

Parameters:: benchmark_results – list of dict List of benchmark result dictionaries

graphem.benchmark module

Benchmark functionality for Graphem.

graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark on the given graph.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
dim – int Embedding dimension
L_min – float GraphEmbedder parameters
k_attr – float GraphEmbedder parameters
k_inter – float GraphEmbedder parameters
knn_k – float GraphEmbedder parameters
sample_size – int Batch parameters for kNN search
batch_size – int Batch parameters for kNN search
num_iterations – int Number of layout iterations

Returns:

Benchmark results including timings and graph metrics

Return type:

dict

graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]

Run a benchmark to calculate correlations between embedding radii and centrality measures.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
parameters (Other) – same as run_benchmark

Returns:

Benchmark results with correlations

Return type:

dict

graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]

Run a benchmark comparing influence maximization methods.

Parameters:

graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
k – int Number of seed nodes to select
p – float Propagation probability
iterations – int Number of iterations for influence simulation
dim – int Embedding dimension
num_layout_iterations – int Number of iterations for layout algorithm
layout_params – dict Parameters for the layout algorithm

Returns:

Benchmark results comparing influence maximization methods

Return type:

dict