API Reference
This section contains detailed documentation for all GraphEm modules and functions.
Core Classes
GraphEmbedder
graphem.embedder.GraphEmbedder - JAX-based graph embedding using Laplacian embedding with spring forces and intersection avoidance.
Main class for creating graph embeddings. Combines Laplacian eigenvectors with force-directed layout optimization to create visually appealing and structurally meaningful embeddings.
Key methods:
- __init__(edges, n_vertices, dimension=2, ...)
- Initialize embedder with graph structure
- run_layout(num_iterations=100)
- Execute layout algorithm for specified iterations
- display_layout(edge_width=1, node_size=3, ...)
- Visualize embedding using Plotly
- class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]
Bases:
object
A class for embedding graphs using the Laplacian embedding.
- edges
np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.
- n
int Number of vertices in the graph.
- dimension
int Dimension of the embedding.
- L_min
float Minimum length of the spring.
- k_attr
float Attraction force constant.
- k_inter
float Repulsion force constant for intersections.
- knn_k
int Number of nearest neighbors to consider.
- sample_size
int Number of samples for kNN search.
- batch_size
int Batch size for kNN search.
- my_logger
loguru.logger Logger object to use for logging.
- __init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]
Initialize the GraphEmbedder.
- logger
System logger
- static compute_spring_forces(positions, edges, L_min, k_attr)[source]
Compute the spring forces between vertices.
- static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]
Compute the intersection forces between vertices and their nearest neighbors.
- update_positions()[source]
Update the positions of the vertices based on the spring forces and intersection forces.
HPIndex
graphem.index.HPIndex - High-performance k-nearest neighbors search with JAX acceleration.
Efficient batched k-NN implementation for large datasets with memory optimization through tiling.
Key methods:
- knn_tiled(x, y, k=5, ...)
- Find k nearest neighbors with batched processing
- class graphem.index.HPIndex[source]
Bases:
object
A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.
- static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]
Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.
- Parameters:
x – (n, d) array of database points
y – (m, d) array of query points
k – number of nearest neighbors
x_tile_size – size of database tiles
y_batch_size – size of query batches
- Returns:
(m, k) array of indices of nearest neighbors
Graph Generators
graphem.generators - Generate various graph types for testing and experimentation.
Provides NetworkX-based generators for standard graph models including random graphs, scale-free networks, small-world graphs, and more.
Key functions:
- erdos_renyi_graph(n, p)
- Random graph with edge probability p
- generate_sbm(n_per_block, num_blocks, p_in, p_out)
- Stochastic block model
- generate_ba(n, m)
- Barabási-Albert preferential attachment
- generate_ws(n, k, p)
- Watts-Strogatz small-world
- generate_scale_free(n, ...)
- Scale-free network
- generate_geometric(n, radius)
- Random geometric graph
Graph generators for Graphem.
This module provides various functions to generate different types of graphs.
- graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]
Generate a random undirected graph using the Erdős–Rényi G(n, p) model.
- Parameters:
n – int Number of vertices.
p – float Probability that an edge exists between any pair of vertices.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.compute_vertex_degrees(n, edges)[source]
Compute the degree of each vertex from the edge list.
- Parameters:
n – number of vertices
edges – array of shape (num_edges, 2)
- Returns:
np.array of shape (n,) with degree of each vertex
- Return type:
degrees
- graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]
Generate a stochastic block model graph.
- Parameters:
n_per_block – int Number of vertices per block.
num_blocks – int Number of blocks.
p_in – float Probability of edge within a block.
p_out – float Probability of edge between blocks.
labels – bool If True, return vertex labels.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- labels: np.ndarray of shape (n,) (only if labels=True)
Block labels for each vertex.
- Return type:
edges
- graphem.generators.generate_ba(n=300, m=3, seed=0)[source]
Generate a Barabási-Albert preferential attachment graph.
- Parameters:
n – int Number of vertices.
m – int Number of edges to attach from a new vertex to existing vertices.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]
Generate a Watts-Strogatz small-world graph.
- Parameters:
n – int Number of vertices.
k – int Each vertex is connected to k nearest neighbors in ring topology.
p – float Probability of rewiring each edge.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]
Generate a powerlaw cluster graph.
- Parameters:
n – int Number of vertices.
m – int Number of random edges to add per new vertex.
p – float Probability of adding a triangle after adding a random edge.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_road_network(width=30, height=30)[source]
Generate a 2D grid graph representing a road network.
- Parameters:
width – int Width of the grid.
height – int Height of the grid.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]
Generate a random bipartite graph.
- Parameters:
n_top – int Number of vertices in the top set.
n_bottom – int Number of vertices in the bottom set.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_balanced_tree(r=2, h=10)[source]
Generate a balanced r-ary tree of height h.
- Parameters:
r – int Branching factor of the tree.
h – int Height of the tree.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]
Generate a random regular graph where each node has degree d.
- Parameters:
n – int Number of vertices.
d – int Degree of each vertex.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]
Generate a scale-free graph using Holme and Kim algorithm.
- Parameters:
n – int Number of vertices.
alpha – float Parameters for the scale-free graph generation.
beta – float Parameters for the scale-free graph generation.
gamma – float Parameters for the scale-free graph generation.
delta_in – float Parameters for the scale-free graph generation.
delta_out – float Parameters for the scale-free graph generation.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]
Generate a random geometric graph in a unit cube.
- Parameters:
n – int Number of vertices.
radius – float Distance threshold for connecting vertices.
dim – int Dimension of the space.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_caveman(l=10, k=10)[source]
Generate a caveman graph with l cliques of size k.
- Parameters:
l – int Number of cliques.
k – int Size of each clique.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]
Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.
- Parameters:
l – int Number of cliques.
k – int Size of each clique.
p – float Rewiring probability.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
Influence Maximization
graphem.influence - Seed selection algorithms for influence maximization in networks.
Implements GraphEm-based seed selection using radial distances from embedding origin, plus traditional greedy methods with NDlib simulation.
Key functions:
- graphem_seed_selection(embedder, k)
- Select seeds based on radial distances
- greedy_seed_selection(G, k, p)
- Traditional greedy algorithm
- ndlib_estimated_influence(G, seeds, p)
- Evaluate influence using Independent Cascades
Influence maximization functionality for Graphem.
- graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]
Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.
- Parameters:
embedder – GraphEmbedder The initialized graph embedder object
k – int Number of seed nodes to select
num_iterations – int Number of layout iterations to run
- Returns:
- list
List of k vertices selected as seeds
- Return type:
seeds
- graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]
Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.
- Parameters:
G – networkx.Graph The graph to run influence propagation on
seeds – list The list of seed nodes
p – float Propagation probability
iterations_count – int Maximum number of simulation iterations
- Returns:
- float
The estimated influence (average number of influenced nodes)
- iterations: int
The number of iterations run
- Return type:
influence
- graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]
Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.
- Returns:
the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.
- Return type:
seeds
Dataset Utilities
graphem.datasets - Load real-world network datasets from various sources.
Download and process datasets from SNAP, Network Repository, and Semantic Scholar. Handles automatic downloading, extraction, and format conversion.
Key classes:
- SNAPDataset(dataset_name)
- SNAP collection (Facebook, Wikipedia, arXiv, etc.)
- NetworkRepositoryDataset(dataset_name)
- Network Repository collection
- SemanticScholarDataset(dataset_name)
- Citation networks
Key functions:
- load_dataset(dataset_name)
- Load any dataset by name
- load_dataset_as_networkx(dataset_name)
- Load as NetworkX graph
- list_available_datasets()
- Show all available datasets
Real-world dataset loader for Graphem.
This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.
- graphem.datasets.get_data_directory()[source]
Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.
- Returns:
Path to the data directory
- Return type:
Path
- graphem.datasets.download_file(url, filepath, description=None)[source]
Download a file with progress bar.
- Parameters:
url – str URL to download
filepath – Path or str Path to save the downloaded file
description – str, optional Description for the progress bar
- graphem.datasets.extract_file(filepath, extract_dir=None)[source]
Extract a compressed file.
- Parameters:
filepath – Path or str Path to the compressed file
extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.
- Returns:
Path to the extraction directory
- Return type:
Path
- class graphem.datasets.DatasetLoader(name)[source]
Bases:
object
Base class for dataset loaders.
- class graphem.datasets.SNAPDataset(dataset_name)[source]
Bases:
DatasetLoader
Loader for datasets from the Stanford Network Analysis Project (SNAP).
SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/
- AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}
- class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]
Bases:
DatasetLoader
Loader for datasets from the Network Repository.
Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/
- AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}
- __init__(dataset_name)[source]
Initialize the Network Repository dataset loader.
- Parameters:
dataset_name – str Name of the Network Repository dataset to load
- class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]
Bases:
DatasetLoader
Loader for Semantic Scholar citation network datasets.
Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.
- AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}
- graphem.datasets.list_available_datasets()[source]
List all available datasets from all sources.
- Returns:
Dictionary with dataset information
- Return type:
- graphem.datasets.load_dataset(dataset_name)[source]
Load a dataset by name.
- Parameters:
dataset_name – str Name of the dataset to load
- Returns:
- (vertices, edges)
vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)
- Return type:
Visualization
graphem.visualization - Statistical analysis and plotting utilities for embeddings.
Calculate correlations between embedding radial distances and network centrality measures, with bootstrap confidence intervals.
Key functions:
- report_corr(name, radii, centrality)
- Correlation with confidence intervals
- report_full_correlation_matrix(radii, ...)
- Multiple centrality correlations
- plot_radial_vs_centrality(radii, centralities, names)
- Scatter plots with trendlines
- display_benchmark_results(results)
- Format benchmark output
Visualization utilities for Graphem.
- graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]
Calculate and report the correlation between radial distances and a centrality measure.
- Parameters:
name – str Name of the centrality measure
radii – array-like Radial distances from origin
centrality – array-like Centrality values
alpha – float Alpha level for confidence interval
- Returns:
(correlation coefficient, p-value)
- Return type:
- graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]
Calculate and report correlations between radial distances and various centrality measures.
- Parameters:
radii – array-like Radial distances from origin
deg – array-like Various centrality measures
btw – array-like Various centrality measures
eig – array-like Various centrality measures
pr – array-like Various centrality measures
clo – array-like Various centrality measures
edge_btw – array-like Various centrality measures
alpha – float Alpha level for confidence interval
- Returns:
Correlation matrix
- Return type:
pandas.DataFrame
- graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]
Plot scatter plots of radial distances vs. various centrality measures.
- Parameters:
radii – array-like Radial distances from origin
centralities – list of array-like List of centrality measures
names – list of str Names of the centrality measures
Benchmarking
graphem.benchmark - Performance evaluation and comparative analysis tools.
Comprehensive benchmarking for embedding quality, centrality correlations, and influence maximization effectiveness.
Key functions:
- run_benchmark(graph_generator, params)
- Basic embedding benchmark
- benchmark_correlations(graph_generator, params)
- Centrality correlation analysis
- run_influence_benchmark(graph_generator, params)
- Compare influence methods
Benchmark functionality for Graphem.
- graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]
Run a benchmark on the given graph.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
dim – int Embedding dimension
L_min – float GraphEmbedder parameters
k_attr – float GraphEmbedder parameters
k_inter – float GraphEmbedder parameters
knn_k – float GraphEmbedder parameters
sample_size – int Batch parameters for kNN search
batch_size – int Batch parameters for kNN search
num_iterations – int Number of layout iterations
- Returns:
Benchmark results including timings and graph metrics
- Return type:
- graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]
Run a benchmark to calculate correlations between embedding radii and centrality measures.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
parameters (Other) – same as run_benchmark
- Returns:
Benchmark results with correlations
- Return type:
- graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]
Run a benchmark comparing influence maximization methods.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
k – int Number of seed nodes to select
p – float Propagation probability
iterations – int Number of iterations for influence simulation
dim – int Embedding dimension
num_layout_iterations – int Number of iterations for layout algorithm
layout_params – dict Parameters for the layout algorithm
- Returns:
Benchmark results comparing influence maximization methods
- Return type:
Complete Module Reference
graphem.embedder module
A JAX-based implementation of graph embedding.
- class graphem.embedder.GraphEmbedder(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]
Bases:
object
A class for embedding graphs using the Laplacian embedding.
- edges
np.ndarray of shape (num_edges, 2) Array of edge pairs (i, j) with i < j.
- n
int Number of vertices in the graph.
- dimension
int Dimension of the embedding.
- L_min
float Minimum length of the spring.
- k_attr
float Attraction force constant.
- k_inter
float Repulsion force constant for intersections.
- knn_k
int Number of nearest neighbors to consider.
- sample_size
int Number of samples for kNN search.
- batch_size
int Batch size for kNN search.
- my_logger
loguru.logger Logger object to use for logging.
- __init__(edges, n_vertices, dimension=2, L_min=1.0, k_attr=0.2, k_inter=0.5, knn_k=10, sample_size=256, batch_size=1024, my_logger=None, verbose=True)[source]
Initialize the GraphEmbedder.
- logger
System logger
- static compute_spring_forces(positions, edges, L_min, k_attr)[source]
Compute the spring forces between vertices.
- static compute_intersection_forces_with_knn_index(positions, edges, knn_idx, sampled_indices, k_inter)[source]
Compute the intersection forces between vertices and their nearest neighbors.
- update_positions()[source]
Update the positions of the vertices based on the spring forces and intersection forces.
graphem.generators module
Graph generators for Graphem.
This module provides various functions to generate different types of graphs.
- graphem.generators.erdos_renyi_graph(n, p, seed=0)[source]
Generate a random undirected graph using the Erdős–Rényi G(n, p) model.
- Parameters:
n – int Number of vertices.
p – float Probability that an edge exists between any pair of vertices.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.compute_vertex_degrees(n, edges)[source]
Compute the degree of each vertex from the edge list.
- Parameters:
n – number of vertices
edges – array of shape (num_edges, 2)
- Returns:
np.array of shape (n,) with degree of each vertex
- Return type:
degrees
- graphem.generators.generate_sbm(n_per_block=75, num_blocks=4, p_in=0.15, p_out=0.01, labels=False, seed=0)[source]
Generate a stochastic block model graph.
- Parameters:
n_per_block – int Number of vertices per block.
num_blocks – int Number of blocks.
p_in – float Probability of edge within a block.
p_out – float Probability of edge between blocks.
labels – bool If True, return vertex labels.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- labels: np.ndarray of shape (n,) (only if labels=True)
Block labels for each vertex.
- Return type:
edges
- graphem.generators.generate_ba(n=300, m=3, seed=0)[source]
Generate a Barabási-Albert preferential attachment graph.
- Parameters:
n – int Number of vertices.
m – int Number of edges to attach from a new vertex to existing vertices.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_ws(n=1000, k=6, p=0.3, seed=0)[source]
Generate a Watts-Strogatz small-world graph.
- Parameters:
n – int Number of vertices.
k – int Each vertex is connected to k nearest neighbors in ring topology.
p – float Probability of rewiring each edge.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_power_cluster(n=1000, m=3, p=0.5, seed=0)[source]
Generate a powerlaw cluster graph.
- Parameters:
n – int Number of vertices.
m – int Number of random edges to add per new vertex.
p – float Probability of adding a triangle after adding a random edge.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_road_network(width=30, height=30)[source]
Generate a 2D grid graph representing a road network.
- Parameters:
width – int Width of the grid.
height – int Height of the grid.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_bipartite_graph(n_top=50, n_bottom=100)[source]
Generate a random bipartite graph.
- Parameters:
n_top – int Number of vertices in the top set.
n_bottom – int Number of vertices in the bottom set.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_balanced_tree(r=2, h=10)[source]
Generate a balanced r-ary tree of height h.
- Parameters:
r – int Branching factor of the tree.
h – int Height of the tree.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_random_regular(n=100, d=3, seed=0)[source]
Generate a random regular graph where each node has degree d.
- Parameters:
n – int Number of vertices.
d – int Degree of each vertex.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_scale_free(n=100, alpha=0.41, beta=0.54, gamma=0.05, delta_in=0.2, delta_out=0, seed=0)[source]
Generate a scale-free graph using Holme and Kim algorithm.
- Parameters:
n – int Number of vertices.
alpha – float Parameters for the scale-free graph generation.
beta – float Parameters for the scale-free graph generation.
gamma – float Parameters for the scale-free graph generation.
delta_in – float Parameters for the scale-free graph generation.
delta_out – float Parameters for the scale-free graph generation.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_geometric(n=100, radius=0.2, dim=2, seed=0)[source]
Generate a random geometric graph in a unit cube.
- Parameters:
n – int Number of vertices.
radius – float Distance threshold for connecting vertices.
dim – int Dimension of the space.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_caveman(l=10, k=10)[source]
Generate a caveman graph with l cliques of size k.
- Parameters:
l – int Number of cliques.
k – int Size of each clique.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
- graphem.generators.generate_relaxed_caveman(l=10, k=10, p=0.1, seed=0)[source]
Generate a relaxed caveman graph with l cliques of size k, and a rewiring probability p.
- Parameters:
l – int Number of cliques.
k – int Size of each clique.
p – float Rewiring probability.
seed – int Random seed for reproducibility.
- Returns:
- np.ndarray of shape (num_edges, 2)
Array of edge pairs (i, j) with i < j.
- Return type:
edges
graphem.influence module
Influence maximization functionality for Graphem.
- graphem.influence.graphem_seed_selection(embedder, k, num_iterations=20)[source]
Run the GraphEmbedder layout to get an embedding, then select k seeds by choosing the nodes with the highest radial distances.
- Parameters:
embedder – GraphEmbedder The initialized graph embedder object
k – int Number of seed nodes to select
num_iterations – int Number of layout iterations to run
- Returns:
- list
List of k vertices selected as seeds
- Return type:
seeds
- graphem.influence.ndlib_estimated_influence(G, seeds, p=0.1, iterations_count=200)[source]
Run NDlib’s Independent Cascades model on graph G, starting with the given seeds, and return the estimated final influence (number of nodes in state 2) and the number of iterations executed.
- Parameters:
G – networkx.Graph The graph to run influence propagation on
seeds – list The list of seed nodes
p – float Propagation probability
iterations_count – int Maximum number of simulation iterations
- Returns:
- float
The estimated influence (average number of influenced nodes)
- iterations: int
The number of iterations run
- Return type:
influence
- graphem.influence.greedy_seed_selection(G, k, p=0.1, iterations_count=200)[source]
Greedy seed selection using NDlib influence estimation. For each candidate node evaluation, it calls NDlib’s simulation and accumulates the total number of iterations used across all evaluations.
- Returns:
the selected seed set (list of nodes) total_iters: the total number of NDlib iterations run during selection.
- Return type:
seeds
graphem.index module
A JAX-based implementation of efficient k-nearest neighbors.
- class graphem.index.HPIndex[source]
Bases:
object
A kernelized kNN index that uses batching / tiling to efficiently handle large datasets with limited memory usage.
- static knn_tiled(x, y, k=5, x_tile_size=8192, y_batch_size=1024)[source]
Advanced implementation that tiles both database and query points. This wrapper handles the dynamic aspects before calling the JIT-compiled function.
- Parameters:
x – (n, d) array of database points
y – (m, d) array of query points
k – number of nearest neighbors
x_tile_size – size of database tiles
y_batch_size – size of query batches
- Returns:
(m, k) array of indices of nearest neighbors
graphem.datasets module
Real-world dataset loader for Graphem.
This module provides functions to download and load standard large graph datasets from various sources including SNAP (Stanford Network Analysis Project), Network Repository, and other public graph repositories.
- graphem.datasets.get_data_directory()[source]
Get the data directory for storing downloaded datasets. Creates the directory if it doesn’t exist.
- Returns:
Path to the data directory
- Return type:
Path
- graphem.datasets.download_file(url, filepath, description=None)[source]
Download a file with progress bar.
- Parameters:
url – str URL to download
filepath – Path or str Path to save the downloaded file
description – str, optional Description for the progress bar
- graphem.datasets.extract_file(filepath, extract_dir=None)[source]
Extract a compressed file.
- Parameters:
filepath – Path or str Path to the compressed file
extract_dir – Path or str, optional Directory to extract to. If None, extracts to the same directory as the file.
- Returns:
Path to the extraction directory
- Return type:
Path
- class graphem.datasets.DatasetLoader(name)[source]
Bases:
object
Base class for dataset loaders.
- class graphem.datasets.SNAPDataset(dataset_name)[source]
Bases:
DatasetLoader
Loader for datasets from the Stanford Network Analysis Project (SNAP).
SNAP datasets are commonly used in network analysis research. Source: https://snap.stanford.edu/data/
- AVAILABLE_DATASETS = {'ca-GrQc': {'description': 'Collaboration network of Arxiv General Relativity', 'directed': False, 'edges': 14496, 'nodes': 5242, 'url': 'https://snap.stanford.edu/data/ca-GrQc.txt.gz'}, 'ca-HepTh': {'description': 'Collaboration network of Arxiv High Energy Physics Theory', 'directed': False, 'edges': 25998, 'nodes': 9877, 'url': 'https://snap.stanford.edu/data/ca-HepTh.txt.gz'}, 'ego-twitter': {'description': 'Twitter ego network', 'directed': True, 'edges': 1768149, 'nodes': 81306, 'url': 'https://snap.stanford.edu/data/twitter_combined.txt.gz'}, 'email-Enron': {'description': 'Email communication network from Enron', 'directed': True, 'edges': 183831, 'nodes': 36692, 'url': 'https://snap.stanford.edu/data/email-Enron.txt.gz'}, 'facebook_combined': {'description': 'Facebook social network', 'directed': False, 'edges': 88234, 'nodes': 4039, 'url': 'https://snap.stanford.edu/data/facebook_combined.txt.gz'}, 'oregon1_010331': {'description': 'AS peering network from Oregon route views', 'directed': False, 'edges': 22002, 'nodes': 10670, 'url': 'https://snap.stanford.edu/data/oregon1_010331.txt.gz'}, 'p2p-Gnutella04': {'description': 'Gnutella peer-to-peer network from August 4, 2002', 'directed': True, 'edges': 39994, 'nodes': 10876, 'url': 'https://snap.stanford.edu/data/p2p-Gnutella04.txt.gz'}, 'wiki-vote': {'description': 'Wikipedia who-votes-on-whom network', 'directed': True, 'edges': 103689, 'nodes': 7115, 'url': 'https://snap.stanford.edu/data/wiki-Vote.txt.gz'}}
- class graphem.datasets.NetworkRepositoryDataset(dataset_name)[source]
Bases:
DatasetLoader
Loader for datasets from the Network Repository.
Network Repository is a scientific network data repository with interactive analytics. Source: https://networkrepository.com/
- AVAILABLE_DATASETS = {'ca-cit-HepPh': {'description': 'Citation network of Arxiv High Energy Physics', 'directed': True, 'file_pattern': 'ca-cit-HepPh.mtx', 'url': 'https://nrvis.com/download/data/ca/ca-cit-HepPh.zip'}, 'ia-reality': {'description': 'Reality Mining social network', 'directed': False, 'file_pattern': 'ia-reality.mtx', 'url': 'https://nrvis.com/download/data/ia/ia-reality.zip'}, 'soc-hamsterster': {'description': 'Hamsterster social network', 'directed': False, 'file_pattern': 'soc-hamsterster.mtx', 'url': 'https://nrvis.com/download/data/soc/soc-hamsterster.zip'}, 'socfb-MIT': {'description': 'Facebook network from MIT', 'directed': False, 'file_pattern': 'socfb-MIT.mtx', 'url': 'https://nrvis.com/download/data/socfb/socfb-MIT.zip'}, 'web-google-dir': {'description': 'Google web graph', 'directed': True, 'file_pattern': 'web-google-dir.edges', 'url': 'https://nrvis.com/download/data/web/web-google-dir.zip'}}
- __init__(dataset_name)[source]
Initialize the Network Repository dataset loader.
- Parameters:
dataset_name – str Name of the Network Repository dataset to load
- class graphem.datasets.SemanticScholarDataset(dataset_name='s2-CS')[source]
Bases:
DatasetLoader
Loader for Semantic Scholar citation network datasets.
Semantic Scholar is a free, AI-powered research tool for scientific literature. This loader downloads and processes the citation network from subsets of Semantic Scholar data.
- AVAILABLE_DATASETS = {'s2-CS': {'description': 'Computer Science citation network from Semantic Scholar', 'edges_file': 's2-CS-citations.csv', 'nodes_file': 's2-CS-nodes.csv', 'url': 'https://github.com/mattbierbaum/citation-networks/raw/master/s2-CS.tar.gz'}}
- graphem.datasets.list_available_datasets()[source]
List all available datasets from all sources.
- Returns:
Dictionary with dataset information
- Return type:
- graphem.datasets.load_dataset(dataset_name)[source]
Load a dataset by name.
- Parameters:
dataset_name – str Name of the dataset to load
- Returns:
- (vertices, edges)
vertices: np.ndarray of shape (num_vertices,) edges: np.ndarray of shape (num_edges, 2)
- Return type:
graphem.visualization module
Visualization utilities for Graphem.
- graphem.visualization.report_corr(name, radii, centrality, alpha=0.025)[source]
Calculate and report the correlation between radial distances and a centrality measure.
- Parameters:
name – str Name of the centrality measure
radii – array-like Radial distances from origin
centrality – array-like Centrality values
alpha – float Alpha level for confidence interval
- Returns:
(correlation coefficient, p-value)
- Return type:
- graphem.visualization.report_full_correlation_matrix(radii, deg, btw, eig, pr, clo, nload, alpha=0.025)[source]
Calculate and report correlations between radial distances and various centrality measures.
- Parameters:
radii – array-like Radial distances from origin
deg – array-like Various centrality measures
btw – array-like Various centrality measures
eig – array-like Various centrality measures
pr – array-like Various centrality measures
clo – array-like Various centrality measures
edge_btw – array-like Various centrality measures
alpha – float Alpha level for confidence interval
- Returns:
Correlation matrix
- Return type:
pandas.DataFrame
- graphem.visualization.plot_radial_vs_centrality(radii, centralities, names)[source]
Plot scatter plots of radial distances vs. various centrality measures.
- Parameters:
radii – array-like Radial distances from origin
centralities – list of array-like List of centrality measures
names – list of str Names of the centrality measures
graphem.benchmark module
Benchmark functionality for Graphem.
- graphem.benchmark.run_benchmark(graph_generator, graph_params, dim=3, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]
Run a benchmark on the given graph.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
dim – int Embedding dimension
L_min – float GraphEmbedder parameters
k_attr – float GraphEmbedder parameters
k_inter – float GraphEmbedder parameters
knn_k – float GraphEmbedder parameters
sample_size – int Batch parameters for kNN search
batch_size – int Batch parameters for kNN search
num_iterations – int Number of layout iterations
- Returns:
Benchmark results including timings and graph metrics
- Return type:
- graphem.benchmark.benchmark_correlations(graph_generator, graph_params, dim=2, L_min=10.0, k_attr=0.5, k_inter=0.1, knn_k=15, sample_size=512, batch_size=1024, num_iterations=40)[source]
Run a benchmark to calculate correlations between embedding radii and centrality measures.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
parameters (Other) – same as run_benchmark
- Returns:
Benchmark results with correlations
- Return type:
- graphem.benchmark.run_influence_benchmark(graph_generator, graph_params, k=10, p=0.1, iterations=200, dim=3, num_layout_iterations=20, layout_params=None)[source]
Run a benchmark comparing influence maximization methods.
- Parameters:
graph_generator – callable Function to generate a graph
graph_params – dict Parameters for the graph generator
k – int Number of seed nodes to select
p – float Propagation probability
iterations – int Number of iterations for influence simulation
dim – int Embedding dimension
num_layout_iterations – int Number of iterations for layout algorithm
layout_params – dict Parameters for the layout algorithm
- Returns:
Benchmark results comparing influence maximization methods
- Return type: