GraphEm: Graph Embedding & Influence Maximization

A high-performance graph embedding and influence maximization library powered by JAX, designed for scalable network analysis and research.

Documentation

What is GraphEm?

GraphEm is a comprehensive Python library that combines cutting-edge graph embedding techniques with influence maximization algorithms. Built on JAX for high-performance computation, it enables researchers and practitioners to analyze large-scale networks efficiently.

Key Features

High-Performance Computing

JAX-accelerated computations with GPU/TPU support
JIT compilation for optimized performance
Memory-efficient algorithms for large networks
Batch processing capabilities

Advanced Graph Embedding

Spectral initialization using graph Laplacian
Force-directed layout refinement with customizable forces
2D and 3D embedding support
Hierarchical position indexing for efficient k-NN search

Influence Maximization

Novel embedding-based seed selection algorithm
Traditional greedy algorithm for comparison
NDlib integration for influence spread simulation
Comprehensive benchmarking tools

Graph Generation & Datasets

12+ standard graph models (Erdős–Rényi, Barabási–Albert, Watts–Strogatz, etc.)
Built-in loaders for real-world datasets (SNAP, Network Repository)
Custom graph generators for domain-specific networks

Visualization & Analysis

Interactive 2D/3D plots with Plotly
Centrality correlation analysis
Performance benchmarking tools
Comprehensive reporting utilities

Quick Start Example

import graphem as ge
import networkx as nx

# Generate a scale-free network
edges = ge.generate_ba(n=1000, m=3)

# Create and run embedding
embedder = ge.GraphEmbedder(
    edges=edges,
    n_vertices=1000,
    dimension=3
)
embedder.run_layout(num_iterations=50)

# Find influential nodes
seeds = ge.graphem_seed_selection(embedder, k=20)

# Estimate influence spread
G = nx.Graph(edges)
influence, _ = ge.ndlib_estimated_influence(G, seeds, p=0.1)
print(f"Influence spread: {influence} nodes ({influence/1000:.1%})")

# Visualize results
embedder.display_layout()

Installation

Install GraphEm using pip:

pip install graphem-jax

For GPU/TPU support, see the JAX installation guide.

Core Components

GraphEmbedder

The main embedding engine that combines spectral initialization with iterative force-directed refinement:

Spectral Initialization: Uses graph Laplacian eigenvectors for initial positioning
Force-Directed Layout: Applies spring forces between connected nodes and repulsion between all nodes
Intersection Avoidance: Prevents node overlap for cleaner visualizations
Adaptive Parameters: Automatically adjusts forces based on graph structure

HPIndex

High performance index for efficient k-nearest neighbor search in high-dimensional embeddings:

Memory Efficient: Memory tiling for large point sets
Batch Processing: Vectorized nearest neighbor queries

Influence Maximization

Advanced algorithms for identifying influential nodes in networks:

GraphEm Method: Uses embedding radial distances to select diverse, influential seeds
Greedy Baseline: Traditional greedy algorithm for comparison
Spread Simulation: NDlib integration for accurate influence estimation

Graph Generators

Comprehensive collection of standard and custom graph models:

# Classic models
edges = ge.erdos_renyi_graph(n=500, p=0.02)
edges = ge.generate_ba(n=500, m=3)  # Scale-free
edges = ge.generate_ws(n=500, k=6, p=0.1)  # Small-world

# Community structures
edges = ge.generate_sbm(sizes=[100, 150, 100], p_in=0.1, p_out=0.01)
edges = ge.generate_caveman(clique_size=10, num_cliques=5)

# Specialized networks
edges = ge.generate_geometric(n=300, radius=0.2)
edges = ge.generate_road_network(grid_size=20, connection_prob=0.8)

Performance Characteristics

GraphEm is designed for high performance across different scales:

Small Networks (< 1K nodes)

Real-time embedding and visualization
Interactive parameter tuning
Comprehensive analysis possible

Medium Networks (1K - 10K nodes)

Efficient embedding with optimized parameters
Batch processing for multiple analyses
GPU acceleration recommended

Large Networks (10K+ nodes)

Memory-efficient algorithms
Progressive refinement strategies
Distributed processing capabilities

Benchmarking Results

GraphEm shows promising performance:

Speed: Much faster than the greedy algorithm for node influence maximization
Memory: Memory tiling for efficiency, can process large datasets
Accuracy: Strong correlation with ground-truth centrality measures

License

MIT