Quick Start Guide

This guide walks through installing GraphEm, computing a first graph embedding, and using the main analysis tools.

Installation

Install GraphEm using pip:

pip install graphem-jax

For GPU/TPU acceleration (optional but recommended for large graphs), see the JAX installation guide.

Your First Graph Embedding

Let’s start with a simple example of embedding a random graph:

import graphem as ge

# Generate a random graph
edges = ge.erdos_renyi_graph(n=200, p=0.05)

# Create an embedder
embedder = ge.GraphEmbedder(
    edges=edges,
    n_vertices=200,
    dimension=3,      # 3D embedding
    L_min=10.0,       # Minimum edge length
    k_attr=0.5,       # Attraction force
    k_inter=0.1,      # Repulsion force
    knn_k=15          # Nearest neighbors
)

# Compute the embedding
embedder.run_layout(num_iterations=40)

# Visualize the result
embedder.display_layout(edge_width=0.5, node_size=5)

Understanding the Parameters

  • dimension: Embedding space dimension (2D or 3D)

  • L_min: Controls minimum distance between connected nodes

  • k_attr: Strength of attractive forces between connected nodes

  • k_inter: Strength of repulsive forces between all nodes

  • knn_k: Number of nearest neighbors for efficient force computation
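To make the roles of these parameters more concrete, here is a minimal sketch of one update step of a toy force-directed layout. It is not GraphEm's implementation: the force laws (a linear spring with rest length L_min along edges, and an inverse-distance repulsion between all pairs) and the helper name force_step are assumptions chosen for illustration, and the all-pairs loop ignores the nearest-neighbor shortcut that knn_k refers to.

```python
import math

def force_step(pos, edges, L_min=1.0, k_attr=0.5, k_inter=0.1, step=0.1):
    """One update of a toy force-directed layout (illustrative, not GraphEm's code)."""
    n = len(pos)
    dim = len(pos[0])
    force = [[0.0] * dim for _ in range(n)]

    # Spring force along each edge: pulls endpoints together when they are
    # farther apart than L_min, pushes them apart when they are closer.
    for i, j in edges:
        delta = [pos[j][d] - pos[i][d] for d in range(dim)]
        dist = math.sqrt(sum(x * x for x in delta)) or 1e-9
        magnitude = k_attr * (dist - L_min)
        for d in range(dim):
            f = magnitude * delta[d] / dist
            force[i][d] += f
            force[j][d] -= f

    # Repulsion between every pair of nodes, decaying as 1 / distance.
    for i in range(n):
        for j in range(i + 1, n):
            delta = [pos[j][d] - pos[i][d] for d in range(dim)]
            dist = math.sqrt(sum(x * x for x in delta)) or 1e-9
            magnitude = k_inter / dist
            for d in range(dim):
                f = magnitude * delta[d] / dist
                force[i][d] -= f
                force[j][d] += f

    return [[pos[i][d] + step * force[i][d] for d in range(dim)] for i in range(n)]


# Two connected nodes placed far apart move toward each other.
pos = [[0.0, 0.0], [10.0, 0.0]]
new_pos = force_step(pos, edges=[(0, 1)])
gap_before = pos[1][0] - pos[0][0]
gap_after = new_pos[1][0] - new_pos[0][0]
```

In this toy model, raising k_attr tightens connected nodes toward distance L_min, while raising k_inter spreads the whole layout out.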

Graph Generation

GraphEm provides various graph generators:

# Scale-free network (Barabási–Albert)
edges = ge.generate_ba(n=500, m=3)

# Small-world network (Watts–Strogatz)
edges = ge.generate_ws(n=500, k=6, p=0.1)

# Stochastic block model
edges = ge.generate_sbm(n_per_block=100, num_blocks=3, p_in=0.1, p_out=0.01)

# Random regular graph
edges = ge.generate_random_regular(n=300, d=4)
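Each generator returns an edge list that is passed straight to GraphEmbedder. The sketch below shows the kind of structure involved, using a pure-Python stand-in for an Erdős–Rényi generator. The name toy_erdos_renyi is hypothetical, and the representation (one pair of 0-based vertex indices per undirected edge) is an assumption about the format, not a description of GraphEm's return type.

```python
import itertools
import random

def toy_erdos_renyi(n, p, seed=0):
    """Return an undirected G(n, p) edge list as (i, j) pairs with i < j."""
    rng = random.Random(seed)
    return [
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]

n_vertices = 200
edges = toy_erdos_renyi(n=n_vertices, p=0.05)

# Sanity checks worth running on any edge list before embedding it.
assert all(0 <= i < j < n_vertices for i, j in edges)  # valid indices, no self-loops
assert len(edges) == len(set(edges))                   # no duplicate edges

# Each edge contributes to the degree of two vertices.
mean_degree = 2 * len(edges) / n_vertices
```

For G(n, p) the mean degree should land near p * (n - 1), which gives a quick check that a generated graph has the density you intended.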

Working with Real Data

Load and analyze real-world networks:

# Load a dataset (GraphEm includes several network datasets)
vertices, edges = ge.load_dataset('snap-ca-GrQc')  # Collaboration network
n_vertices = len(vertices)

# Create embedder for larger networks
embedder = ge.GraphEmbedder(
    edges=edges,
    n_vertices=n_vertices,
    dimension=2,
    knn_k=20,           # More neighbors for denser graphs
    sample_size=512,    # Larger sample for accuracy
    batch_size=2048     # Larger batches for efficiency
)

embedder.run_layout(num_iterations=100)
embedder.display_layout()
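If your network is not one of the included datasets, you need to build the vertex list and edge list yourself. The following is a minimal sketch, assuming a plain-text edge list with one whitespace-separated node pair per line and "#" comment lines (the layout used by SNAP downloads). The helper read_edge_list is hypothetical and not part of GraphEm; it relabels arbitrary node IDs to contiguous 0-based indices.

```python
import io

def read_edge_list(lines):
    """Parse 'u v' lines into (vertices, edges) with contiguous 0-based indices."""
    index = {}    # original node ID -> contiguous index
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        u, v = line.split()[:2]
        i = index.setdefault(u, len(index))
        j = index.setdefault(v, len(index))
        if i != j:
            edges.add((min(i, j), max(i, j)))  # undirected; self-loops dropped
    vertices = list(index)  # position in this list is the vertex index
    return vertices, sorted(edges)

# A small in-memory sample; in practice pass an open file object instead.
sample = io.StringIO(
    "# FromNodeId\tToNodeId\n"
    "3466\t937\n"
    "937\t3466\n"
    "3466\t5233\n"
    "5233\t5233\n"
)
vertices, edges = read_edge_list(sample)
n_vertices = len(vertices)
```

The duplicate reversed edge and the self-loop in the sample are both discarded, leaving three vertices and two edges.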

Influence Maximization

Find the most influential nodes in a network:

import networkx as nx

# Convert to NetworkX for influence analysis
G = nx.Graph()
G.add_nodes_from(range(n_vertices))
G.add_edges_from(edges)

# Method 1: GraphEm-based selection (uses embedding)
seeds_graphem = ge.graphem_seed_selection(embedder, k=10, num_iterations=20)

# Method 2: Greedy selection (traditional approach)
seeds_greedy = ge.greedy_seed_selection(G, k=10)

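# Estimate the influence spread of each seed set
influence_graphem, _ = ge.ndlib_estimated_influence(
    G, seeds_graphem, p=0.1, iterations_count=200
)
influence_greedy, _ = ge.ndlib_estimated_influence(
    G, seeds_greedy, p=0.1, iterations_count=200
)

print(f"GraphEm method: {influence_graphem} nodes influenced ({influence_graphem/n_vertices:.2%})")
print(f"Greedy method: {influence_greedy} nodes influenced ({influence_greedy/n_vertices:.2%})")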
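To see what a spread estimate with activation probability p measures, here is a minimal pure-Python sketch of an independent-cascade simulation averaged over many runs. This is a stand-in for illustration only: the diffusion model, the function names, and the adjacency-dictionary input are assumptions, and it does not reproduce what ndlib_estimated_influence does internally.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """Run one independent-cascade simulation; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_influence(adj, seeds, p=0.1, runs=200, seed=0):
    """Average number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Toy graph: a star with hub 0 and 20 leaves.
adj = {0: list(range(1, 21)), **{leaf: [0] for leaf in range(1, 21)}}
hub_spread = estimate_influence(adj, seeds=[0])
leaf_spread = estimate_influence(adj, seeds=[1])
```

Seeding the hub reaches more nodes on average than seeding a leaf, which is the kind of difference a seed-selection method is trying to exploit.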

Benchmarking and Analysis

Compare the embedding-based centrality with standard centrality measures:

from graphem.benchmark import benchmark_correlations
from graphem.visualization import report_full_correlation_matrix

# Run comprehensive benchmark
results = benchmark_correlations(
    graph_generator=ge.generate_ba,
    graph_params={'n': 300, 'm': 3},
    dim=3,
    num_iterations=50
)

# Display correlation matrix
correlation_matrix = report_full_correlation_matrix(
    results['radii'],           # Embedding-based centrality
    results['degree'],          # Degree centrality
    results['betweenness'],     # Betweenness centrality
    results['eigenvector'],     # Eigenvector centrality
    results['pagerank'],        # PageRank
    results['closeness'],       # Closeness centrality
    results['node_load']        # Load centrality
)
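Each entry of a correlation matrix like this summarizes how similarly two measures rank the nodes. As a rough illustration, the sketch below computes a Spearman rank correlation between two score vectors in plain Python. Which coefficient the benchmark actually reports is not stated here, so treat Spearman as an assumption; the input numbers are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up per-node scores whose rankings are exactly reversed.
radii = [0.9, 2.5, 1.7, 3.1, 1.2]
degree = [12, 3, 5, 1, 9]
rho = spearman(radii, degree)
```

A value near +1 or -1 means the two measures order the nodes almost identically (or in reverse), while a value near 0 means the rankings are unrelated.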

Performance Tips

For Large Graphs (>10k nodes):

embedder = ge.GraphEmbedder(
    edges=edges,
    n_vertices=n_vertices,
    dimension=2,          # 2D is faster than 3D
    knn_k=10,             # Fewer neighbors = faster
    sample_size=256,      # Smaller samples = faster
    batch_size=4096,      # Larger batches = more efficient
    verbose=False         # Disable progress bars
)

GPU Acceleration:

GraphEm runs on whichever device JAX selects by default, so it uses the GPU automatically when JAX detects a CUDA device:

import jax
print("Available devices:", jax.devices())  # Check for GPU

# Force CPU usage if needed
with jax.default_device(jax.devices('cpu')[0]):
    embedder.run_layout(num_iterations=50)

Memory Management:

For very large graphs, lower memory use by reducing the neighbor count, sample size, and batch size:

# For graphs with >100k nodes, consider reducing parameters
embedder = ge.GraphEmbedder(
    edges=edges,
    n_vertices=n_vertices,
    knn_k=5,              # Minimum viable k
    sample_size=128,      # Smaller sample
    batch_size=1024       # Smaller batches
)
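The settings used throughout this guide can be collected into a small helper that picks a preset from the graph size. This is only a sketch: suggest_embedder_params is a hypothetical function, the size thresholds are the rough ones quoted above (>10k and >100k nodes), and the values are starting points copied from the examples in this guide, not tuned recommendations.

```python
def suggest_embedder_params(n_vertices):
    """Pick knn_k / sample_size / batch_size from the tiers used in this guide."""
    if n_vertices > 100_000:
        # Memory-constrained tier: minimal neighbors, small samples and batches.
        return {"knn_k": 5, "sample_size": 128, "batch_size": 1024}
    if n_vertices > 10_000:
        # Speed-oriented tier for large graphs.
        return {"knn_k": 10, "sample_size": 256, "batch_size": 4096}
    # Accuracy-oriented settings from the real-data example.
    return {"knn_k": 20, "sample_size": 512, "batch_size": 2048}

params = suggest_embedder_params(250_000)
# The result can be unpacked into the constructor, for example:
# ge.GraphEmbedder(edges=edges, n_vertices=n_vertices, dimension=2, **params)
```

Keeping the presets in one place makes it easier to adjust them after profiling on your own hardware.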

Next Steps