A comprehensive implementation of graph machine learning techniques including shallow and deep node embeddings, node classification with Graph Attention Networks (GAT), and link prediction using Graph Autoencoders (GAE).
This project demonstrates three core graph ML tasks:
- Node Embeddings - Comparing shallow (Node2Vec) vs deep (GNN-based) embedding methods
- Node Classification - Semi-supervised classification using GAT with hyperparameter optimization
- Link Prediction - Reconstructing missing edges using Graph Autoencoders
Summary of results:

| Task | Method | Metric | Score |
|---|---|---|---|
| Node Embedding (Deep) | GCN Autoencoder | ROC-AUC | 90.21% |
| Node Classification | GAT + Optuna | Test Accuracy | 84.30% |
| Node Classification | GAT + Optuna | Macro F1 | 83.06% |
| Link Prediction | GAT Autoencoder | ROC-AUC | 93.66% |
A social network dataset representing friendships between users of the Hamsterster website.
| Property | Value |
|---|---|
| Nodes | 2,426 |
| Edges | 16,630 |
| Type | Undirected |
Cora, a benchmark citation network where nodes are scientific papers and edges represent citations.
| Property | Value |
|---|---|
| Nodes | 2,708 |
| Edges | 10,556 |
| Features | 1,433 (bag-of-words) |
| Classes | 7 |
Node2Vec learns node representations by optimizing a skip-gram objective over random walks.
Architecture & Parameters:
- Embedding Dimension: 128
- Walk Length: 15
- Context Size: 10
- Walks per Node: 10
- Negative Samples: 5
- p, q: 1.0, 1.0 (unbiased random walk)
- Epochs: 50
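A minimal training sketch using PyG's `Node2Vec` with the parameters above (the standalone `node2vec` package installed below would work similarly); `edge_index` and the learning rate are assumptions, with the edges taken from `soc-hamsterster.edges`:

```python
import torch
from torch_geometric.nn import Node2Vec

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# edge_index: LongTensor of shape [2, num_edges], assumed loaded from soc-hamsterster.edges
model = Node2Vec(
    edge_index, embedding_dim=128, walk_length=15, context_size=10,
    walks_per_node=10, num_negative_samples=5, p=1.0, q=1.0, sparse=True,
).to(device)

loader = model.loader(batch_size=128, shuffle=True)                    # random-walk sampler
optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=0.01)  # lr is an assumption

model.train()
for epoch in range(50):
    for pos_rw, neg_rw in loader:  # positive/negative walk batches for the skip-gram loss
        optimizer.zero_grad()
        loss = model.loss(pos_rw.to(device), neg_rw.to(device))
        loss.backward()
        optimizer.step()

shallow_emb = model().detach()  # final embeddings, shape (2426, 128)
```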
Results:
- Final embedding shape: (2426, 128)
- Training time: ~24 minutes
- Loss converged from 8.91 → 0.99
A Graph Convolutional Network (GCN) encoder learns embeddings that are trained, via an inner-product decoder, to reconstruct the adjacency matrix.
Architecture:
Input → GCN (hidden=128) → ReLU → Dropout(0.5) → GCN (out=64) → Embeddings
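A sketch of this encoder wrapped in PyG's `GAE`, whose default inner-product decoder scores reconstructed edges. One-hot identity features and the learning rate are assumptions (Hamsterster has no node attributes), and `train_pos_edge_index` stands for the training split of its edges:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GAE

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels=128, out_channels=64):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

num_nodes = 2426
x = torch.eye(num_nodes)  # identity features: an assumption, the graph has no attributes
model = GAE(GCNEncoder(in_channels=num_nodes))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # lr is an assumption

# One training step: embed nodes, then maximize reconstruction of the training edges
z = model.encode(x, train_pos_edge_index)       # train_pos_edge_index: assumed edge split
loss = model.recon_loss(z, train_pos_edge_index)
loss.backward()
optimizer.step()
```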
Results:
- Final embedding shape: (2426, 64)
- Test ROC-AUC: 90.21%
- Early stopping at epoch 109 (best at epoch 89)
Comparing the Node2Vec and GCN embedding spaces on the same graph:

| Metric | Value |
|---|---|
| Mean cosine similarity (per node) | -0.12 |
| Top-10 neighbor overlap | 14.22% |

The near-zero cosine similarity and low neighbor overlap indicate that shallow and deep methods capture fundamentally different structural properties.
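For reference, the neighbor-overlap metric can be computed as in the sketch below; how the per-node cosine similarity is taken across spaces of different widths (128 vs. 64) is not specified here, so only the overlap is shown:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def topk_neighbor_overlap(emb_a, emb_b, k=10):
    """Mean fraction of shared top-k cosine neighbors between two embedding spaces."""
    sim_a, sim_b = cosine_similarity(emb_a), cosine_similarity(emb_b)
    np.fill_diagonal(sim_a, -np.inf)  # exclude self-similarity
    np.fill_diagonal(sim_b, -np.inf)
    nn_a = np.argsort(-sim_a, axis=1)[:, :k]  # top-k neighbor indices per node
    nn_b = np.argsort(-sim_b, axis=1)[:, :k]
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]))
```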
GAT uses attention mechanisms to weigh neighbor contributions during message passing.
Architecture:
Input (1433) → GAT Layer 1 (heads=10, hidden=16) → ELU → Dropout
→ GAT Layer 2 (heads=1, out=7) → Softmax
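A sketch of this architecture with PyG's `GATConv`, defaulting to the best hyperparameters from the search below; the exact dropout placement is an assumption:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_dim=1433, hidden=16, num_classes=7, heads=10, dropout=0.588):
        super().__init__()
        self.dropout = dropout
        self.conv1 = GATConv(in_dim, hidden, heads=heads, dropout=dropout)
        # Heads concatenate, so the second layer consumes hidden * heads features
        self.conv2 = GATConv(hidden * heads, num_classes, heads=1, concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return F.log_softmax(self.conv2(x, edge_index), dim=-1)
```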
Hyperparameters were tuned with Optuna, using 30 trials of Bayesian optimization:
| Parameter | Search Space | Best Value |
|---|---|---|
| Learning Rate | [1e-4, 1e-2] | 0.00987 |
| Weight Decay | [1e-6, 1e-3] | 0.000106 |
| Hidden Dim | [8, 16, 32, 64] | 16 |
| Attention Heads | [4, 6, 8, 10, 12] | 10 |
| Dropout | [0.3, 0.7] | 0.588 |
- Optuna search on train/val split
- Early stopping with patience=40
- Retrain on combined train+val with best hyperparameters
- Final evaluation on held-out test set
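A minimal sketch of the Optuna objective, reusing the `GAT` module above; `train_and_evaluate` is a hypothetical helper standing in for the training loop with patience-40 early stopping, and log-uniform sampling for the continuous ranges is an assumption:

```python
import optuna

def objective(trial):
    # Search space mirrors the table above
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    hidden = trial.suggest_categorical("hidden", [8, 16, 32, 64])
    heads = trial.suggest_categorical("heads", [4, 6, 8, 10, 12])
    dropout = trial.suggest_float("dropout", 0.3, 0.7)

    model = GAT(hidden=hidden, heads=heads, dropout=dropout)
    # train_and_evaluate: hypothetical helper returning the best validation accuracy
    return train_and_evaluate(model, lr=lr, weight_decay=weight_decay, patience=40)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```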
| Metric | Score |
|---|---|
| Best Validation Accuracy | 82.00% |
| Final Test Accuracy | 84.30% |
| Macro F1-Score | 83.06% |
Data split:
- Test set: 527 edges (10% of edges, removed before training)
- Validation set: 263 edges
- Training set: 8,976 edges
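These ratios match the defaults of PyG's `train_test_split_edges` (5% validation, 10% test), which the notebook imports; a sketch of the split:

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import train_test_split_edges

dataset = Planetoid(root="data", name="Cora")
data = dataset[0]
data.train_mask = data.val_mask = data.test_mask = None  # classification masks are unused here
data = train_test_split_edges(data, val_ratio=0.05, test_ratio=0.10)
# Yields data.train_pos_edge_index plus positive/negative validation and test edge sets
```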
Encoder Architecture:
Input (1433) → GAT Layer 1 (hidden=128, heads=4) → ELU → Dropout(0.3)
→ GAT Layer 2 (latent=64, heads=4) → Node Embeddings
Decoder: Inner product between node embeddings to predict edge probability
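A sketch of this autoencoder, reusing `data` from the split above. Whether the 4 heads concatenate to the stated widths or each head carries the full width is not stated, so concatenation (`hidden // heads` channels per head) is assumed:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, GAE

class GATEncoder(torch.nn.Module):
    def __init__(self, in_dim=1433, hidden=128, latent=64, heads=4, dropout=0.3):
        super().__init__()
        self.dropout = dropout
        self.conv1 = GATConv(in_dim, hidden // heads, heads=heads)       # 4 heads x 32 = 128
        self.conv2 = GATConv(hidden, latent, heads=heads, concat=False)  # heads averaged -> 64

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return self.conv2(x, edge_index)

model = GAE(GATEncoder())  # default decoder: inner product between embeddings
z = model.encode(data.x, data.train_pos_edge_index)
loss = model.recon_loss(z, data.train_pos_edge_index)
auc, ap = model.test(z, data.test_pos_edge_index, data.test_neg_edge_index)
```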
Training Parameters:
- Epochs: 300 (with early stopping)
- Patience: 40
- Learning Rate: 1e-3
- Weight Decay: 5e-5
- Hidden Dim: 128
- Latent Dim: 64

| Metric | Score |
|---|---|
| Best Validation AUC | 94.69% |
| Train AUC @ best | 98.48% |
| Final Test ROC-AUC | 93.66% |
Training completed in 146 seconds with early stopping at epoch 237 (best at epoch 197).
The project includes several visualizations:
- Network visualization with Louvain community detection
- t-SNE/UMAP projections of node embeddings
- Training curves (loss and AUC over epochs)
- ROC curves for link prediction
- Precision-Recall curves
- Confusion matrix for node classification
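A rough sketch of two of these visualizations (Louvain communities over a t-SNE projection), assuming `G` is the Hamsterster graph as a NetworkX object whose node IDs equal the embedding row indices, and `shallow_emb` is the Node2Vec output from above:

```python
import community as community_louvain  # provided by the python-louvain package
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Louvain community detection on the raw graph
partition = community_louvain.best_partition(G)

# 2-D t-SNE projection of the embeddings, colored by detected community
xy = TSNE(n_components=2).fit_transform(shallow_emb.cpu().numpy())
colors = [partition[i] for i in range(xy.shape[0])]
plt.scatter(xy[:, 0], xy[:, 1], c=colors, s=8, cmap="tab20")
plt.title("Node2Vec embeddings (t-SNE), colored by Louvain community")
plt.show()
```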
Installation:

```bash
pip install torch torchvision torchaudio
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric
pip install node2vec umap-learn scikit-learn matplotlib networkx
pip install optuna python-louvain
```

Usage:

```python
# Run in Google Colab or Jupyter Notebook
# The notebook is self-contained and downloads datasets automatically

# Key imports
from torch_geometric.nn import GATConv, GCNConv, GAE
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import train_test_split_edges
```

Project structure:

```
├── graph_ml_project.ipynb       # Main notebook with all implementations
├── README.md                    # This file
└── data/
    ├── Cora/                    # Cora dataset (auto-downloaded)
    └── soc-hamsterster.edges    # Hamsterster network
```
Key techniques implemented:
- Random Walk Embeddings (Node2Vec with configurable p, q)
- Graph Convolutional Networks (GCN for encoding)
- Graph Attention Networks (GAT with multi-head attention)
- Graph Autoencoders (GAE for unsupervised learning)
- Hyperparameter Optimization (Optuna with pruning)
- Early Stopping (validation-based checkpointing)
- Community Detection (Louvain algorithm)
- Dimensionality Reduction (t-SNE, UMAP, PCA)
References:
- Node2Vec: Scalable Feature Learning for Networks
- Semi-Supervised Classification with Graph Convolutional Networks
- Graph Attention Networks
- Variational Graph Auto-Encoders
Requirements:
- Python 3.10+
- PyTorch 2.0+
- PyTorch Geometric 2.0+
- CUDA (optional, for GPU acceleration)