Graph Machine Learning: Node Embeddings, Classification & Link Prediction

A comprehensive implementation of graph machine learning techniques including shallow and deep node embeddings, node classification with Graph Attention Networks (GAT), and link prediction using Graph Autoencoders (GAE).

Overview

This project demonstrates three core graph ML tasks:

  1. Node Embeddings - Comparing shallow (Node2Vec) vs deep (GNN-based) embedding methods
  2. Node Classification - Semi-supervised classification using GAT with hyperparameter optimization
  3. Link Prediction - Reconstructing missing edges using Graph Autoencoders

Results Summary

| Task | Method | Metric | Score |
|------|--------|--------|-------|
| Node Embedding (Deep) | GCN Autoencoder | ROC-AUC | 90.21% |
| Node Classification | GAT + Optuna | Test Accuracy | 84.30% |
| Node Classification | GAT + Optuna | Macro F1 | 83.06% |
| Link Prediction | GAT Autoencoder | ROC-AUC | 93.66% |

Datasets

Hamsterster Social Network (Part A)

A social network dataset representing friendships between users of the Hamsterster website.

| Property | Value |
|----------|-------|
| Nodes | 2,426 |
| Edges | 16,630 |
| Type | Undirected |

Cora Citation Network (Parts B & C)

A benchmark citation network where nodes are scientific papers and edges represent citations.

| Property | Value |
|----------|-------|
| Nodes | 2,708 |
| Edges | 10,556 |
| Features | 1,433 (bag-of-words) |
| Classes | 7 |

Part A: Node Embeddings

Shallow Embedding: Node2Vec

Node2Vec learns node representations by optimizing a skip-gram objective over random walks.

Architecture & Parameters:

Embedding Dimension: 128
Walk Length: 15
Context Size: 10
Walks per Node: 10
Negative Samples: 5
p, q: 1.0, 1.0 (unbiased random walk)
Epochs: 50
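
The snippet below is a minimal sketch of this configuration using torch_geometric.nn.Node2Vec (the notebook may instead use the standalone node2vec package); edge_index is assumed to hold the Hamsterster graph as a [2, num_edges] tensor, and the batch size and learning rate are illustrative:

import torch
from torch_geometric.nn import Node2Vec

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# edge_index: [2, num_edges] tensor for the Hamsterster graph (assumed loaded)
model = Node2Vec(
    edge_index,
    embedding_dim=128, walk_length=15, context_size=10,
    walks_per_node=10, num_negative_samples=5,
    p=1.0, q=1.0,        # p = q = 1.0 gives an unbiased (DeepWalk-style) walk
    sparse=True,
).to(device)

loader = model.loader(batch_size=128, shuffle=True)
optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=0.01)

for epoch in range(50):
    model.train()
    for pos_rw, neg_rw in loader:             # positive and negative random walks
        optimizer.zero_grad()
        model.loss(pos_rw.to(device), neg_rw.to(device)).backward()
        optimizer.step()

embeddings = model().detach().cpu()           # shape (2426, 128)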

Results:

  • Final embedding shape: (2426, 128)
  • Training time: ~24 minutes
  • Loss converged from 8.91 → 0.99

Deep Embedding: GCN-based Graph Autoencoder

A Graph Convolutional Network encoder learns embeddings by reconstructing the adjacency matrix.

Architecture:

Input → GCN (hidden=128) → ReLU → Dropout(0.5) → GCN (out=64) → Embeddings
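
A minimal sketch of this encoder wrapped in PyG's GAE, which supplies the inner-product decoder; since Hamsterster has no node attributes, the input features x are an assumption (e.g., an identity or degree-based matrix):

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GAE

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels=128, out_channels=64):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)        # 64-dimensional embeddings

model = GAE(GCNEncoder(in_channels=num_features))  # inner-product decoder by default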

Results:

  • Final embedding shape: (2426, 64)
  • Test ROC-AUC: 90.21%
  • Early stopping at epoch 109 (best at epoch 89)

Embedding Comparison

| Metric | Value |
|--------|-------|
| Mean cosine similarity (per node) | -0.12 |
| Top-10 neighbor overlap | 14.22% |

The near-zero mean cosine similarity and low neighbor overlap indicate that the shallow and deep methods capture substantially different structural properties of the same graph.
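
The exact comparison procedure lives in the notebook; the sketch below shows one reasonable reading of these metrics, where shallow_emb (2426 × 128) and deep_emb (2426 × 64) are the arrays produced above. Top-k neighbor overlap is dimension-agnostic, while the per-node cosine similarity assumes the Node2Vec vectors are first reduced to 64-d (here via PCA) so the two spaces are comparable:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def topk_overlap(emb_a, emb_b, k=10):
    # fraction of shared k-nearest neighbors per node, averaged over the graph
    idx_a = NearestNeighbors(n_neighbors=k + 1).fit(emb_a).kneighbors(emb_a, return_distance=False)[:, 1:]
    idx_b = NearestNeighbors(n_neighbors=k + 1).fit(emb_b).kneighbors(emb_b, return_distance=False)[:, 1:]
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(idx_a, idx_b)]))

shallow_64 = PCA(n_components=64).fit_transform(shallow_emb)   # (2426, 128) -> (2426, 64)
cos = np.sum(shallow_64 * deep_emb, axis=1) / (
    np.linalg.norm(shallow_64, axis=1) * np.linalg.norm(deep_emb, axis=1))
print(cos.mean(), topk_overlap(shallow_emb, deep_emb, k=10))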

Part B: Node Classification (Cora)

Model: Graph Attention Network (GAT)

GAT uses attention mechanisms to weigh neighbor contributions during message passing.

Architecture:

Input (1433) → GAT Layer 1 (heads=10, hidden=16) → ELU → Dropout
            → GAT Layer 2 (heads=1, out=7) → Softmax
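
A sketch of this two-layer GAT in PyG, filled in with the best hyperparameters found below: the first layer concatenates its 10 heads (10 × 16 = 160 features), and the second averages a single head into the 7 class logits.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_channels=1433, hidden=16, heads=10, num_classes=7, dropout=0.588):
        super().__init__()
        self.dropout = dropout
        self.conv1 = GATConv(in_channels, hidden, heads=heads, dropout=dropout)
        self.conv2 = GATConv(hidden * heads, num_classes, heads=1, concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return F.log_softmax(self.conv2(x, edge_index), dim=-1)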

Hyperparameter Optimization

Hyperparameters were tuned with Optuna (30 trials of Bayesian optimization):

| Parameter | Search Space | Best Value |
|-----------|--------------|------------|
| Learning Rate | [1e-4, 1e-2] | 0.00987 |
| Weight Decay | [1e-6, 1e-3] | 0.000106 |
| Hidden Dim | [8, 16, 32, 64] | 16 |
| Attention Heads | [4, 6, 8, 10, 12] | 10 |
| Dropout | [0.3, 0.7] | 0.588 |
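
A hedged sketch of this search; train_and_eval is a hypothetical helper that trains the GAT with the sampled parameters and returns validation accuracy:

import optuna

def objective(trial):
    params = {
        'lr':           trial.suggest_float('lr', 1e-4, 1e-2, log=True),
        'weight_decay': trial.suggest_float('weight_decay', 1e-6, 1e-3, log=True),
        'hidden':       trial.suggest_categorical('hidden', [8, 16, 32, 64]),
        'heads':        trial.suggest_categorical('heads', [4, 6, 8, 10, 12]),
        'dropout':      trial.suggest_float('dropout', 0.3, 0.7),
    }
    return train_and_eval(params)   # hypothetical: returns best validation accuracy

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=30)
print(study.best_params)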

Training Strategy

  1. Optuna search on train/val split
  2. Early stopping with patience=40 (sketched after this list)
  3. Retrain on combined train+val with best hyperparameters
  4. Final evaluation on held-out test set
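
A minimal sketch of step 2's validation-checkpointed early stopping, assuming the GAT above and a standard PyG Planetoid data object; the same loop can be reused for the final retrain by swapping in the combined train+val mask:

import copy
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=0.00987, weight_decay=0.000106)
best_val, best_state, patience, wait = 0.0, None, 40, 0

for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    F.nll_loss(out[data.train_mask], data.y[data.train_mask]).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        pred = model(data.x, data.edge_index).argmax(dim=-1)
        val_acc = (pred[data.val_mask] == data.y[data.val_mask]).float().mean().item()
    if val_acc > best_val:                       # checkpoint on improvement
        best_val, best_state, wait = val_acc, copy.deepcopy(model.state_dict()), 0
    else:
        wait += 1
        if wait >= patience:                     # stop after 40 stale epochs
            break

model.load_state_dict(best_state)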

Results

| Metric | Score |
|--------|-------|
| Best Validation Accuracy | 82.00% |
| Final Test Accuracy | 84.30% |
| Macro F1-Score | 83.06% |

Part C: Link Prediction (Cora)

Task Setup

  • Removed 10% of edges (527 edges) for testing (split sketched below)
  • Validation set: 263 edges
  • Training set: 8,976 edges
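
These counts are consistent with PyG's train_test_split_edges at val_ratio=0.05 and test_ratio=0.10 on Cora's 5,278 undirected edges (the 8,976 training edges count both directions). That utility matches the notebook's import, though it is deprecated in recent PyG in favor of RandomLinkSplit; a sketch:

from torch_geometric.datasets import Planetoid
from torch_geometric.utils import train_test_split_edges

dataset = Planetoid(root='data', name='Cora')
data = dataset[0]
data.train_mask = data.val_mask = data.test_mask = None   # masks unused for link prediction
data = train_test_split_edges(data, val_ratio=0.05, test_ratio=0.10)
# -> data.train_pos_edge_index, data.val_pos_edge_index, data.test_pos_edge_index, ...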

Model: GAT-based Graph Autoencoder

Encoder Architecture:

Input (1433) → GAT Layer 1 (hidden=128, heads=4) → ELU → Dropout(0.3)
            → GAT Layer 2 (latent=64, heads=4) → Node Embeddings

Decoder: inner product between node embeddings, passed through a sigmoid to produce an edge probability
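
A sketch of this encoder wrapped in GAE, whose default InnerProductDecoder scores an edge (i, j) as sigmoid(z_i^T z_j). Averaging rather than concatenating the second layer's heads (concat=False) keeps the latent dimension at 64; that choice is an assumption here.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, GAE

class GATEncoder(torch.nn.Module):
    def __init__(self, in_channels=1433, hidden=128, latent=64, heads=4):
        super().__init__()
        self.conv1 = GATConv(in_channels, hidden, heads=heads)
        self.conv2 = GATConv(hidden * heads, latent, heads=heads, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.3, training=self.training)
        return self.conv2(x, edge_index)        # 64-d node embeddings

model = GAE(GATEncoder())                       # inner-product decoder by default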

Training Configuration

Epochs: 300 (with early stopping)
Patience: 40
Learning Rate: 1e-3
Weight Decay: 5e-5
Hidden Dim: 128
Latent Dim: 64
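
One epoch of training plus evaluation under this configuration, using GAE's built-in recon_loss and test helpers; data comes from the split sketched above:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-5)

def train_epoch():
    model.train()
    optimizer.zero_grad()
    z = model.encode(data.x, data.train_pos_edge_index)
    loss = model.recon_loss(z, data.train_pos_edge_index)  # negatives sampled internally
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def evaluate(pos_edge_index, neg_edge_index):
    model.eval()
    z = model.encode(data.x, data.train_pos_edge_index)
    return model.test(z, pos_edge_index, neg_edge_index)   # (ROC-AUC, average precision)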

Results

| Metric | Score |
|--------|-------|
| Best Validation AUC | 94.69% |
| Train AUC @ best | 98.48% |
| Final Test ROC-AUC | 93.66% |

Training completed in 146 seconds with early stopping at epoch 237 (best at epoch 197).

Visualizations

The project includes several visualizations:

  • Network visualization with Louvain community detection
  • t-SNE/UMAP projections of node embeddings (see the sketch after this list)
  • Training curves (loss and AUC over epochs)
  • ROC curves for link prediction
  • Precision-Recall curves
  • Confusion matrix for node classification
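
As a taste of the plotting code, here is a sketch of the t-SNE projection colored by Louvain communities; G, embeddings, and the node ordering matching the embedding rows are assumptions:

import matplotlib.pyplot as plt
import community as community_louvain            # python-louvain
from sklearn.manifold import TSNE

partition = community_louvain.best_partition(G)  # node -> community id
colors = [partition[n] for n in G.nodes()]       # assumes embedding rows follow G.nodes()

coords = TSNE(n_components=2, random_state=42).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=colors, cmap='tab10', s=8)
plt.title('t-SNE of node embeddings, colored by Louvain community')
plt.show()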

Installation

pip install torch torchvision torchaudio
# the PyG extension wheels below may need the PyG wheel index, e.g.
#   pip install ... -f https://data.pyg.org/whl/torch-<torch-version>+<cuda>.html
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric
pip install node2vec umap-learn scikit-learn matplotlib networkx
pip install optuna python-louvain

Usage

# Run in Google Colab or Jupyter Notebook
# The notebook is self-contained and downloads datasets automatically

# Key imports
from torch_geometric.nn import GATConv, GCNConv, GAE
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import train_test_split_edges

Project Structure

├── graph_ml_project.ipynb    # Main notebook with all implementations
├── README.md                  # This file
└── data/
    ├── Cora/                  # Cora dataset (auto-downloaded)
    └── soc-hamsterster.edges  # Hamsterster network

Key Techniques Demonstrated

  • Random Walk Embeddings (Node2Vec with configurable p, q)
  • Graph Convolutional Networks (GCN for encoding)
  • Graph Attention Networks (GAT with multi-head attention)
  • Graph Autoencoders (GAE for unsupervised learning)
  • Hyperparameter Optimization (Optuna with pruning)
  • Early Stopping (validation-based checkpointing)
  • Community Detection (Louvain algorithm)
  • Dimensionality Reduction (t-SNE, UMAP, PCA)

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • PyTorch Geometric 2.0+
  • CUDA (optional, for GPU acceleration)
