JEPA Explainer: Interactive Visualization of Joint Embedding Architectures
Visit the live project: [JEPA Explainer](https://linukperera.github.io/JEPA-Explainer/)
"Generative models struggle to paint every leaf. Predictive models just know it's a tree."
An interactive research visualization exploring the fundamental shift in AI architecture: moving from Generative Reconstruction (Transformers, MAE) to Joint Embedding Prediction (I-JEPA).
Traditional Generative Masked Autoencoders (MAE) mask parts of an image and train the model to reconstruct the missing pixels exactly (a loss sketch in code follows the list below).
- The Cost: The model wastes massive compute trying to predict high-frequency noise (e.g., the exact texture of a dog's fur).
- The Flaw: It requires the model to know details that are irrelevant to semantic understanding.
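As a rough sketch of where that compute goes, the snippet below computes a masked pixel-reconstruction (MSE) loss over flat arrays. The image size and function names are illustrative assumptions, not code from any particular MAE implementation.

```typescript
// Sketch: an MAE-style objective, a mean squared error over the *masked* pixels only.
// Every masked pixel value has to be predicted, so the loss lives in pixel space.
function maskedPixelLoss(
  original: Float32Array,      // ground-truth pixels, flattened
  reconstructed: Float32Array, // decoder output, same length
  mask: Uint8Array             // 1 = this pixel was masked and must be reconstructed
): number {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < original.length; i++) {
    if (mask[i] === 1) {
      const diff = original[i] - reconstructed[i];
      sum += diff * diff;
      count++;
    }
  }
  return count > 0 ? sum / count : 0;
}

// A 224x224 RGB image is already ~150,000 values the decoder has to get right.
const n = 224 * 224 * 3;
console.log(maskedPixelLoss(new Float32Array(n), new Float32Array(n), new Uint8Array(n).fill(1)));
```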
JEPA (Joint Embedding Predictive Architecture), proposed by Yann LeCun and instantiated for images as I-JEPA by Assran et al. (2023), abandons pixel reconstruction. Instead, it predicts the representation of the missing information (sketched in code after the list below).
- The Efficiency: It predicts abstract features (vectors), not millions of pixels.
- The Result: A model that understands "concepts" and "plans" rather than just hallucinating textures.
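For contrast, here is a minimal sketch of the predictive objective, assuming a 64-dimensional embedding (the dimension is an illustrative choice, echoing the "64 dims" figure used later in this README): the predictor only has to match a short feature vector, not a pixel grid.

```typescript
// Sketch: a JEPA-style objective, the L2 distance between the predicted and target
// representations of the masked region, computed entirely in embedding space.
function embeddingLoss(predicted: Float32Array, target: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < predicted.length; i++) {
    const diff = predicted[i] - target[i];
    sum += diff * diff;
  }
  return sum / predicted.length;
}

// 64 numbers instead of ~150,000 pixels: the same "what is missing?" question,
// answered in a far smaller space.
const D = 64;
console.log(embeddingLoss(new Float32Array(D), new Float32Array(D)));
```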
A real-time canvas where you play the adversary, masking regions of an image and comparing how each model fills the gap (a mode-toggle sketch follows this list).
- Generative Mode (Transformer): Watch the model struggle to "denoise" and fill in every pixel of the area you masked.
- Predictive Mode (JEPA): See the model bypass rendering entirely, spawning a glowing "Context Orb" that represents the predicted feature vector with high confidence.
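Under the hood, the mode switch could be as small as a Svelte store; the names below (`mode`, `'generative'`, `'predictive'`, `modeLabel`) are hypothetical, not the explainer's actual API.

```typescript
// Sketch of the playground's mode toggle as a Svelte store (all names are hypothetical).
import { writable, derived } from 'svelte/store';

export type PlaygroundMode = 'generative' | 'predictive';

// Which model the canvas is currently simulating.
export const mode = writable<PlaygroundMode>('generative');

// A derived label the canvas overlay could bind to.
export const modeLabel = derived(mode, ($mode) =>
  $mode === 'generative'
    ? 'Denoising every masked pixel…'
    : 'Predicting a feature vector (Context Orb)'
);
```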
An interactive flowchart breaking down the mathematical difference between the two architectures (a hover-lookup sketch follows this list).
- Click-to-Explore: Drill down into the Encoder, Predictor, and Attention blocks.
- Math Visualization: Hover over variables in the loss equations to see their tensor shapes light up in the diagram.
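One plausible way to wire the hover interaction is a lookup from each symbol in the loss equations to the tensor shape it should highlight; the symbols, descriptions, and shapes below are placeholders, not the project's real data.

```typescript
// Sketch: map from KaTeX symbols in the loss equations to the tensor shapes
// that light up in the diagram (symbols and shapes are hypothetical).
interface TensorInfo {
  description: string;
  shape: number[]; // e.g. [batch, tokens, dim]
}

const tensorShapes: Record<string, TensorInfo> = {
  'x':          { description: 'input image patches',        shape: [1, 196, 768] },
  '\\hat{x}':   { description: 'reconstructed pixels (MAE)', shape: [1, 196, 768] },
  's_y':        { description: 'target representation',      shape: [1, 64] },
  '\\hat{s}_y': { description: 'predicted representation',   shape: [1, 64] },
};

// On hover, the flowchart looks up the hovered symbol and highlights the matching block.
export function shapeFor(symbol: string): TensorInfo | undefined {
  return tensorShapes[symbol];
}
```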
A WebGL (Threlte/Three.js) visualization of the "Thinking Process": how each model traverses its representation space (a path-generation sketch follows this list).
- Transformer Path: Visualized as a jittery, high-energy path trying to navigate pixel space.
- JEPA Path: Visualized as a smooth, efficient spline traversing the low-dimensional semantic manifold.
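As a rough sketch of how the two paths might be generated with Three.js (point counts and jitter amplitude are made up): a noisy polyline for the pixel-space path, a Catmull-Rom spline for the semantic-manifold path.

```typescript
// Sketch: generating the two "Thinking Process" paths (parameters are illustrative).
import * as THREE from 'three';

// Transformer path: a jittery, high-energy walk from start to end.
function jitteryPath(start: THREE.Vector3, end: THREE.Vector3, steps = 64): THREE.Vector3[] {
  const points: THREE.Vector3[] = [];
  for (let i = 0; i <= steps; i++) {
    const p = start.clone().lerp(end, i / steps);
    // Perturb every step to evoke noisy pixel-space search.
    p.add(new THREE.Vector3(
      (Math.random() - 0.5) * 0.4,
      (Math.random() - 0.5) * 0.4,
      (Math.random() - 0.5) * 0.4
    ));
    points.push(p);
  }
  return points;
}

// JEPA path: a smooth spline through a handful of semantic waypoints.
function smoothPath(waypoints: THREE.Vector3[], samples = 64): THREE.Vector3[] {
  const curve = new THREE.CatmullRomCurve3(waypoints);
  return curve.getPoints(samples);
}
```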
This project visualizes the differing loss functions that drive these architectures:
The generative model minimizes the distance between the original pixels ($x$) and the reconstructed pixels ($\hat{x}$). This is computationally expensive and noise-sensitive.
JEPA minimizes the distance between the target representation ($s_y$) and the predicted representation ($\hat{s}_y$). This happens in a low-dimensional abstract space (e.g., 64 dims vs 1M pixels).
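In KaTeX-ready form, the two objectives the visualization assumes can be written as follows (notation follows the inline symbols above; exact norms and regularizers vary across papers):

```latex
% Generative reconstruction: distance measured in pixel space.
\mathcal{L}_{\text{gen}} = \left\lVert x - \hat{x} \right\rVert^{2}

% Joint embedding prediction: distance measured in representation space.
\mathcal{L}_{\text{JEPA}} = \left\lVert s_y - \hat{s}_y \right\rVert^{2}
```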
This project is built for performance: it runs entirely client-side (no Python backend) by relying on pre-computed tensor mocks (a loading sketch follows the stack list below).
- Framework: SvelteKit (Static Adapter)
- Styling: Tailwind CSS (Academic/Minimalist Theme)
- Math Rendering: KaTeX
- 3D Visualization: Threlte (Three.js for Svelte)
- State Management: Svelte Stores
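Because there is no backend, "inference" reduces to looking up pre-computed results shipped as static assets; the file path, field names, and helper below are hypothetical.

```typescript
// Sketch: loading a pre-computed tensor mock instead of calling a model server
// (the path and field names are hypothetical).
interface TensorMock {
  shape: number[];  // e.g. [64]
  values: number[]; // embedding pre-computed offline
}

export async function loadMock(name: string): Promise<TensorMock> {
  // Static JSON bundled with the site and fetched client-side; no Python backend needed.
  const res = await fetch(`/mocks/${name}.json`);
  if (!res.ok) throw new Error(`Missing tensor mock: ${name}`);
  return (await res.json()) as TensorMock;
}
```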
This project uses Tailwind v3 and Node 18+.
# 1. Clone the repository
git clone https://github.com/linukperera/JEPA-Explainer.git
# 2. Install dependencies
npm install
# 3. Start the research lab
npm run dev
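To produce the static build implied by the stack above (SvelteKit static adapter, served from a GitHub Pages project path), a minimal svelte.config.js could look like the sketch below; the options shown are assumptions, not necessarily the repository's actual configuration.

```js
// svelte.config.js: minimal sketch, assuming @sveltejs/adapter-static and a
// GitHub Pages project path (options are assumptions, not the repo's real config).
import adapter from '@sveltejs/adapter-static';

/** @type {import('@sveltejs/kit').Config} */
const config = {
  kit: {
    adapter: adapter(),
    // Project pages live under /JEPA-Explainer, so the base path must match.
    paths: { base: '/JEPA-Explainer' }
  }
};

export default config;
```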
If you use this visualization for education or research, please cite the original I-JEPA paper and this repository:
@inproceedings{assran2023self,
  title     = {Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture},
  author    = {Assran, Mahmoud and Duval, Quentin and Misra, Ishan and ... LeCun, Yann},
  booktitle = {CVPR},
  year      = {2023}
}
@software{JEPAExplainer2026,
  author = {Perera, Linuk},
  title  = {JEPA Explainer: Interactive Visualization of Joint Embedding Architectures},
  url    = {https://linukperera.github.io/JEPA-Explainer/},
  year   = {2026}
}
Built by Linuk Perera