Training Flow

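1. Load an image from the dataset.
2. Encode it with the VAE encoder to obtain a latent tensor $z_{0}$.
   - For a 512x512 RGB image, $z_{0}$ has shape (4, 64, 64).
3. Add noise to $z_{0}$ with the noise scheduler: $z_{t} = \sqrt{\alpha_t}\,z_{0} + \sqrt{1 - \alpha_{t}}\,\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ and $\alpha_t$ denotes the cumulative noise-schedule coefficient at step $t$ (often written $\bar{\alpha}_t$).
4. Feed the noisy latent $z_t$, the timestep $t$, and the text embedding $c$ into the U-Net to predict $\epsilon$.
5. Compute the loss between the predicted and the true noise (typically the mean squared error).
6. Backpropagate and update the U-Net weights.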
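
The noising and loss steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the repository's training code: the linear beta schedule, the 1000-step horizon, the random stand-in for the VAE latent, and the zero-valued placeholder for the U-Net output are all assumptions made for the example. The array `alpha_bar` plays the role of the cumulative coefficient $\alpha_t$ in the noising formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed schedule: 1000 steps with linearly spaced betas (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative coefficient used in the formula


def latent_shape(height, width, channels=4, downsample=8):
    """Latent shape, assuming a 4-channel VAE that downsamples by 8x."""
    return (channels, height // downsample, width // downsample)


def add_noise(z0, eps, t):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def mse_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))


shape = latent_shape(512, 512)       # (4, 64, 64) for a 512x512 image
z0 = rng.standard_normal(shape)      # stand-in for the VAE-encoded image
eps = rng.standard_normal(shape)     # the noise the U-Net should predict
t = int(rng.integers(0, T))          # random training timestep
z_t = add_noise(z0, eps, t)

# Placeholder for the U-Net output eps_theta(z_t, t, c); a real model goes here.
eps_pred = np.zeros_like(eps)
loss = mse_loss(eps_pred, eps)
```

In an actual training loop the placeholder prediction would come from the U-Net, and the loss would be backpropagated by an autodiff framework rather than computed on NumPy arrays.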

Inference Flow

1. The pipeline samples random Gaussian noise $z_{T}$ in the latent space.
2. The U-Net, with its learned weights, denoises the latent step by step; at each step the scheduler computes $z_{t-1} = \text{Scheduler.step}()$ from the predicted noise.
3. After $T$ steps, when $t$ reaches 0, you obtain a clean latent $z_{0}$.
4. The VAE decoder then converts $z_{0}$ into an RGB image $x_{0}$.
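
As a rough sketch of the denoising loop described above, the NumPy code below uses a deterministic DDIM update as one possible scheduler step; the scheduler actually used in this repository may differ. The linear beta schedule, the 50-step timestep spacing, and the `oracle_eps` predictor are assumptions for illustration. `oracle_eps` is a hypothetical stand-in for the U-Net that knows the target latent and ignores text conditioning, which makes it possible to check that the loop recovers $z_{0}$.

```python
import numpy as np

# Assumed schedule: 1000 steps with linearly spaced betas (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)


def ddim_step(eps_pred, t, t_prev, z_t):
    """One deterministic DDIM update (eta = 0) from timestep t to t_prev."""
    abar_t = alpha_bar[t]
    abar_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    # Estimate the clean latent, then re-noise it to the previous timestep.
    z0_pred = (z_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * z0_pred + np.sqrt(1.0 - abar_prev) * eps_pred


def sample(predict_eps, shape, num_steps=50, seed=0):
    """Run the denoising loop from pure noise down to an estimate of z_0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)  # random Gaussian noise in latent space
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < num_steps else -1
        z = ddim_step(predict_eps(z, t), t, t_prev, z)
    return z  # clean latent, which the VAE decoder would turn into an image


# Hypothetical "clean" latent and a perfect noise predictor for the demo.
z0_true = np.full((4, 64, 64), 0.5)


def oracle_eps(z_t, t):
    """Solves the forward noising equation for eps, given the known z0_true."""
    return (z_t - np.sqrt(alpha_bar[t]) * z0_true) / np.sqrt(1.0 - alpha_bar[t])


z0 = sample(oracle_eps, z0_true.shape)
```

With a trained U-Net in place of `oracle_eps`, the returned latent would be passed to the VAE decoder to produce the final RGB image.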
