A continuous-depth language-model framework in Julia, implementing Neural ODE Transformers, custom continuous-attention integrators, reversible-depth architectures, adjoint-based training, and efficient KV-cached inference.
machine-learning natural-language-processing research deep-learning julia text-generation transformer neural-networks tensorboard differential-equations language-model continuous-time neural-ode sciml kv-cache autoregressive-generation flux-jl continuous-depth adjoint-methods
Updated Dec 13, 2025 - Julia
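
To make the continuous-depth framing concrete, here is a minimal, hypothetical sketch of a single continuous-depth block, assuming Flux.jl and OrdinaryDiffEq.jl (suggested by the flux-jl and sciml topic tags). All names and hyperparameters are illustrative and not this repository's API: the residual update of a transformer sub-block is read as ODE dynamics dh/dt = f(h) and integrated over a depth interval instead of stacking discrete layers.

```julia
using Flux, OrdinaryDiffEq

d_model = 64

# Stand-in for one pre-norm feed-forward sub-block; a full Neural ODE
# Transformer would fold attention into the dynamics as well.
f = Chain(LayerNorm(d_model),
          Dense(d_model => 4 * d_model, gelu),
          Dense(4 * d_model => d_model))

# Out-of-place ODE dynamics; the parameter argument p is unused because f
# closes over its own weights. Adjoint-based training would differentiate
# through solve (e.g. via SciMLSensitivity) rather than unrolling the solver.
dynamics(h, p, t) = f(h)

# Integrate the hidden state over a "depth" interval t ∈ [0, 1].
function continuous_block(h0; tspan = (0.0f0, 1.0f0))
    prob = ODEProblem(dynamics, h0, tspan)
    sol = solve(prob, Tsit5(); save_everystep = false)
    sol.u[end]                     # hidden state at the final depth t = 1
end

h0 = randn(Float32, d_model, 8)    # (features, sequence positions)
h1 = continuous_block(h0)
@show size(h1)                     # (64, 8)
```

A discrete transformer applies L fixed layers; here the solver chooses its own step sizes through depth, which is what makes adaptive-depth trade-offs and adjoint-based memory savings possible.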