Following Andrej Karpathy's tutorial on YouTube: https://www.youtube.com/watch?v=kCc8FmEb1nY
Working at my own pace, I plan to implement a decoder-only transformer, based on the architecture introduced in the well-known paper "Attention Is All You Need" (Vaswani et al., 2017), and train it on the Tiny Shakespeare dataset (a small sample of Shakespeare's works) so that it generates plausible Shakespeare-like text.
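What makes the model "decoder-only" is masked (causal) self-attention: each position may attend only to itself and earlier positions, so the network can be trained to predict the next token. The sketch below shows a single attention head under stated assumptions. It uses NumPy rather than the PyTorch used in the tutorial, and the function name `causal_self_attention` and the weight names `w_q`, `w_k`, `w_v` are hypothetical, chosen for illustration only.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked (causal) self-attention (illustrative sketch).

    x: (T, C) array of T token embeddings, each of width C.
    w_q, w_k, w_v: (C, H) projection matrices for a head of size H.
    Returns (out, weights): out is (T, H); weights is the (T, T)
    attention matrix, where row t only covers positions 0..t.
    """
    T = x.shape[0]
    q = x @ w_q  # (T, H) queries
    k = x @ w_k  # (T, H) keys
    v = x @ w_v  # (T, H) values
    head_size = k.shape[-1]
    # Scaled dot-product scores, as in Vaswani et al. (2017)
    scores = q @ k.T / np.sqrt(head_size)  # (T, T)
    # Hide future positions (strict upper triangle): token t cannot see t+1, t+2, ...
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax; exp(-inf) becomes exactly 0
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage with random (hypothetical) sizes and weights
rng = np.random.default_rng(0)
T, C, H = 8, 32, 16  # sequence length, embedding width, head size
x = rng.standard_normal((T, C))
w_q, w_k, w_v = (rng.standard_normal((C, H)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)                              # (8, 16)
print(np.allclose(np.triu(weights, k=1), 0))  # True: no attention to the future
```

Because the first position can only attend to itself, its output is exactly its own value vector; later positions receive a weighted average of all earlier value vectors. A full model would stack several such heads with feed-forward layers, residual connections, and layer normalization.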