Some notes on how transformer-decoder language models work, taking GPT-2 as an example, with lots of references for digging deeper.