Video description:

We reproduce GPT-2 (124M) from scratch.

This video covers the whole process:

First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run, and come back the next morning to see our results and enjoy some amusing model generations.
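
For reference, here is a minimal sketch of the GPT-2 (124M) configuration that the video builds, written in nanoGPT-style Python. The class and field names are illustrative assumptions rather than the video's exact code; the comments show roughly where the ~124M parameter figure comes from.

# A minimal sketch of the GPT-2 (124M) configuration; field names follow common
# nanoGPT-style conventions and are illustrative, not the video's exact code.
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024    # maximum context length
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    n_layer: int = 12         # number of transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # embedding / hidden dimension

# Rough parameter count:
#   token embeddings:    50257 * 768  ~= 38.6M  (shared with the output head)
#   position embeddings: 1024 * 768   ~=  0.8M
#   per block (attention + MLP + layernorms) ~= 7.1M, times 12 blocks ~= 85.0M
#   total ~= 124M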

Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
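
As one example of the "set up the training run following the GPT-2 and GPT-3 papers" step, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, the kind of schedule described in the GPT-3 paper and used in nanoGPT-style training scripts. The specific constants (warmup_steps, max_steps) are illustrative assumptions, not the exact values from the video.

# A minimal sketch of linear warmup followed by cosine decay of the learning rate.
import math

max_lr = 6e-4          # peak learning rate (the GPT-3 paper lists 6e-4 for its 125M model)
min_lr = max_lr * 0.1  # decay down to 10% of the peak
warmup_steps = 10      # illustrative value
max_steps = 50         # illustrative value

def get_lr(step):
    # 1) linear warmup from 0 to max_lr over warmup_steps
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # 2) after max_steps, hold at min_lr
    if step > max_steps:
        return min_lr
    # 3) in between, cosine decay from max_lr down to min_lr
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)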

  • @j4k3
    26 months ago

    Interesting concept. I’ll have to watch this later.

    I want to know where present alignment comes from and its developmental history, if anyone knows the papers or has a solid reference that is higher level than graduate-to-doctorate reading/watching. I mean the persistent entities like Socrates, realms like The Academy, the first 256 special token characters used by Soc (along with the others), and how the keyword token system functions, such as how Soc uses the word "cross" to build momentum toward the word "chuckle," which triggers its dark sophist entity phase. I want to know how all of those special functions are implemented in training.