I recently reviewed “Generative AI: Working with Large Language Models” by Jonathan Fernandes, which provides a solid overview of the evolution of popular large language models. The course material is already a bit dated given advances such as OpenAI’s GPT-3.5 and GPT-4 that have arrived since (the author has released a separate course on the latter, “GPT-4: The New GPT Release and What You Need to Know”), but it remains a useful primer on how we got from the seminal 2017 transformer paper, “Attention is All You Need”, to today’s state of the art, particularly if you’re at least a little technical.
Some key takeaways include:
- How transfer learning lets models gain broad language exposure through pre-training on massive unlabelled datasets, then specialise with minimal labelled data through fine-tuning (a minimal fine-tuning sketch follows this list). This is a clever approach and key for #enterprise adoption of #AI.
- Big Tech’s contrasting strategies: train “small” large language models like Google’s BERT (and Hugging Face’s relatively svelte DistilBERT) at one end of the spectrum, or go “large” with Microsoft/NVIDIA’s Megatron-Turing NLG model at the other; not surprising when you’re the company making the chips, but the results may not be what you expect!
- Innovations like sparse activation via mixture-of-experts (GLaM) and large-scale parallelism (PaLM’s Pathways system) that allow training massive models without unsustainable power demands (a toy gating sketch below shows the sparse-activation idea). Relevant for those deploying Generative AI in production.
- The concept of scaling laws, which show that performance improves when model size, data size, and compute are scaled up together, and how some early assumptions about that balance proved to be incorrect.
- How some models may have been over-sized: Google DeepMind’s Chinchilla showed you can surpass larger, under-trained models with fewer parameters and more training data (see the quick compute check after this list)!
- Efforts like OPT and BLOOM to open-source these large models, making code, weights, and/or data sets available so more researchers can experiment. Important for democratising AI, and more open than the likes of Meta’s LLaMA.
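To make the transfer-learning takeaway concrete, here’s a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. The checkpoint (distilbert-base-uncased), the IMDB dataset, and the hyperparameters are my own illustrative choices, not something from the course:

```python
# Minimal transfer-learning sketch (assumed setup, not from the course):
# start from a pre-trained DistilBERT checkpoint and fine-tune it on a
# small labelled sentiment dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # pre-trained on massive unlabelled text
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A small labelled dataset is enough because the model already "knows" English.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()  # fine-tunes the pre-trained weights; no pre-training from scratch
```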
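And a toy illustration of the sparse-activation idea behind mixture-of-experts models like GLaM: each token is routed to only a couple of “experts”, so most of the network sits idle for any given token. The sizes and top-2 routing here are assumptions for the sketch, not GLaM’s actual architecture:

```python
# Toy sketch of sparse activation via top-2 expert gating, in the spirit of
# mixture-of-experts models like GLaM (illustrative only, not GLaM's code).
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model = 8, 16
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]  # expert weights
gate_w = rng.normal(size=(d_model, num_experts))                             # router weights

def moe_layer(x):
    """Route a token vector to its top-2 experts; the other 6 stay idle."""
    scores = x @ gate_w
    top2 = np.argsort(scores)[-2:]                                # two best experts
    weights = np.exp(scores[top2]) / np.exp(scores[top2]).sum()   # softmax over the top 2
    # Only 2 of the 8 expert matrices are used, so compute per token stays low.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top2))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,) -- same output size, a fraction of the FLOPs
```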
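Finally, a quick back-of-the-envelope check of the Chinchilla result, using the common approximation that training compute is roughly 6 x parameters x tokens (the Gopher and Chinchilla figures are from the Chinchilla paper):

```python
# Rough compute check: C ~ 6 * N (parameters) * D (training tokens).
gopher     = dict(params=280e9, tokens=300e9)   # larger model, less data
chinchilla = dict(params=70e9,  tokens=1.4e12)  # 4x smaller, ~4.7x more data

for name, m in [("Gopher", gopher), ("Chinchilla", chinchilla)]:
    flops = 6 * m["params"] * m["tokens"]
    print(f"{name:11s} ~{flops:.2e} training FLOPs")

# Both land around 5e23 FLOPs, yet Chinchilla outperformed Gopher:
# the smaller but better-trained model wins at roughly the same compute budget.
```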
For anyone looking to get up to speed on the evolution of large language models, I’d recommend this course as a solid starting point. If nothing else, it’s interesting to see how we got to where we are today.
Please do share any other solid artificial intelligence resources my friends & followers may find useful.