I recently reviewed “Generative AI: Working with Large Language Models” by Jonathan Fernandes, which provides a solid overview of the evolution of popular large language models. The course material is already a bit dated given advances such as OpenAI’s GPT-3.5 and GPT-4 that have arrived since (the author has released a separate course on the latter, “GPT-4: The New GPT Release and What You Need to Know”), but it remains a useful primer on how we got from the seminal 2017 transformer paper, “Attention is All You Need”, to today’s state of the art, particularly if you’re at least a little technical.
Some key takeaways include:
- How transfer learning lets models gain broad language exposure through pre-training on massive unlabelled datasets, then specialise with minimal labelled data through fine-tuning (a minimal fine-tuning sketch follows this list). This is a clever approach and key for #enterprise adoption of #AI.
- Big Tech’s contrasting strategies: train “small” large language models like Google’s BERT (and Hugging Face’s relatively svelte DistilBERT) at one end of the spectrum, or go “large” with Microsoft/NVIDIA’s Megatron-Turing NLG model at the other; not surprising when you’re the company making the chips, but the results may not be what you expect!
- Innovations like sparse activation via mixture-of-experts (GLaM) and large-scale parallelism (PaLM’s Pathways system) that allow training massive models without unsustainable power demands (a toy gating sketch below shows the sparse-activation idea). Relevant for those deploying Generative AI in production.
- The concept of scaling laws, which show that performance improves when model size, data size, and compute are scaled up together, and how some early assumptions about that balance proved to be incorrect.
- How some models may have been over-sized: Google DeepMind’s Chinchilla showed you can surpass larger, under-trained models with fewer parameters and more training data (see the quick compute check after this list)!
- Efforts like OPT and BLOOM to open-source these large models, making code, weights, and/or data sets available so more researchers can experiment. Important for democratising AI, and more open than the likes of Meta’s LLaMA.
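To make the transfer-learning takeaway concrete, here’s a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. The checkpoint (distilbert-base-uncased), the IMDB dataset, and the hyperparameters are my own illustrative choices, not something from the course:

```python
# Minimal transfer-learning sketch (assumed setup, not from the course):
# start from a pre-trained DistilBERT checkpoint and fine-tune it on a
# small labelled sentiment dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # pre-trained on massive unlabelled text
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A small labelled dataset is enough because the model already "knows" English.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()  # fine-tunes the pre-trained weights; no pre-training from scratch
```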
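And a toy illustration of the sparse-activation idea behind mixture-of-experts models like GLaM: each token is routed to only a couple of “experts”, so most of the network sits idle for any given token. The sizes and top-2 routing here are assumptions for the sketch, not GLaM’s actual architecture:

```python
# Toy sketch of sparse activation via top-2 expert gating, in the spirit of
# mixture-of-experts models like GLaM (illustrative only, not GLaM's code).
import numpy as np

rng = np.random.default_rng(0)
num_experts, d_model = 8, 16
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]  # expert weights
gate_w = rng.normal(size=(d_model, num_experts))                             # router weights

def moe_layer(x):
    """Route a token vector to its top-2 experts; the other 6 stay idle."""
    scores = x @ gate_w
    top2 = np.argsort(scores)[-2:]                                # two best experts
    weights = np.exp(scores[top2]) / np.exp(scores[top2]).sum()   # softmax over the top 2
    # Only 2 of the 8 expert matrices are used, so compute per token stays low.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top2))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,) -- same output size, a fraction of the FLOPs
```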
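Finally, a quick back-of-the-envelope check of the Chinchilla result, using the common approximation that training compute is roughly 6 x parameters x tokens (the Gopher and Chinchilla figures are from the Chinchilla paper):

```python
# Rough compute check: C ~ 6 * N (parameters) * D (training tokens).
gopher     = dict(params=280e9, tokens=300e9)   # larger model, less data
chinchilla = dict(params=70e9,  tokens=1.4e12)  # 4x smaller, ~4.7x more data

for name, m in [("Gopher", gopher), ("Chinchilla", chinchilla)]:
    flops = 6 * m["params"] * m["tokens"]
    print(f"{name:11s} ~{flops:.2e} training FLOPs")

# Both land around 5e23 FLOPs, yet Chinchilla outperformed Gopher:
# the smaller but better-trained model wins at roughly the same compute budget.
```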
For anyone looking to get up to speed on the evolution of large language models, I’d recommend this course as a solid starting point. If nothing else, it’s interesting to see how we got to where we are today.
Please do share any other solid artificial intelligence resources my friends & followers may find useful.