AI Foundations · Lesson 19 · Phase 5 — Under the hood of GPT

The Practitioner's Toolbox

Eighteen lessons taught you the machine. This one hands you the workshop: which languages the field actually speaks, what a notebook is, why every error message mentions CUDA, where the world keeps its models, and exactly what to install for each kind of work — including the setup that keeps your Mac completely clean. Nothing here is theory; it's the map you need the day you do any of this for real.

The languages: why Python — and where Swift fits

Python didn't win because it's a great language; it won because of gravity: every library, every paper's code, every tutorial landed there, and each arrival made the next one more inevitable. But here is the part most beginners never learn: Python is the steering wheel, not the engine. PyTorch is written in Python, C++, and CUDA (Wikipedia) — the Python you type is a thin, friendly control panel over compiled C++/GPU code doing lesson 3's matrix multiplications at ferocious speed. That's why "Python is slow" doesn't matter here: the heavy lifting never runs in Python.
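To feel the steering-wheel point, here is a minimal sketch that multiplies the same two matrices twice: once with loops the Python interpreter executes step by step, and once (only if PyTorch happens to be installed) by handing the work to `torch.matmul`. The helper names `matmul_pure` and `time_call` and the size `n = 60` are illustrative choices, not anything from the lessons, and the timings will differ from machine to machine.

```python
import time

def matmul_pure(a, b):
    """Multiply two matrices (lists of rows) using plain Python loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def time_call(fn, *args):
    """Return (result, seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

n = 60  # illustrative size, small enough that the pure-Python version finishes quickly
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

pure_result, pure_seconds = time_call(matmul_pure, a, b)
print(f"pure Python loops: {pure_seconds:.4f} s")

try:
    import torch  # optional: this branch only runs if PyTorch is installed
except ImportError:
    print("torch not installed; skipping the compiled comparison")
else:
    ta = torch.tensor(a, dtype=torch.float64)
    tb = torch.tensor(b, dtype=torch.float64)
    torch_result, torch_seconds = time_call(torch.matmul, ta, tb)
    print(f"torch.matmul:      {torch_seconds:.4f} s")
    # Same numbers, produced by compiled code instead of the interpreter.
    expected = torch.tensor(pure_result, dtype=torch.float64)
    print("results match:", torch.allclose(expected, torch_result))
```

The one line of Python that calls `torch.matmul` only passes the request along; the multiplication itself happens in compiled code, which is why the gap widens as `n` grows.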

And your home language? Honest status report for a Swift developer: training and research speak Python — that won't change soon, and it's why this book taught every concept in both languages (if the Python side still reads like fog, the Python-for-Swift-developers reference covers 100% of the book's Python through Swift eyes). But on the shipping side, Swift is a first-class citizen: Core ML runs trained models inside your iOS/macOS apps, and Apple's own research framework MLX — "an array framework for machine learning on Apple silicon, brought to you by Apple machine learning research" — ships "NumPy-inspired Python APIs with C++, C, and Swift equivalents" (MLX repo). The realistic division of labor: explore and train in Python, ship to users in Swift.

The notebook: the field's favorite workbench

ML people rarely work in Xcode-style projects; they work in notebooks. A Jupyter Notebook is "the original web application for creating and sharing computational documents" — a single document holding "code, narrative text, equations, and rich output", run cell by cell, with results (numbers, plots, tables) appearing inline. It fits ML's rhythm perfectly: load data, peek at it, tweak, re-run one cell — no rebuild, no relaunch. And now the magic door this book kept using can be named precisely: Google Colab is "a hosted Jupyter Notebook service that requires no setup to use and provides free of charge access to computing resources, including GPUs and TPUs" (Colab FAQ). Someone else's Jupyter, someone else's GPU, your browser.
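Under the hood a notebook file (`.ipynb`) is plain JSON: a list of cells, each marked as narrative text or code. The sketch below builds a tiny two-cell notebook by hand, assuming version 4 of the notebook format; `make_notebook` is a hypothetical helper written for this illustration, and real notebooks carry more metadata than shown here.

```python
import json

def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also record their run counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

notebook = make_notebook([
    ("markdown", "# Peek at the data"),
    ("code", "values = [1, 2, 3]\nsum(values)"),
])

text = json.dumps(notebook, indent=1)
print(text)
```

Saving `text` to a file ending in `.ipynb` should give Jupyter or Colab something to open; the inline results you see after running a cell are stored in that cell's `outputs` list.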

The hardware layer: why "GPU" and "CUDA" are everywhere

Lesson 17 said it in passing; now make it explicit. Training and inference are overwhelmingly matrix multiplications (lesson 3), and a GPU is a machine for doing thousands of small multiply-adds simultaneously. The software that lets ML frameworks drive Nvidia GPUs is CUDA — "a proprietary parallel computing platform and application programming interface developed by… Nvidia" that runs only on Nvidia hardware (Wikipedia). That single proprietary fact explains half the industry's shape: why Nvidia sells GPUs faster than it can make them, and why "do you have CUDA?" is the first question in every ML setup guide.

No Nvidia card in your Mac — but you're not locked out: PyTorch has an MPS backend ("GPU-accelerated PyTorch training on Mac" using Apple's Metal Performance Shaders) that runs on your Apple-Silicon GPU, and MLX above is built natively for it. Toy and small-model work runs happily on a Mac; serious training rents Nvidia time in the cloud.
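In code, the choice between CUDA, MPS, and the CPU usually comes down to a few lines near the top of a script. A minimal sketch, assuming a reasonably recent PyTorch (the `torch.backends.mps` check is missing from old releases); `pick_device` is a hypothetical helper named here for illustration.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an Nvidia GPU, then Apple's Metal backend, then the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

try:
    import torch  # optional: the sketch still runs without PyTorch
except ImportError:
    device = pick_device(False, False)
    print("torch not installed; would fall back to:", device)
else:
    device = pick_device(torch.cuda.is_available(),
                         torch.backends.mps.is_available())
    # Create a small tensor directly on the chosen device.
    x = torch.ones(3, 3, device=device)
    print("running on:", x.device)
```

On an Apple-Silicon Mac with a current PyTorch this should report `mps`; on a rented Nvidia machine the same script should report `cuda` with no other change.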

The model shelf: Hugging Face, Ollama, llama.cpp

You will not train GPT-5 — but you don't have to, because the world shares. The Hugging Face Hub is the field's GitHub: it "hosts over 2M models, 1.5M datasets, and 1.5M AI apps", openly downloadable, with the transformers Python library to load them in a few lines. For running open LLMs on your own machine, two names: llama.cpp, whose stated goal is "to enable LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware — locally and in the cloud" (with Apple Silicon as a first-class target via Metal), and Ollama, the friendly one-command wrapper around that idea — "the easiest way to build with open models", install one app and pull a model like pulling a Docker image. Everything you learned applies directly: what these tools download is a file of weights (lessons 9–10's learned numbers, billions of them), and what they run is lesson 17's forward pass.
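Because the download is essentially the weights, a rough file size follows from arithmetic alone: parameters times bits per parameter. The sketch below is a back-of-the-envelope estimate; the parameter counts and bit widths are illustrative, and real files run somewhat larger because they also store metadata and, for compressed formats, scaling factors.

```python
def weights_size_gb(num_parameters: int, bits_per_parameter: int) -> float:
    """Approximate size of a weights file in gigabytes (1 GB = 10**9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9

# Illustrative model sizes, not a list of specific released models.
examples = [("1B", 1_000_000_000), ("7B", 7_000_000_000), ("70B", 70_000_000_000)]

for label, params in examples:
    for bits in (16, 4):
        size = weights_size_gb(params, bits)
        print(f"{label} parameters at {bits}-bit: ~{size:.1f} GB")
```

This is why the compressed (fewer bits per weight) versions of open models are the ones that tend to fit in a laptop's memory.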

The four approaches — what people actually do

approach	what it means	realistic on…
1. Call an API	someone else's trained model over the network (OpenAI, Anthropic…)	any machine; costs per token
2. Run open models	download weights, run lesson 17's forward pass locally (Ollama / llama.cpp)	your Mac, today
3. Finetune	lesson 18's second act on your own data, on top of someone's pretrained weights	small models: one good GPU / Colab; bigger: rented cloud GPUs
4. Pretrain from scratch	lessons 9–18 at full scale	toy scale: you did it in lesson 10! real scale: data centers, millions of dollars

Most practitioners live in rows 1–3. Knowing rows 1–4 are the same machinery at different scales — that's what eighteen lessons bought you.

What to install: the three setups

Terminal commands this time, not Swift/Python. Three setups follow; pick the one that matches your work:

# Setup 0 — what this book used. Install: nothing.
#
#   1. open  colab.research.google.com
#   2. File → New notebook
#   3. paste code, press ⌘/Ctrl+Enter to run a cell
#
# torch is preinstalled; free GPU: Runtime → Change runtime type → GPU.
# Perfect for: every lesson here, experiments, finetuning small models.

# Setup 1 — local work that leaves the Mac spotless.
# Everything lives in ONE folder via a virtual environment —
# venv creates "lightweight virtual environments", each isolated,
# with its own packages (docs.python.org). No packages installed globally.

mkdir -p my-ml-project           # create the project folder if it doesn't exist yet
cd my-ml-project
python3 -m venv .venv            # create the isolated environment
source .venv/bin/activate        # enter it (prompt changes)
pip install torch jupyter        # installs ONLY into .venv/
jupyter notebook                 # your own local Colab, in the browser

# leaving and cleaning up:
pip cache purge                  # optional: pip's download cache sits outside .venv
deactivate
rm -rf .venv                     # removes the environment and every package in it
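A quick way to confirm that the activation step took effect is to ask Python itself. This small check uses the comparison the standard library's venv documentation describes (`sys.prefix` versus `sys.base_prefix`); the function name `in_virtualenv` is just a label chosen for this sketch.

```python
import sys

def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```

Run inside the activated environment, the interpreter path should point into `.venv/`; after `deactivate` it should point back at the system Python.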

# Setup 2 — when models outgrow the laptop. Two roads:
#
#   a) a box with an Nvidia GPU + CUDA drivers + the same
#      venv/pip/torch routine as Setup 1 — torch then sees the GPU.
#
#   b) rent: cloud GPU notebooks/instances by the hour
#      (Colab Pro and many providers). Same code, faster iron.
#
# Either way the software story doesn't change:
# Python + venv + torch + notebooks. Only the hardware grows.

The honest minimal kit, for the day someone asks what's needed to "do ML": Python 3, venv, PyTorch, Jupyter — four things, all free, all installable inside one disposable folder. Everything else (Hugging Face libraries, Ollama, experiment trackers) joins the same venv when a project actually needs it. Beware the opposite failure: installing twenty tools "to be ready" is how people avoid starting.

Check yourself

No peeking back. Pull it from memory.

1. Why doesn't Python's slowness hurt model training?

2. What is Google Colab, precisely?

3. What does a virtual environment (venv) give you?

Read this next

Primary source: the Hugging Face Hub documentation — walk the shelves of the open-model world: models with their model cards (read a few — you now understand every word: parameters, context window, finetuned from…), datasets, and live demo Spaces. Then, when a real project calls, the venv docs and jupyter.org are the canonical references for the local kit.