In Lesson 1 you built one neuron. A real network — the kind you'll train at the end of this course — uses thousands, arranged in layers. Here's the relieving secret: a layer contains zero new math. It's the Lesson-1 neuron copy-pasted, plus a tidier way to write the pile down. Master that notation today and every ML paper and library suddenly reads like code.
Picture your Lesson-1 neuron: inputs in, weighted sum z, squash with σ, activation out. Now put a second neuron right next to it and feed it the exact same inputs. That pair is a layer: a group of neurons that all look at the same inputs, but each has its own weights and its own bias — so each computes its own activation. One neuron might fire on these inputs while its neighbor stays dark. Why bother? One sigmoid neuron outputs one number — one opinion. A layer outputs a whole list of opinions about the same inputs, and that list becomes the input for the next layer. Stacking layers is Lesson 4; today we nail one layer cold.
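Here is a minimal Python sketch of that idea, assuming the Lesson-1 neuron is a plain function. The layer is nothing but that function called once per (weights, bias) pair, always with the same inputs. The names `neuron` and `params` are illustrative, and the numbers are the ones this lesson works through by hand.

```python
import math

def neuron(x, w, b):
    # Lesson-1 neuron: weighted sum z, then squash with the sigmoid
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# A layer: several neurons, each with its own (weights, bias), all reading the same x
params = [([1.0, -2.0, 0.5], -1.5),   # neuron 1
          ([0.5, 1.0, -1.0], 0.0)]    # neuron 2
x = [0.5, -1.0, 2.0]

activations = [neuron(x, w, b) for w, b in params]
print(activations)
```

Notice that `x` is passed unchanged to every call, while each call gets its own weights and bias.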
Below is a layer of 2 neurons reading 3 shared inputs. Every box is editable. Each output line shows the full arithmetic, term by term — exactly the Lesson-1 recipe, once per neuron. Hover over a weights row (or a result) to see which neuron it belongs to. Try making neuron 2 fire harder than neuron 1.
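If you want the widget's term-by-term readout outside the browser, a rough text stand-in takes only a few lines of Python. This is a sketch, not the widget's actual code. `explain_layer` is a hypothetical helper, and the starting values are the widget defaults given in the next paragraph.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def explain_layer(x, W, b):
    """Return one text line per neuron, showing its arithmetic term by term."""
    lines = []
    for n, (row, bias) in enumerate(zip(W, b), start=1):
        terms = " + ".join(f"({w:g})({xi:g})" for w, xi in zip(row, x))
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        lines.append(f"neuron {n}: z = {terms} + ({bias:g}) = {z:g}, "
                     f"a = σ({z:g}) = {sigmoid(z):.3f}")
    return lines

x = [0.5, -1.0, 2.0]
W = [[1.0, -2.0, 0.5],   # neuron 1's weights
     [0.5, 1.0, -1.0]]   # neuron 2's weights
b = [-1.5, 0.0]
for line in explain_layer(x, W, b):
    print(line)
```

You can try the widget's challenge here as well. Changing `b` to `[-1.5, 5.0]` moves neuron 2 to z = 2.25 and a ≈ 0.905, so it now fires harder than neuron 1's 0.881.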
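Lock in the widget's starting numbers on paper; this is today's win. Inputs x = [0.5, −1.0, 2.0]. Neuron 1 has weights [1.0, −2.0, 0.5] and bias −1.5. Pure Lesson-1 arithmetic:

$$z_1 = (1.0)(0.5) + (-2.0)(-1.0) + (0.5)(2.0) + (-1.5) = 0.5 + 2.0 + 1.0 - 1.5 = 2.0, \qquad a_1 = \sigma(2.0) \approx 0.881$$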
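Neuron 2 reads the same x but owns different weights [0.5, 1.0, −1.0] and bias 0.0:

$$z_2 = (0.5)(0.5) + (1.0)(-1.0) + (-1.0)(2.0) + 0.0 = 0.25 - 1.0 - 2.0 = -2.75, \qquad a_2 = \sigma(-2.75) \approx 0.060$$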
So the layer turns the list [0.5, −1.0, 2.0] into the list [0.881, 0.060]: neuron 1 lights up, neuron 2 stays dark. Three inputs in, two activations out. That's all a layer ever does.
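The "three in, two out" rule is really a shape rule, and it is worth enforcing in code. The sketch below assumes plain Python lists. Its shape checks are an illustrative addition, since the lesson's own code has none. They matter because `zip` silently stops at the shorter list, so a missing weight would otherwise be ignored rather than reported.

```python
import math

def layer(x, W, b):
    # Shape rules: one row of W per neuron, one column per input, one bias per neuron.
    if len(W) != len(b):
        raise ValueError("need exactly one bias per neuron (row of W)")
    if any(len(row) != len(x) for row in W):
        raise ValueError("every row of W needs one weight per input")
    # Same Lesson-1 recipe per row: weighted sum + bias, then sigmoid.
    return [1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bias)))
            for row, bias in zip(W, b)]

x = [0.5, -1.0, 2.0]                      # 3 inputs
W = [[1.0, -2.0, 0.5], [0.5, 1.0, -1.0]]  # 2 neurons x 3 inputs
b = [-1.5, 0.0]                           # 2 biases
a = layer(x, W, b)
print(len(x), "inputs ->", len(a), "activations")  # 3 inputs -> 2 activations

try:
    layer([1.0, 2.0], W, b)  # only 2 inputs, but each row has 3 weights
except ValueError as err:
    print("rejected:", err)
```

In Python 3.10 and later, `zip(..., strict=True)` is another way to turn a length mismatch into an error.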
The whole layer is one loop around your Lesson-1 neuron: for each neuron, take its weight row, dot it with x, add its bias, squash. Paste the Python into Google Colab and run it — you should see [0.8807970779778823, 0.060086650174007626], the same 0.881 and 0.060 from your hand computation above. The Swift runs as-is in a playground.
```swift
import Foundation

func sigmoid(_ z: Double) -> Double {
    1 / (1 + exp(-z))
}

func layer(x: [Double], W: [[Double]], b: [Double]) -> [Double] {
    // one activation per row of W: dot(row, x) + bias, then squash
    var a: [Double] = []
    for n in 0..<W.count {
        var z = b[n]
        for i in 0..<x.count {
            z += W[n][i] * x[i]
        }
        a.append(sigmoid(z))
    }
    return a
}

let x = [0.5, -1.0, 2.0]
let W = [[1.0, -2.0, 0.5],   // neuron 1's weights
         [0.5, 1.0, -1.0]]   // neuron 2's weights
let b = [-1.5, 0.0]
print(layer(x: x, W: W, b: b))
// [0.8807970779778823, 0.060086650174007626]
```
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(x, W, b):
    # one activation per row of W: dot(row, x) + bias, then squash
    a = []
    for row, bias in zip(W, b):
        z = bias
        for w, xi in zip(row, x):
            z += w * xi
        a.append(sigmoid(z))
    return a

x = [0.5, -1.0, 2.0]
W = [[1.0, -2.0, 0.5],   # neuron 1's weights
     [0.5, 1.0, -1.0]]   # neuron 2's weights
b = [-1.5, 0.0]
print(layer(x, W, b))
# [0.8807970779778823, 0.060086650174007626]
```
Look at W in the code: it's an array of arrays ([[Double]] in Swift, a list of lists in Python). Each inner array is one neuron's weights. You just met a matrix: a grid of numbers, here 2 rows × 3 columns. That's the only new object today, and you've already programmed with it a hundred times.
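Here is a quick way to see the grid, assuming the same list-of-lists `W` as the Python code. Its shape is (number of rows, length of a row). A row is one neuron, and a column gathers every neuron's weight on a single input. This is a sketch for inspection only.

```python
W = [[1.0, -2.0, 0.5],   # row 0: neuron 1's weights
     [0.5, 1.0, -1.0]]   # row 1: neuron 2's weights

rows, cols = len(W), len(W[0])    # 2 neurons, 3 inputs
print(f"{rows} x {cols} matrix")  # 2 x 3 matrix

# A row is one neuron's weights; a column is every neuron's weight for one input.
print(W[0])                       # [1.0, -2.0, 0.5]
print([row[2] for row in W])      # [0.5, -1.0], the weights on input 3
```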
Same trick as Lesson 1: one idea, three levels of shorthand. The only news is bookkeeping — with several neurons, each weight needs two labels: w₂₃ means "neuron 2's weight for input 3" — first index picks the neuron (the row), second picks the input (the column). That row-then-column order is the standard convention (Nielsen, ch. 2).
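Watch for one trap when moving between the math and the code. The subscripts in w₂₃ count from 1, while Python (and Swift) indices count from 0, so w₂₃ lives at `W[1][2]`. The small helper `w(j, k)` below is hypothetical, written only to make that translation explicit.

```python
W = [[1.0, -2.0, 0.5],
     [0.5, 1.0, -1.0]]

def w(j, k):
    """Math-style lookup: neuron j's weight for input k, both counted from 1."""
    return W[j - 1][k - 1]   # Python lists count from 0, the math notation from 1

print(w(2, 3))   # -1.0: neuron 2's weight for input 3
print(w(1, 2))   # -2.0: neuron 1's weight for input 2
```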
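$$a_1 = \sigma(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)$$
$$a_2 = \sigma(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2)$$

Two copies of the Lesson-1 formula, one per neuron. Note what's shared and what isn't: the x's appear in both lines (same inputs!), but every w and b belongs to one neuron only.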
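Written out in code with one variable per symbol, the two lines look like this. The sketch is deliberately verbose so it can mirror the equations one-to-one, with variable names following the wjk convention above. It also shows why this style doesn't scale: every extra neuron or input means more hand-written terms.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x1, x2, x3 = 0.5, -1.0, 2.0                # shared inputs
w11, w12, w13, b1 = 1.0, -2.0, 0.5, -1.5   # neuron 1 only
w21, w22, w23, b2 = 0.5, 1.0, -1.0, 0.0    # neuron 2 only

a1 = sigmoid(w11 * x1 + w12 * x2 + w13 * x3 + b1)
a2 = sigmoid(w21 * x1 + w22 * x2 + w23 * x3 + b2)
print(a1, a2)
```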
Writing one line per neuron falls apart at 100 neurons. So pick any neuron and call its number j (the outer loop), and let k run over the inputs (the inner loop, your Lesson-1 Σ):

$$a_j = \sigma\left(\sum_k w_{jk}\, x_k + b_j\right)$$
Read it as nested loops: for each neuron j, for each input k, add w[j][k]·x[k]; then add b[j] and squash. That is exactly the nested for loop in the code above (the code starts z at the bias instead of adding it last, which doesn't change the sum). The same equation, with the same j,k indexing plus a layer superscript, is Equation 23 in Nielsen's chapter 2.
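The Σ form translates almost symbol for symbol into a Python comprehension. This sketch assumes the same x, W and b as before. Here j and k run from 0, because Python lists are 0-indexed, while the equation counts from 1.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x = [0.5, -1.0, 2.0]
W = [[1.0, -2.0, 0.5],
     [0.5, 1.0, -1.0]]
b = [-1.5, 0.0]

# a_j = sigma( sum over k of W[j][k] * x[k]  +  b[j] ), one j per neuron
a = [sigmoid(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
     for j in range(len(W))]
print(a)
```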
Now the form every paper and every library uses. Stack the weight rows into the matrix W and the biases into a vector b, and write the entire layer in five symbols:

$$a = \sigma(W \cdot x + b)$$
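Symbol by symbol: x is the input vector (a plain list, your [Double]); W is the weight matrix, with one row per neuron and one column per input; b is the bias vector, with one entry per neuron; a is the output vector of activations, also one entry per neuron; and σ acts element-wise. That is a new term meaning "apply it to each entry of the list separately", so σ([2.0, −2.75]) = [σ(2.0), σ(−2.75)]. Both 3Blue1Brown ("each row of this matrix corresponds to all the connections between neurons in the first layer and a particular neuron in the next layer") and Nielsen (Equation 25) write a layer exactly this way. With our numbers:

$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \sigma\left( \begin{bmatrix} 1.0 & -2.0 & 0.5 \\ 0.5 & 1.0 & -1.0 \end{bmatrix} \begin{bmatrix} 0.5 \\ -1.0 \\ 2.0 \end{bmatrix} + \begin{bmatrix} -1.5 \\ 0.0 \end{bmatrix} \right) = \sigma\left( \begin{bmatrix} 3.5 \\ -2.75 \end{bmatrix} + \begin{bmatrix} -1.5 \\ 0.0 \end{bmatrix} \right) = \sigma\left( \begin{bmatrix} 2.0 \\ -2.75 \end{bmatrix} \right) \approx \begin{bmatrix} 0.881 \\ 0.060 \end{bmatrix}$$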
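Element-wise application is easy to write by hand. In this minimal sketch, `sigmoid_elementwise` is a hypothetical name. Libraries such as NumPy do the same thing automatically when you pass an array to a function like `np.exp`.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid_elementwise(v):
    # "Element-wise": apply sigmoid to each entry separately; output has the same length.
    return [sigmoid(vi) for vi in v]

print(sigmoid_elementwise([2.0, -2.75]))
```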
And here is matrix·vector multiplication, demystified in one sentence: output entry j of W·x is the dot product of row j with x, i.e. neuron j's Lesson-1 weighted sum (the bias and the squash come right after). A matrix times a vector isn't a new operation; it's "compute every row's weighted sum" written as a single symbol. When you read W·x + b in a paper from now on, you should hear: a layer of neurons, each row taking its weighted sum.
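To check the one-sentence definition, this sketch builds matrix·vector multiplication from nothing but row dot products, then uses it to assemble σ(W·x + b). The helpers `dot` and `matvec` are hypothetical names. The code should reproduce the hand-computed weighted sums 3.5 and −2.75 before the bias is added.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(W, x):
    # Entry j of W·x is row j dotted with x: neuron j's weighted sum.
    return [dot(row, x) for row in W]

def layer(x, W, b):
    z = [zj + bj for zj, bj in zip(matvec(W, x), b)]  # W·x + b
    return [1 / (1 + math.exp(-zj)) for zj in z]       # sigma, element-wise

x = [0.5, -1.0, 2.0]
W = [[1.0, -2.0, 0.5],
     [0.5, 1.0, -1.0]]
b = [-1.5, 0.0]
print(matvec(W, x))   # [3.5, -2.75]
print(layer(x, W, b))
```

In NumPy, with W and x stored as arrays, the same weighted sums are written `W @ x`, which is where the compact notation pays off in practice.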
Primary source: "But what is a Neural Network?" by 3Blue1Brown. Yes, it's the same video as Lesson 1, but watch it again now and notice the part you skimmed past before: the moment the weights organize into a matrix and the whole layer collapses into σ(W·a + b). (The video writes the inputs as a, the previous layer's activations; in this lesson that vector is x.) It will land completely differently this time. For a text version of today's notation, the "warm up" section of Nielsen's chapter 2 covers exactly these equations.