Gradient Descent

An optimization algorithm that iteratively adjusts model parameters by moving in the direction that most reduces the error function.

Gradient descent is how neural networks find good weight configurations. The gradient is a vector pointing in the direction of steepest increase in error; the algorithm steps in the opposite direction to reduce error. The learning rate controls step size: too large and training oscillates or diverges; too small and training takes impractically long. Variants such as stochastic gradient descent (SGD) and Adam improve efficiency by estimating the gradient from random mini-batches and by adapting the learning rate per parameter, respectively.
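The core update rule described above, w ← w − lr · ∇f(w), can be sketched in a few lines. This is a minimal illustration using a hypothetical one-dimensional quadratic loss f(w) = (w − 3)², not a full training loop:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Example loss: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3, so the iterates should converge there.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # → 3.0
```

With lr=0.1 the update is w ← 0.8·w + 0.6, which contracts toward 3 geometrically; raising lr above 1.0 here would make the iterates oscillate and diverge, which is the instability the paragraph above warns about.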

Also known as

Steepest descent, gradient optimization (SGD, or stochastic gradient descent, refers specifically to the mini-batch variant)