Cross-Entropy Loss
A classification loss function that measures how surprised the model is by the true label, heavily penalizing confident wrong predictions.
Cross-entropy loss quantifies the divergence between a model's predicted probability distribution and the actual labels. For binary classification, the formula is -(y×log(p) + (1-y)×log(1-p)), where y is the true label and p is the predicted probability of the positive class. Because the loss grows without bound as the predicted probability of the true class approaches zero, confident wrong predictions are punished far more heavily than uncertain ones, encouraging models to express appropriate uncertainty. Large language models use cross-entropy as their next-token prediction loss across the full vocabulary, with perplexity, defined as exp(cross-entropy) when the loss uses the natural logarithm (equivalently 2^cross-entropy when measured in bits), serving as an interpretable companion metric.
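The binary formula above can be sketched directly; this is a minimal illustration (the `binary_cross_entropy` helper and the `eps` clamp are choices made here, not a standard library API) showing how the loss stays small for confident correct predictions but blows up for confident wrong ones:

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    # y: true label (0 or 1); p: predicted probability of class 1.
    # Clamp p away from 0 and 1 so log() never receives 0.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Positive example (y = 1), three different predictions:
print(binary_cross_entropy(1, 0.99))  # confident and correct: ~0.01
print(binary_cross_entropy(1, 0.6))   # uncertain: ~0.51
print(binary_cross_entropy(1, 0.01))  # confident and wrong: ~4.61
```

Note how moving the same distance in probability (0.99 → 0.6 versus 0.6 → 0.01, roughly) multiplies the loss many times over near the wrong end of the scale; that asymmetry is what the logarithm contributes.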
Also known as
log loss, binary cross-entropy, categorical cross-entropy