Reranking

A two-stage retrieval pattern in which a fast initial search retrieves candidates and a more expensive model, such as a cross-encoder, re-scores them by relevance to the query.

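Reranking improves retrieval precision by applying a computationally expensive model to a shortlist of candidates rather than to the entire index. Cross-encoders, such as MiniLM-based rerankers, feed the query and passage through the model together and score the pair directly, which is accurate but requires one forward pass per candidate. Late interaction models like ColBERT encode the query and passage separately and then compute token-level similarity between them, so passage representations can be precomputed and scoring is cheaper than with a cross-encoder. Both approaches typically improve ranking metrics such as NDCG and MRR over single-stage retrieval.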
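The two-stage flow and the late-interaction scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the names `retrieve_then_rerank`, `token_overlap`, `jaccard_reranker`, and `maxsim` are hypothetical, the first-stage scorer is plain token overlap standing in for BM25 or vector search, and the reranker is a toy Jaccard score standing in for a real cross-encoder.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    shortlist_size: int = 100,
    top_k: int = 10,
) -> List[str]:
    """Score the whole corpus cheaply, then re-score only the shortlist."""
    # Stage 1: cheap scorer over every document, keep the best candidates.
    shortlist = sorted(
        corpus, key=lambda doc: first_stage(query, doc), reverse=True
    )[:shortlist_size]
    # Stage 2: expensive scorer over the shortlist only.
    return sorted(
        shortlist, key=lambda doc: reranker(query, doc), reverse=True
    )[:top_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy first stage (assumption): number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard_reranker(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder (assumption): Jaccard similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def maxsim(query_tokens: Sequence[Vector], passage_tokens: Sequence[Vector]) -> float:
    """Late-interaction score in the style of ColBERT: for each query token
    embedding, take its best dot product against any passage token embedding,
    then sum those maxima."""
    def dot(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, p) for p in passage_tokens) for q in query_tokens)


corpus = [
    "dogs chase cats and mice and birds all day",
    "cats chase mice",
    "stock prices fell",
]
ranked = retrieve_then_rerank(
    "cats chase mice", corpus, token_overlap, jaccard_reranker,
    shortlist_size=2, top_k=2,
)
```

In this toy run the first stage cannot separate the first two documents, since both share three tokens with the query, and the second stage moves the exact match to the top. In practice the `reranker` argument would wrap a trained model, and `maxsim` would operate on learned token embeddings rather than hand-written vectors.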

Also known as