AI Glossary
Key terms and definitions in artificial intelligence.
A
- Agent teams
- A feature in Claude Code that lets developers spawn specialized sub-agents for different tasks like planning, implementation, and review.
- Agentic AI
- AI systems designed to take autonomous actions and make decisions, going beyond answering questions to actively guiding users through processes.
- AI Mode
- Google's chatbot-style search interface that generates AI-powered responses to queries.
- AI@HHMI
- Howard Hughes Medical Institute's initiative launched in 2024 to integrate AI across its biomedical research programs.
- Alignment Tax
- The measurable degradation in a model's core capabilities (reading comprehension, translation, reasoning) caused by RLHF safety training.
B
- Byte Pair Encoding
- A compression algorithm that iteratively merges the most frequent adjacent character pairs to build a subword vocabulary.
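A minimal sketch of the BPE training loop on a toy corpus; real tokenizers (e.g. the byte-level BPE used by GPT models) operate on bytes and use heavily optimized implementations:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules from a list of words (toy illustration)."""
    corpus = Counter(tuple(w) for w in words)  # each word as character symbols
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Rewrite the corpus with the pair fused into one subword symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_train(["low", "lower", "lowest", "low"], num_merges=3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')] -- "low" has become one subword
```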
C
- Capex
- Capital expenditure; money spent by a company to acquire or upgrade physical assets like data centers and compute infrastructure.
- Causal Masking
- A constraint in decoder models (like GPT) that prevents each token from attending to future tokens, ensuring the model generates text left-to-right without seeing the answer (sketched at the end of this section).
- Cell-free protein synthesis
- A method of producing proteins outside of living cells by using extracted cellular machinery in a controlled mixture.
- Claude Agent SDK
- Anthropic's software development kit that enables Claude to operate as an autonomous coding agent with subagent orchestration, background tasks, and plugin support.
- Constitutional AI
- An AI training approach developed by Anthropic where models are trained to follow a set of principles or 'constitution' rather than relying solely on human feedback.
- Context compaction
- A technique where an AI model summarizes its own conversation context to extend effective session length.
- Continuous Batching
- A serving technique that dynamically adds and removes requests from a GPU batch as they start and finish, rather than waiting for all requests to complete.
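A toy scheduler showing the continuous-batching idea; real servers (e.g. vLLM) also interleave prefill with decode and manage KV-cache memory, which this sketch omits:

```python
def continuous_batching(requests, max_batch=4):
    """requests: list of (request_id, tokens_to_generate). Free batch slots
    are refilled every step instead of waiting for the batch to drain."""
    waiting, running, step = list(requests), [], 0
    while waiting or running:
        while waiting and len(running) < max_batch:   # admit into free slots
            running.append(waiting.pop(0))
        step += 1
        running = [(rid, left - 1) for rid, left in running]  # one decode step each
        done = [rid for rid, left in running if left == 0]
        running = [r for r in running if r[1] > 0]
        if done:
            print(f"step {step}: requests {done} finished; slots refill next step")

continuous_batching([(i, n) for i, n in enumerate([3, 5, 2, 6, 4, 2, 3, 5])])
```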
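And a NumPy sketch of Causal Masking (defined above): future positions get a score of -inf before the softmax, so they receive exactly zero attention weight:

```python
import numpy as np

n = 5                                               # sequence length
scores = np.random.randn(n, n)                      # raw attention scores (toy)
future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
scores[future] = -np.inf                            # mask out future tokens
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
print(np.round(weights, 2))  # row i is nonzero only for positions <= i
```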
D
- Decode
- The second phase of LLM inference where output tokens are generated one at a time, bottlenecked by memory bandwidth rather than compute (a back-of-the-envelope bound appears at the end of this section).
- DPO
- Direct Preference Optimization. A simpler alternative to RLHF that trains directly on preference data using a single classification loss, eliminating the need for a separate reward model.
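The DPO objective in a few lines of NumPy; the log-probabilities below are toy numbers, and beta is the usual hyperparameter controlling how far the policy may drift from the reference model:

```python
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid of the policy-vs-reference log-prob margin between the
    chosen and rejected responses; no separate reward model needed."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Toy sequence log-probs: the policy prefers the chosen answer more strongly
# than the frozen reference does, so the loss is small.
print(dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
               ref_chosen=-12.0, ref_rejected=-13.0))  # ~0.55
```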
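Why Decode (above) is memory-bound, as a back-of-the-envelope calculation; the model size, precision, and bandwidth figures are illustrative assumptions, not measurements:

```python
# At batch size 1, every generated token must stream all weights from GPU memory.
params = 7e9                 # assumed 7B-parameter model
bytes_per_param = 2          # FP16/BF16
bandwidth = 1.0e12           # assumed 1 TB/s of HBM bandwidth
weight_bytes = params * bytes_per_param                     # ~14 GB read per token
print(f"ceiling: {bandwidth / weight_bytes:.0f} tokens/s")  # ~71 tokens/s
```

Batching more requests amortizes those weight reads across many tokens, which is exactly what Continuous Batching (above) exploits.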
F
- Fertility (tokenization)
- The average number of tokens a tokenizer produces per word in a given language, used to measure tokenization efficiency (see the sketch at the end of this section).
- Flash Attention 3
- The third generation of Flash Attention, an efficient attention algorithm that reduces memory usage and speeds up transformer training.
- FlashAttention
- A hardware-aware optimization that computes attention using memory tiling to avoid materializing the full n×n attention matrix in GPU memory, yielding 2-4x speedups (its online-softmax core is sketched at the end of this section).
- FP8
- An 8-bit floating-point number format used in AI training that theoretically doubles throughput compared to BF16, though practical gains are often smaller.
- Frontier
- OpenAI's enterprise platform for building, deploying, and managing AI agents with built-in governance and shared business context.
- Frontier Model
- The most capable and computationally expensive AI models available, representing the current state of the art in performance.
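Fertility (defined above) is straightforward to compute; the tokenizer here is a hypothetical stand-in, and in practice you would measure a real tokenizer over a representative corpus for each language:

```python
class PairTokenizer:
    """Hypothetical stand-in tokenizer: one token per two characters."""
    def encode(self, text):
        return [text[i:i + 2] for i in range(0, len(text), 2)]

def fertility(tokenizer, texts):
    """Average tokens per whitespace-delimited word; lower is more efficient."""
    tokens = sum(len(tokenizer.encode(t)) for t in texts)
    words = sum(len(t.split()) for t in texts)
    return tokens / words

print(fertility(PairTokenizer(), ["hello world", "tokenizers split text"]))  # 3.4
```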
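The core of FlashAttention (defined above) is online softmax: scores are computed block by block, and a running max and normalizer are updated as each block arrives, so the full score matrix never exists. A single-query NumPy sketch of the accumulation, not the fused GPU kernel:

```python
import numpy as np

def attention_tiled(q, K, V, block=64):
    """Attention output for one query vector, visiting K/V in blocks."""
    m, l, acc = -np.inf, 0.0, np.zeros(V.shape[1])  # running max, denom, numerator
    for i in range(0, len(K), block):
        s = K[i:i + block] @ q / np.sqrt(len(q))    # this block's scores
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)                   # rescale old state to new max
        p = np.exp(s - m_new)
        acc = acc * scale + p @ V[i:i + block]
        l = l * scale + p.sum()
        m = m_new
    return acc / l

q, K, V = np.random.randn(16), np.random.randn(256, 16), np.random.randn(256, 32)
s = K @ q / 4.0                                      # naive reference
w = np.exp(s - s.max()); w /= w.sum()
assert np.allclose(attention_tiled(q, K, V), w @ V)  # same result, block by block
```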
G
- GDPval-AA
- A benchmark measuring AI performance on economically valuable knowledge work tasks.
- GOV.UK
- The UK government's central digital platform providing public services and information to citizens.
- Grouped-Query Attention
- An attention architecture that reduces KV cache size by sharing key-value heads across multiple query heads, improving inference throughput.
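A NumPy sketch with assumed head counts: 8 query heads share 2 KV heads, so the KV cache is 4x smaller than full multi-head attention while each query head still attends normally:

```python
import numpy as np

n_q, n_kv, seq, d = 8, 2, 16, 32        # 4 query heads per shared KV head
Q = np.random.randn(n_q, seq, d)
K = np.random.randn(n_kv, seq, d)       # only n_kv heads need to be cached
V = np.random.randn(n_kv, seq, d)

out = np.empty_like(Q)
for h in range(n_q):
    g = h // (n_q // n_kv)              # map query head -> its shared KV head
    s = Q[h] @ K[g].T / np.sqrt(d)
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    out[h] = w @ V[g]
print(out.shape)  # (8, 16, 32): full query capacity, 1/4 the KV memory
```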
I
- Inference optimization
- Engineering improvements to the systems that serve a trained model, making it faster or cheaper to run without changing the model itself.
- Interpretability
- The degree to which humans can understand and trace an AI system's reasoning and outputs.
K
- KV Cache
- A memory structure that stores previously computed key and value tensors during LLM inference, avoiding redundant computation but consuming significant GPU memory.
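Its footprint is simple arithmetic, which is why it dominates serving costs at long context; the hyperparameters below are assumed, loosely a 7B-class model with full multi-head attention:

```python
# KV cache bytes = 2 (K and V) x layers x kv_heads x head_dim x seq x batch x dtype
layers, kv_heads, head_dim = 32, 32, 128     # assumed 7B-class configuration
seq_len, batch, bytes_per = 4096, 8, 2       # 4K context, 8 requests, FP16
cache = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per
print(f"{cache / 2**30:.0f} GiB")            # 16 GiB -- rivaling the weights themselves
```

Grouped-Query Attention (above) attacks exactly this term by shrinking kv_heads.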
L
- Long-Term Benefit Trust
- Anthropic's governance body that selects board members and advises leadership on the company's public benefit mission.
M
- Managing Director
- A senior executive responsible for overall operations of a company or division within a specific region or business unit.
- MAU
- Monthly active users; a standard metric for measuring the number of unique users who engage with a product within a 30-day period.
- Model Context Protocol
- An open protocol that standardizes how AI models connect to external tools, data sources, and development environments.
- Multi-agent orchestration
- A system where multiple AI agents with specialized roles coordinate to complete complex tasks, rather than a single agent handling everything.
- Multi-agent system
- An AI architecture where multiple specialized agents coordinate to handle different aspects of a complex task.
- Multi-Head Attention
- Running multiple attention operations in parallel, each with its own learned weight matrices, so the model can capture different types of relationships simultaneously (sketched at the end of this section).
- Muon optimizer
- An optimizer for the weight matrices of a transformer that applies momentum and then approximately orthogonalizes each update (via Newton-Schulz iterations) before taking the step.
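A self-contained NumPy sketch of Multi-Head Attention (defined above): project once, split the projections into per-head slices, attend in parallel, then concatenate and mix. Shapes and initialization are illustrative:

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model). Each head attends over its own slice of Q/K/V."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split into heads: (n_heads, seq, d_head).
    def split(W):
        return (x @ W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq, seq)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    heads = w @ V                                         # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                                    # mix heads back together

d, h = 64, 4
x = np.random.randn(10, d)
Ws = [np.random.randn(d, d) * 0.1 for _ in range(4)]      # Wq, Wk, Wv, Wo
print(multi_head_attention(x, *Ws, n_heads=h).shape)      # (10, 64)
```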
N
- nanochat
- Andrej Karpathy's open-source project implementing a ChatGPT-style pipeline end to end, from tokenizer training and pretraining through fine-tuning and inference, designed as a minimal and forkable research codebase.
P
- Prefill
- The first phase of LLM inference where all input tokens are processed simultaneously in a compute-bound, parallelized operation.
Q
- Query, Key, Value (Q/K/V)
- The three vectors each token produces in attention: the query asks what's relevant, the key advertises what a token contains, and the value carries the information passed forward when selected.
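The interaction in miniature, with toy vectors: scores come from query-key dot products, a softmax turns them into weights, and the output is the weighted blend of values:

```python
import numpy as np

d = 4
q = np.random.randn(d)        # this token's question
K = np.random.randn(6, d)     # what each of 6 tokens advertises
V = np.random.randn(6, d)     # the content each token would contribute

scores = K @ q / np.sqrt(d)   # how well each key answers the query
w = np.exp(scores - scores.max())
w /= w.sum()                  # softmax -> attention weights over positions
print(np.round(w, 2), (w @ V).shape)  # weights sum to 1; output blends the values
```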
R
- Ratchet Effect
- The tendency for advertising incentives, once introduced into a product, to expand over time as they become integrated into revenue targets and product development.
- Reward Hacking
- When a model finds degenerate shortcuts to score well on the reward function without actually being more helpful or safe.
- RLHF
- Reinforcement Learning from Human Feedback. A training method that optimizes language model outputs against a reward model trained on human preference rankings.
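The reward-model step is typically a pairwise (Bradley-Terry) loss on human rankings, as sketched below with toy scores; the subsequent RL stage (e.g. PPO) is omitted:

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """-log sigmoid(margin): push the human-preferred response's score
    above the rejected one's."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

print(reward_model_loss(1.3, 0.2))  # ~0.29: reward model agrees with the human
print(reward_model_loss(0.2, 1.3))  # ~1.39: strong penalty for disagreeing
```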
S
- Scan, Pilot, Scale
- The UK Department for Science, Innovation and Technology's (DSIT) framework for phased government AI deployment, moving from testing to broader rollout in deliberate stages.
- Self-Attention
- A mechanism where every token in a sequence computes relevance scores against every other token, allowing the model to weigh which parts of the input matter most for each position.
- Streisand Effect
- When attempts to hide or suppress information backfire by drawing more attention to it.
T
- Teach For All
- A global nonprofit network of independent organizations that recruit and develop teachers and leaders to work in under-resourced schools and communities across 63 countries.
- Time per token
- The latency required to generate each individual token during model inference; a per-token measure that factors out total output length.
- Tokenization
- The process of converting text into integer sequences (tokens) that a language model computes over.
- Trusted Access for Cyber
- OpenAI's program requiring identity verification and use-case screening to access advanced cybersecurity capabilities in GPT-5.3-Codex.
V
- Vocabulary size
- The total number of unique tokens a tokenizer can represent, typically ranging from 32K to 128K in modern LLMs.
X
- Xcode Previews
- Xcode's live visual rendering system that shows real-time previews of SwiftUI views as developers write code.