AI Glossary
Key terms and definitions in artificial intelligence.
A
- Agent teams
- A feature in Claude Code that lets developers spawn specialized sub-agents for different tasks like planning, implementation, and review.
- Agentic AI
- AI systems designed to take autonomous actions and make decisions, going beyond answering questions to actively guiding users through processes.
- AI Mode
- Google's chatbot-style search interface that generates AI-powered responses to queries.
- AI@HHMI
- Howard Hughes Medical Institute's initiative launched in 2024 to integrate AI across its biomedical research programs.
- Alignment Tax
- The measurable degradation in a model's core capabilities (reading comprehension, translation, reasoning) caused by RLHF safety training.
B
- Byte Pair Encoding
- A compression algorithm that iteratively merges the most frequent adjacent character pairs to build a subword vocabulary.
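A minimal sketch of the BPE training loop on a toy corpus; real tokenizers (e.g. the byte-level BPE used by GPT models) operate on bytes and use heavily optimized implementations:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merge rules from a list of words (toy illustration)."""
    corpus = Counter(tuple(w) for w in words)  # each word as character symbols
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Rewrite the corpus with the pair fused into one subword symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_train(["low", "lower", "lowest", "low"], num_merges=3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')] -- "low" has become one subword
```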
C
- Capex
- Capital expenditure; money spent by a company to acquire or upgrade physical assets like data centers and compute infrastructure.
- Causal Masking
- A constraint in decoder models (like GPT) that prevents each token from attending to future tokens, ensuring the model generates text left-to-right without seeing the answer (sketched at the end of this section).
- Cell-free protein synthesis
- A method of producing proteins outside of living cells by using extracted cellular machinery in a controlled mixture.
- Claude Agent SDK
- Anthropic's software development kit that enables Claude to operate as an autonomous coding agent with subagent orchestration, background tasks, and plugin support.
- Constitutional AI
- An AI training approach developed by Anthropic where models are trained to follow a set of principles or 'constitution' rather than relying solely on human feedback.
- Context compaction
- A technique where an AI model summarizes its own conversation context to extend effective session length.
- Continuous Batching
- A serving technique that dynamically adds and removes requests from a GPU batch as they start and finish, rather than waiting for all requests to complete.
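A toy scheduler showing the continuous-batching idea; real servers (e.g. vLLM) also interleave prefill with decode and manage KV-cache memory, which this sketch omits:

```python
def continuous_batching(requests, max_batch=4):
    """requests: list of (request_id, tokens_to_generate). Free batch slots
    are refilled every step instead of waiting for the batch to drain."""
    waiting, running, step = list(requests), [], 0
    while waiting or running:
        while waiting and len(running) < max_batch:   # admit into free slots
            running.append(waiting.pop(0))
        step += 1
        running = [(rid, left - 1) for rid, left in running]  # one decode step each
        done = [rid for rid, left in running if left == 0]
        running = [r for r in running if r[1] > 0]
        if done:
            print(f"step {step}: requests {done} finished; slots refill next step")

continuous_batching([(i, n) for i, n in enumerate([3, 5, 2, 6, 4, 2, 3, 5])])
```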
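And a NumPy sketch of Causal Masking (defined above): future positions get a score of -inf before the softmax, so they receive exactly zero attention weight:

```python
import numpy as np

n = 5                                               # sequence length
scores = np.random.randn(n, n)                      # raw attention scores (toy)
future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
scores[future] = -np.inf                            # mask out future tokens
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
print(np.round(weights, 2))  # row i is nonzero only for positions <= i
```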
D
- Decode
- The second phase of LLM inference where output tokens are generated one at a time, bottlenecked by memory bandwidth rather than compute (a back-of-the-envelope bound appears at the end of this section).
- DPO
- Direct Preference Optimization. A simpler alternative to RLHF that trains directly on preference data using a single classification loss, eliminating the need for a separate reward model.
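The DPO objective in a few lines of NumPy; the log-probabilities below are toy numbers, and beta is the usual hyperparameter controlling how far the policy may drift from the reference model:

```python
import numpy as np

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid of the policy-vs-reference log-prob margin between the
    chosen and rejected responses; no separate reward model needed."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Toy sequence log-probs: the policy prefers the chosen answer more strongly
# than the frozen reference does, so the loss is small.
print(dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
               ref_chosen=-12.0, ref_rejected=-13.0))  # ~0.55
```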
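Why Decode (above) is memory-bound, as a back-of-the-envelope calculation; the model size, precision, and bandwidth figures are illustrative assumptions, not measurements:

```python
# At batch size 1, every generated token must stream all weights from GPU memory.
params = 7e9                 # assumed 7B-parameter model
bytes_per_param = 2          # FP16/BF16
bandwidth = 1.0e12           # assumed 1 TB/s of HBM bandwidth
weight_bytes = params * bytes_per_param                     # ~14 GB read per token
print(f"ceiling: {bandwidth / weight_bytes:.0f} tokens/s")  # ~71 tokens/s
```

Batching more requests amortizes those weight reads across many tokens, which is exactly what Continuous Batching (above) exploits.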
F
- Fertility (tokenization)
- The average number of tokens a tokenizer produces per word in a given language, used to measure tokenization efficiency (see the sketch at the end of this section).
- Flash Attention 3
- The third generation of Flash Attention, an efficient attention algorithm that reduces memory usage and speeds up transformer training.
- FlashAttention
- A hardware-aware optimization that computes attention using memory tiling to avoid materializing the full n×n attention matrix in GPU memory, yielding 2-4x speedups (its online-softmax core is sketched at the end of this section).
- FP8
- An 8-bit floating-point number format used in AI training that theoretically doubles throughput compared to BF16, though practical gains are often smaller.
- Frontier
- OpenAI's enterprise platform for building, deploying, and managing AI agents with built-in governance and shared business context.
- Frontier Model
- The most capable and computationally expensive AI models available, representing the current state of the art in performance.
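Fertility (defined above) is straightforward to compute; the tokenizer here is a hypothetical stand-in, and in practice you would measure a real tokenizer over a representative corpus for each language:

```python
class PairTokenizer:
    """Hypothetical stand-in tokenizer: one token per two characters."""
    def encode(self, text):
        return [text[i:i + 2] for i in range(0, len(text), 2)]

def fertility(tokenizer, texts):
    """Average tokens per whitespace-delimited word; lower is more efficient."""
    tokens = sum(len(tokenizer.encode(t)) for t in texts)
    words = sum(len(t.split()) for t in texts)
    return tokens / words

print(fertility(PairTokenizer(), ["hello world", "tokenizers split text"]))  # 3.4
```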
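The core of FlashAttention (defined above) is online softmax: scores are computed block by block, and a running max and normalizer are updated as each block arrives, so the full score matrix never exists. A single-query NumPy sketch of the accumulation, not the fused GPU kernel:

```python
import numpy as np

def attention_tiled(q, K, V, block=64):
    """Attention output for one query vector, visiting K/V in blocks."""
    m, l, acc = -np.inf, 0.0, np.zeros(V.shape[1])  # running max, denom, numerator
    for i in range(0, len(K), block):
        s = K[i:i + block] @ q / np.sqrt(len(q))    # this block's scores
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)                   # rescale old state to new max
        p = np.exp(s - m_new)
        acc = acc * scale + p @ V[i:i + block]
        l = l * scale + p.sum()
        m = m_new
    return acc / l

q, K, V = np.random.randn(16), np.random.randn(256, 16), np.random.randn(256, 32)
s = K @ q / 4.0                                      # naive reference
w = np.exp(s - s.max()); w /= w.sum()
assert np.allclose(attention_tiled(q, K, V), w @ V)  # same result, block by block
```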
G
- GDPval-AA
- A benchmark measuring AI performance on economically valuable knowledge work tasks.
- GOV.UK
- The UK government's central digital platform providing public services and information to citizens.
- Grouped-Query Attention
- An attention architecture that reduces KV cache size by sharing key-value heads across multiple query heads, improving inference throughput.
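A NumPy sketch with assumed head counts: 8 query heads share 2 KV heads, so the KV cache is 4x smaller than full multi-head attention while each query head still attends normally:

```python
import numpy as np

n_q, n_kv, seq, d = 8, 2, 16, 32        # 4 query heads per shared KV head
Q = np.random.randn(n_q, seq, d)
K = np.random.randn(n_kv, seq, d)       # only n_kv heads need to be cached
V = np.random.randn(n_kv, seq, d)

out = np.empty_like(Q)
for h in range(n_q):
    g = h // (n_q // n_kv)              # map query head -> its shared KV head
    s = Q[h] @ K[g].T / np.sqrt(d)
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    out[h] = w @ V[g]
print(out.shape)  # (8, 16, 32): full query capacity, 1/4 the KV memory
```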
I
- Inference optimization
- Engineering improvements to the systems that serve a trained model, making it faster or cheaper to run without changing the model itself.
- Interpretability
- The degree to which humans can understand and trace an AI system's reasoning and outputs.
K
- KV Cache
- A memory structure that stores previously computed key and value tensors during LLM inference, avoiding redundant computation but consuming significant GPU memory.
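Its footprint is simple arithmetic, which is why it dominates serving costs at long context; the hyperparameters below are assumed, loosely a 7B-class model with full multi-head attention:

```python
# KV cache bytes = 2 (K and V) x layers x kv_heads x head_dim x seq x batch x dtype
layers, kv_heads, head_dim = 32, 32, 128     # assumed 7B-class configuration
seq_len, batch, bytes_per = 4096, 8, 2       # 4K context, 8 requests, FP16
cache = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per
print(f"{cache / 2**30:.0f} GiB")            # 16 GiB -- rivaling the weights themselves
```

Grouped-Query Attention (above) attacks exactly this term by shrinking kv_heads.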
L
- Long-Term Benefit Trust
- Anthropic's governance body that selects board members and advises leadership on the company's public benefit mission.
M
- Managing Director
- A senior executive responsible for overall operations of a company or division within a specific region or business unit.
- MAU
- Monthly active users; a standard metric for measuring the number of unique users who engage with a product within a 30-day period.
- Model Context Protocol
- An open protocol that standardizes how AI models connect to external tools, data sources, and development environments.
- Multi-agent orchestration
- A system where multiple AI agents with specialized roles coordinate to complete complex tasks, rather than a single agent handling everything.
- Multi-agent system
- An AI architecture where multiple specialized agents coordinate to handle different aspects of a complex task.
- Multi-Head Attention
- Running multiple attention operations in parallel, each with its own learned weight matrices, so the model can capture different types of relationships simultaneously (sketched at the end of this section).
- Muon optimizer
- An optimizer for the weight matrices of a transformer that applies momentum and then approximately orthogonalizes each update (via Newton-Schulz iterations) before taking the step.
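A self-contained NumPy sketch of Multi-Head Attention (defined above): project once, split the projections into per-head slices, attend in parallel, then concatenate and mix. Shapes and initialization are illustrative:

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model). Each head attends over its own slice of Q/K/V."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split into heads: (n_heads, seq, d_head).
    def split(W):
        return (x @ W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq, seq)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    heads = w @ V                                         # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                                    # mix heads back together

d, h = 64, 4
x = np.random.randn(10, d)
Ws = [np.random.randn(d, d) * 0.1 for _ in range(4)]      # Wq, Wk, Wv, Wo
print(multi_head_attention(x, *Ws, n_heads=h).shape)      # (10, 64)
```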
N
- nanochat
- Andrej Karpathy's open-source project implementing a ChatGPT-style pipeline end to end, from tokenizer training and pretraining through fine-tuning and inference, designed as a minimal and forkable research codebase.
P
- Prefill
- The first phase of LLM inference where all input tokens are processed simultaneously in a compute-bound, parallelized operation.
Q
- Query, Key, Value (Q/K/V)
- The three vectors each token produces in attention: the query asks what's relevant, the key advertises what a token contains, and the value carries the information passed forward when selected.
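The interaction in miniature, with toy vectors: scores come from query-key dot products, a softmax turns them into weights, and the output is the weighted blend of values:

```python
import numpy as np

d = 4
q = np.random.randn(d)        # this token's question
K = np.random.randn(6, d)     # what each of 6 tokens advertises
V = np.random.randn(6, d)     # the content each token would contribute

scores = K @ q / np.sqrt(d)   # how well each key answers the query
w = np.exp(scores - scores.max())
w /= w.sum()                  # softmax -> attention weights over positions
print(np.round(w, 2), (w @ V).shape)  # weights sum to 1; output blends the values
```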
R
- Ratchet Effect
- The tendency for advertising incentives, once introduced into a product, to expand over time as they become integrated into revenue targets and product development.
- Reward Hacking
- When a model finds degenerate shortcuts to score well on the reward function without actually being more helpful or safe.
- RLHF
- Reinforcement Learning from Human Feedback. A training method that optimizes language model outputs against a reward model trained on human preference rankings.
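The reward-model step is typically a pairwise (Bradley-Terry) loss on human rankings, as sketched below with toy scores; the subsequent RL stage (e.g. PPO) is omitted:

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """-log sigmoid(margin): push the human-preferred response's score
    above the rejected one's."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

print(reward_model_loss(1.3, 0.2))  # ~0.29: reward model agrees with the human
print(reward_model_loss(0.2, 1.3))  # ~1.39: strong penalty for disagreeing
```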
S
- Scan, Pilot, Scale
- The UK Department for Science, Innovation and Technology's (DSIT) framework for phased government AI deployment, moving from testing to broader rollout in deliberate stages.
- Self-Attention
- A mechanism where every token in a sequence computes relevance scores against every other token, allowing the model to weigh which parts of the input matter most for each position.
- Streisand Effect
- When attempts to hide or suppress information backfire by drawing more attention to it.
T
- Teach For All
- A global nonprofit network of independent organizations that recruit and develop teachers and leaders to work in under-resourced schools and communities across 63 countries.
- Time per token
- The latency required to generate each individual token during model inference; a per-token measure that factors out total output length.
- Tokenization
- The process of converting text into integer sequences (tokens) that a language model computes over.
- Trusted Access for Cyber
- OpenAI's program requiring identity verification and use-case screening to access advanced cybersecurity capabilities in GPT-5.3-Codex.
V
- Vocabulary size
- The total number of unique tokens a tokenizer can represent, typically ranging from 32K to 128K in modern LLMs.
X
- Xcode Previews
- Xcode's live visual rendering system that shows real-time previews of SwiftUI views as developers write code.