AI Glossary

Key terms and definitions in artificial intelligence.

A

Agent teams
A feature in Claude Code that lets developers spawn specialized subagents for different tasks like planning, implementation, and review.
Agentic AI
AI systems designed to take autonomous actions and make decisions, going beyond answering questions to actively guiding users through processes.
AI Mode
Google's chatbot-style search interface that generates AI-powered responses to queries.
AI@HHMI
Howard Hughes Medical Institute's initiative launched in 2024 to integrate AI across its biomedical research programs.
Alignment Tax
The measurable degradation in a model's core capabilities (reading comprehension, translation, reasoning) caused by RLHF safety training.

B

Byte Pair Encoding
A compression algorithm that iteratively merges the most frequent adjacent character pairs to build a subword vocabulary.
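A minimal sketch of the training loop on a toy corpus, with words pre-split into character tuples (real implementations add byte-level fallbacks and end-of-word markers):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE trainer: words is a list of symbol tuples, e.g. [('l','o','w')]."""
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        # Replace every occurrence of the best pair with the merged symbol.
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged); i += 2
                else:
                    out.append(w[i]); i += 1
            new_words.append(tuple(out))
        words = new_words
    return merges, words

merges, words = bpe_train([tuple("lower"), tuple("lowest"), tuple("low")], 3)
print(merges)   # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```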

C

Capex
Capital expenditure; money spent by a company to acquire or upgrade physical assets like data centers and compute infrastructure.
Causal Masking
A constraint in decoder models (like GPT) that prevents each token from attending to future tokens, ensuring the model generates text left-to-right without seeing the answer.
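A minimal numpy sketch of the mask itself: positions above the diagonal are set to negative infinity before the softmax, so each row's attention weights cover only earlier positions:

```python
import numpy as np

T = 4
scores = np.random.randn(T, T)                      # raw attention scores
mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal = future positions
scores[mask] = -np.inf                              # future tokens get -inf before softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                         # row i attends only to positions <= i
```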
Cell-free protein synthesis
A method of producing proteins outside of living cells by using extracted cellular machinery in a controlled mixture.
Claude Agent SDK
Anthropic's software development kit that enables Claude to operate as an autonomous coding agent with subagent orchestration, background tasks, and plugin support.
Constitutional AI
An AI training approach developed by Anthropic where models are trained to follow a set of principles or 'constitution' rather than relying solely on human feedback.
Context compaction
A technique where an AI model summarizes its own conversation context to extend effective session length.
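A minimal sketch of the idea; `count_tokens` and `summarize` are hypothetical stand-ins for a real tokenizer and a summarization call back to the model:

```python
def maybe_compact(messages, count_tokens, summarize, limit=8000, keep_recent=4):
    """If the conversation nears the context limit, replace older turns with
    a model-written summary, keeping the most recent turns verbatim."""
    total = sum(count_tokens(m["content"]) for m in messages)
    if total < limit:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)   # hypothetical call asking the model to summarize
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent

msgs = [{"role": "user", "content": "x" * 20000},
        {"role": "assistant", "content": "y" * 20000},
        {"role": "user", "content": "next question"}]
compacted = maybe_compact(msgs, count_tokens=lambda s: len(s) // 4,
                          summarize=lambda ms: "earlier debugging discussion",
                          keep_recent=1)
print(len(compacted))   # 2: one summary message plus the most recent turn
```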
Continuous Batching
A serving technique that dynamically adds and removes requests from a GPU batch as they start and finish, rather than waiting for all requests to complete.
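A toy scheduler simulation of the idea (no real GPU work; each request just counts down its remaining tokens). New requests join the batch the moment a slot frees up, instead of waiting for the whole batch to drain:

```python
from collections import deque

def serve(requests, max_batch=4):
    """requests: list of (name, tokens_to_generate) pairs."""
    queue = deque(requests)
    batch, step = [], 0
    while queue or batch:
        # Admit waiting requests into free slots before every decode step.
        while queue and len(batch) < max_batch:
            name, n = queue.popleft()
            batch.append([name, n])
        step += 1
        for r in batch:
            r[1] -= 1                      # each step generates one token per request
        done = [r[0] for r in batch if r[1] == 0]
        batch = [r for r in batch if r[1] > 0]
        if done:
            print(f"step {step}: finished {done}")

serve([("a", 2), ("b", 5), ("c", 3), ("d", 1), ("e", 2)])
# "e" is admitted mid-flight once "d" finishes, while "b" is still running.
```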

D

Decode
The second phase of LLM inference where output tokens are generated one at a time, bottlenecked by memory bandwidth rather than compute.
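A minimal sketch of both inference phases, assuming a hypothetical `model` callable that returns next-token logits plus an updated KV cache:

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens):
    # Prefill: the whole prompt in one parallel, compute-bound pass.
    logits, kv_cache = model(prompt_ids, kv_cache=None)
    ids = list(prompt_ids)
    # Decode: one token per step; each step streams the weights and the
    # growing KV cache from memory, so bandwidth (not FLOPs) is the limit.
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(logits))   # greedy decoding for simplicity
        ids.append(next_id)
        logits, kv_cache = model([next_id], kv_cache=kv_cache)
    return ids
```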
DPO
Direct Preference Optimization. A simpler alternative to RLHF that trains directly on preference data using a single classification loss, eliminating the need for a separate reward model.
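A minimal sketch of the per-pair loss, assuming summed log-probabilities for the chosen (`w`) and rejected (`l`) responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Loss shrinks as the policy raises the chosen response's likelihood
    relative to the reference more than the rejected one's."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1 / (1 + math.exp(-margin)))   # -log sigmoid(margin)

print(dpo_loss(-12.0, -15.0, -13.0, -14.0))   # ~0.60
```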

F

Fertility (tokenization)
The average number of tokens a tokenizer produces per word in a given language, used to measure tokenization efficiency.
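A minimal sketch of the metric, with a toy tokenizer standing in for a real one:

```python
def fertility(tokenize, text):
    """Tokens per whitespace-delimited word; lower is more efficient."""
    words = text.split()
    return len(tokenize(text)) / len(words)

toy = lambda s: s.replace(",", " ,").split()   # toy "tokenizer" that splits punctuation
print(fertility(toy, "hello, world"))          # 3 tokens / 2 words = 1.5
```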
Flash Attention 3
The third generation of FlashAttention, an efficient attention algorithm that reduces memory usage and speeds up transformer training.
FlashAttention
A hardware-aware optimization that computes attention using memory tiling to avoid materializing the full n×n attention matrix in GPU memory, yielding 2-4x speedups.
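A numpy sketch of the core trick: exact attention computed one key/value block at a time with an online softmax, so the full n×n score matrix never materializes (the real kernel does this per tile in GPU SRAM):

```python
import numpy as np

def tiled_attention(q, k, v, block=64):
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)            # running row-max of the scores
    l = np.zeros(n)                    # running softmax denominator
    for s0 in range(0, k.shape[0], block):
        kb, vb = k[s0:s0 + block], v[s0:s0 + block]
        s = q @ kb.T / np.sqrt(d)                  # scores for this block only
        m_new = np.maximum(m, s.max(axis=1))
        scale = np.exp(m - m_new)                  # rescale earlier partial sums
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ vb
        m = m_new
    return out / l[:, None]

q, k, v = (np.random.randn(128, 32) for _ in range(3))
s = q @ k.T / np.sqrt(32)
p = np.exp(s - s.max(axis=1, keepdims=True))
print(np.allclose(tiled_attention(q, k, v), (p / p.sum(axis=1, keepdims=True)) @ v))  # True
```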
FP8
An 8-bit floating-point number format used in AI training that theoretically doubles throughput compared to BF16, though practical gains are often smaller.
Frontier
OpenAI's enterprise platform for building, deploying, and managing AI agents with built-in governance and shared business context.
Frontier Model
The most capable and computationally expensive AI models available, representing the current state of the art in performance.

G

GDPval-AA
A benchmark measuring AI performance on economically valuable knowledge work tasks.
GOV.UK
The UK government's central digital platform providing public services and information to citizens.
Grouped-Query Attention
An attention architecture that reduces KV cache size by sharing key-value heads across multiple query heads, improving inference throughput.
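A minimal numpy sketch with 8 query heads sharing 2 key/value heads, so the cached K/V tensors are 4x smaller:

```python
import numpy as np

n, d, n_q, n_kv = 16, 32, 8, 2
q = np.random.randn(n_q, n, d)
k = np.random.randn(n_kv, n, d)                 # only n_kv key heads are cached
v = np.random.randn(n_kv, n, d)                 # only n_kv value heads are cached
group = n_q // n_kv
k_shared = np.repeat(k, group, axis=0)          # each KV head serves 4 query heads
v_shared = np.repeat(v, group, axis=0)
scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(d)
p = np.exp(scores - scores.max(-1, keepdims=True))
p /= p.sum(-1, keepdims=True)
out = p @ v_shared
print(out.shape)   # (8, 16, 32): full 8-head output from a 4x smaller KV cache
```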

I

Inference optimization
Engineering improvements to the systems that serve a trained model, making it faster or cheaper to run without changing the model itself.
Interpretability
The degree to which humans can understand and trace an AI system's reasoning and outputs.

K

KV Cache
A memory structure that stores previously computed key and value tensors during LLM inference, avoiding redundant computation but consuming significant GPU memory.
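A minimal sketch of the mechanism: each decode step computes keys and values only for the new token and appends them to the cache:

```python
import numpy as np

d = 64
cache_k = np.empty((0, d))
cache_v = np.empty((0, d))

for step in range(5):
    new_k = np.random.randn(1, d)                 # stand-ins for this token's K and V
    new_v = np.random.randn(1, d)
    cache_k = np.concatenate([cache_k, new_k])    # grows one row per generated token
    cache_v = np.concatenate([cache_v, new_v])
    # attention for the new token would now read all of cache_k / cache_v

print(cache_k.shape)   # (5, 64) -- memory grows linearly with sequence length
```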

L

Long-Term Benefit Trust
Anthropic's governance body that selects board members and advises leadership on the company's public benefit mission.

M

Managing Director
A senior executive responsible for overall operations of a company or division within a specific region or business unit.
MAU
Monthly active users; a standard metric for measuring the number of unique users who engage with a product within a 30-day period.
Model Context Protocol
An open protocol that standardizes how AI models connect to external tools, data sources, and development environments.
Multi-agent orchestration
A system where multiple AI agents with specialized roles coordinate to complete complex tasks, rather than a single agent handling everything.
Multi-agent system
An AI architecture where multiple specialized agents coordinate to handle different aspects of a complex task.
Multi-Head Attention
Running multiple attention operations in parallel, each with its own learned weight matrices, so the model can capture different types of relationships simultaneously.
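A minimal numpy sketch for a single sequence, splitting the model dimension across heads and concatenating the per-head outputs:

```python
import numpy as np

def mha(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq, d_model). Each head runs attention on its own
    d_model/n_heads slice; outputs are concatenated and reprojected."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = ((x @ w).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
               for w in (w_q, w_k, w_v))           # each: (heads, seq, d_head)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    heads = p @ v                                   # (heads, seq, d_head)
    return heads.transpose(1, 0, 2).reshape(seq, d_model) @ w_o

d, h = 64, 4
x = np.random.randn(10, d)
ws = [np.random.randn(d, d) * 0.1 for _ in range(4)]
print(mha(x, *ws, n_heads=h).shape)   # (10, 64)
```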
Muon optimizer
An optimizer used for the weight matrices in transformer training that applies momentum and then approximately orthogonalizes each update matrix via Newton-Schulz iteration before applying it.
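A minimal sketch of the orthogonalization step using the simple cubic Newton-Schulz iteration (production implementations use a tuned quintic polynomial to converge in fewer steps):

```python
import numpy as np

def newton_schulz_orth(g, steps=20):
    """Iterate X <- 1.5 X - 0.5 X X^T X, which drives X's singular values
    toward 1. (Muon applies this to the momentum buffer before the update.)"""
    x = g / (np.linalg.norm(g) + 1e-7)   # normalize so the iteration converges
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x

g = np.random.randn(8, 8)
o = newton_schulz_orth(g)
print(np.round(np.linalg.svd(o, compute_uv=False), 2))   # singular values ~ 1
```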

N

nanochat
Andrej Karpathy's open-source project implementing a full pipeline for training and serving a small ChatGPT-style model, designed as a minimal and forkable research codebase.

P

Prefill
The first phase of LLM inference where all input tokens are processed simultaneously in a compute-bound, parallelized operation.

Q

Query, Key, Value (Q/K/V)
The three vectors each token produces in attention: the query asks what's relevant, the key advertises what a token contains, and the value carries the information passed forward when selected.
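A minimal single-head numpy sketch of how the three vectors combine into attention (the same computation described under Self-Attention below):

```python
import numpy as np

seq, d = 6, 16
x = np.random.randn(seq, d)                          # token embeddings
w_q, w_k, w_v = (np.random.randn(d, d) * 0.1 for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v
scores = q @ k.T / np.sqrt(d)                        # "how relevant is token j to token i?"
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)            # each row sums to 1
out = weights @ v                                    # each position: weighted blend of values
print(out.shape)   # (6, 16)
```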

R

Ratchet Effect
The tendency for advertising incentives, once introduced into a product, to expand over time as they become integrated into revenue targets and product development.
Reward Hacking
When a model finds degenerate shortcuts to score well on the reward function without actually being more helpful or safe.
RLHF
Reinforcement Learning from Human Feedback. A training method that optimizes language model outputs against a reward model trained on human preference rankings.

S

Scan, Pilot, Scale
The UK Department for Science, Innovation and Technology's (DSIT) framework for phased government AI deployment, moving from testing to broader rollout in deliberate stages.
Self-Attention
A mechanism where every token in a sequence computes relevance scores against every other token, allowing the model to weigh which parts of the input matter most for each position.
Streisand Effect
When attempts to hide or suppress information backfire by drawing more attention to it.

T

Teach For All
A global nonprofit network of independent organizations that recruit and develop teachers and leaders to work in under-resourced schools and communities across 63 countries.
Time per token
The latency required to generate each individual token during model inference, independent of total output length.
Tokenization
The process of converting text into integer sequences (tokens) that a language model computes over.
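A toy round trip with a hypothetical five-entry vocabulary; real tokenizers learn tens of thousands of subword entries (see Byte Pair Encoding):

```python
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, ".": 4}
inv = {i: t for t, i in vocab.items()}

def encode(text):
    return [vocab.get(w, 0) for w in text.replace(".", " .").split()]

def decode(ids):
    return " ".join(inv[i] for i in ids)

ids = encode("the cat sat.")
print(ids)           # [1, 2, 3, 4]
print(decode(ids))   # "the cat sat ."
```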
Trusted Access for Cyber
OpenAI's program requiring identity verification and use-case screening to access advanced cybersecurity capabilities in GPT-5.3-Codex.

V

Vocabulary size
The total number of unique tokens a tokenizer can represent, typically ranging from 32K to 128K in modern LLMs.

X

Xcode Previews
Xcode's live visual rendering system that shows real-time previews of SwiftUI views as developers write code.