Glossary

Key terms and definitions

A

Agent Card
A JSON manifest that describes an A2A-compatible agent's capabilities, accepted inputs, and endpoint location, enabling discovery without a central registry.
Agent Swarm
Moonshot AI's parallel multi-agent architecture that coordinates up to 100 sub-agents executing thousands of tool calls simultaneously.
Agent2Agent Protocol
Google's open standard for AI agent interoperability, enabling autonomous systems to discover each other, negotiate tasks, and collaborate while keeping proprietary logic opaque.
Agentic AI Foundation
A Linux Foundation directed fund governing open standards for AI agents, co-founded by Anthropic, OpenAI, and Block to ensure vendor-neutral infrastructure development.
AI Agent
An autonomous AI system that can perceive its environment, make decisions, and take actions to accomplish goals with minimal human intervention.
AI Benchmark
A standardized test used to measure and compare AI model performance on specific tasks, increasingly criticized for failing to predict real-world capabilities.
AI Co-Scientist
A multi-agent AI system that generates scientific hypotheses, designs experiments, and integrates with robotic labs to run physical validation autonomously.
AI Coding Agent
An autonomous AI system that can reason across entire codebases, plan multi-step programming tasks, modify files, and iterate on solutions with minimal human intervention.
AI Coding Assistant
Software tools that use large language models to help developers write, review, and debug code through suggestions and completions.
AI Content Detection
Tools and techniques that attempt to distinguish AI-generated text or media from human-created content, using statistical analysis, watermarking, or provenance tracking.
Alignment Tax
The measurable performance degradation on capabilities like reasoning, translation, and factual accuracy that occurs when models undergo RLHF alignment training.
AMI Labs
Yann LeCun's world model startup using I-JEPA architecture, seeking €500 million at a €3 billion pre-product valuation to build AI systems grounded in physical reality.
Anthropic
An AI safety company founded in 2021 that develops the Claude family of large language models, emphasizing research into AI alignment and responsible deployment.
Attribution Graph
A technique that traces the computational path from input tokens through intermediate features to model outputs, revealing the reasoning steps inside neural networks.

B

Backpropagation
The algorithm that trains neural networks by calculating how each weight contributes to prediction error and adjusting weights to reduce that error.
Byte Pair Encoding
A compression algorithm that iteratively merges frequent character pairs to build a vocabulary, used by most major LLMs to convert text into tokens.
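The merge loop at the heart of BPE can be sketched in a few lines; this is an illustrative character-level toy (no byte-level handling or persisted merge table), not any production tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent token pairs across the corpus; ties go to the first-seen pair.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of the pair with a single merged token.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After three merges, "low" has fused into a single token and a space-prefixed " low" token emerges, mirroring how real BPE vocabularies fold leading spaces into tokens.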

C

C2PA
The Coalition for Content Provenance and Authenticity, an industry standard that creates cryptographic chains of custody for digital media from creation through distribution.
Cerebras Systems
An AI chip company that builds wafer-scale processors, using nearly entire silicon wafers to create single massive chips designed to accelerate AI training and inference.
Chain-of-Thought Prompting
A prompting technique that asks the model to show reasoning steps before the final answer, improving performance on arithmetic, logic, and commonsense tasks in sufficiently large models.
ChatGPT
OpenAI's consumer AI assistant product, available in free and paid tiers, that brought large language models to mainstream adoption.
Chinchilla Scaling
DeepMind's 2022 finding that optimal LLM training should scale data and parameters roughly equally, overturning earlier assumptions that bigger models were more compute-efficient.
Chunking
The process of splitting documents into smaller segments for embedding and retrieval, where chunk size and overlap determine what information retrieval can find.
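A minimal fixed-size chunker with overlap, as a sketch (real pipelines usually split on sentence or token boundaries rather than raw characters):

```python
def chunk_text(text, chunk_size=100, overlap=20):
    # Slide a window across the text; the overlap preserves context that
    # would otherwise be cut off at chunk boundaries.
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# 250 characters with size 100 and overlap 20 yields windows starting at 0, 80, 160, 240.
lengths = [len(c) for c in chunk_text("x" * 250)]
print(lengths)
```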
Claude
Anthropic's family of large language models, designed with a focus on safety, helpfulness, and honesty, available through API and consumer products.
Claude Agent SDK
Anthropic's framework for building autonomous AI agents that can reason across complex tasks, spawn subagents, and execute multi-step workflows without continuous human intervention.
Claude Artifacts
A Claude feature that enables users to create and share interactive web applications, documents, visualizations, and other standalone outputs directly within conversations.
Claude Code
Anthropic's command-line coding agent that operates from the terminal, capable of reading codebases, executing commands, and making multi-file changes through an agentic workflow.
Claude Cowork
Anthropic's agentic AI product for knowledge workers, offering Claude Code's file system access and autonomous task execution through a non-technical interface with enterprise plugins.
Claude Opus
Anthropic's most capable model tier in the Claude family, designed for complex reasoning, extended agentic tasks, and supervising multi-agent workflows.
Claude's Constitution
Anthropic's public document outlining the principles and values that guide how Claude is trained to behave, emphasizing helpfulness, honesty, and harmlessness.
Codex CLI
OpenAI's lightweight terminal-based coding agent built in Rust, supporting both ChatGPT authentication and API keys with Model Context Protocol extensibility.
Constitutional AI
An AI training methodology developed by Anthropic that uses a set of principles to guide model self-improvement, generating synthetic preference data and critiques based on constitutional values.
Context Window
The maximum number of tokens a language model can process in a single forward pass, determining how much text the model can "see" at once.
Continuous Batching
A serving technique that immediately ejects finished sequences from a batch and slots in new requests, rather than waiting for all requests in a batch to complete.
Cosine Similarity
A metric that measures the angle between two vectors, treating direction as meaning and ignoring magnitude.
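The computation is just the dot product normalized by both magnitudes; a self-contained sketch:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of magnitudes: only direction matters.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0 regardless of magnitude; orthogonal vectors score 0.0.
parallel = cosine_similarity([1, 2], [2, 4])
orthogonal = cosine_similarity([1, 0], [0, 1])
```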
Cross-Entropy Loss
A classification loss function that measures how surprised the model is by the true label, heavily penalizing confident wrong predictions.
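For a single example, the loss is the negative log of the probability assigned to the true class; a minimal sketch showing the penalty asymmetry:

```python
import math

def cross_entropy(probs, true_index):
    # Negative log-probability assigned to the true label.
    return -math.log(probs[true_index])

# Confident and right: tiny loss. Confident and wrong: large loss.
low_loss = cross_entropy([0.9, 0.05, 0.05], true_index=0)
high_loss = cross_entropy([0.01, 0.495, 0.495], true_index=0)
print(round(low_loss, 3), round(high_loss, 3))
```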
Cursor
An AI-powered code editor built as a VS Code fork, featuring deep AI integration for code suggestions, refactors, and multi-file editing with manual context selection.

D

Data Contamination
When a model's training data overlaps with test sets, causing inflated benchmark scores that reflect memorization rather than genuine capability.
Decode
The second phase of LLM inference where output tokens are generated one at a time, each requiring a full forward pass that is bottlenecked by memory bandwidth rather than compute.
DeepSeek R1
DeepSeek's open-weight reasoning model that matches OpenAI o1 benchmarks, trained using GRPO on a 671B Mixture of Experts architecture for approximately $6 million.
Defense-in-Depth
A security strategy that layers multiple overlapping controls, so that if one defense fails, others still limit the impact of an attack.
Demographic Parity
A fairness metric requiring that selection rates be equal across demographic groups, regardless of qualification differences.
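Checking the metric reduces to comparing per-group selection rates; a toy check with hypothetical group names and decisions:

```python
def selection_rates(decisions):
    # decisions maps each group name to a list of 0/1 selection outcomes.
    return {group: sum(d) / len(d) for group, d in decisions.items()}

rates = selection_rates({"group_a": [1, 1, 0, 0], "group_b": [1, 0, 0, 0]})
print(rates)  # group_a selected at 0.5, group_b at 0.25: parity is violated
```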
Digital Twin
A virtual replica of a physical system, service, or process that mirrors its behavior for testing, simulation, or analysis without affecting the original.
Direct Preference Optimization
An alternative to RLHF that trains language models on preference data using a single supervised loss, eliminating the need for a separate reward model.
Dual-Use AI
AI capabilities that can serve both beneficial and harmful purposes, requiring careful access controls and governance decisions.

E

Embedding
A numerical vector representation of data (text, images, etc.) in high-dimensional space, where semantic similarity corresponds to geometric proximity.
Embedding Drift
The gradual divergence between document and query embeddings over time, causing retrieval quality to silently degrade.
Emergence
Capabilities that appear at scale but were never explicitly trained for, such as arithmetic or multi-step reasoning emerging from text prediction.
Equal Opportunity
A relaxed fairness metric requiring only equal true positive rates across demographic groups.
Equalized Odds
A fairness metric requiring equal false positive and false negative rates across demographic groups.

F

Fault-Tolerant Quantum Computing
Quantum computing with error correction sufficient to run arbitrary-length algorithms without accumulated errors destroying the computation.
Fertility
A metric measuring how many tokens a tokenizer produces per word or semantic unit, with higher fertility indicating less efficient encoding and higher costs.
Few-Shot Prompting
A technique that includes example input-output pairs in the prompt to demonstrate the desired format and style, where format consistency matters more than label accuracy.
Fine-tuning
A training technique that adapts a pre-trained model to specific tasks using smaller, task-specific datasets at dramatically lower cost than training from scratch.
Flash Attention
A memory-efficient attention algorithm that reduces GPU memory usage and speeds up transformer training and inference by avoiding materialization of the full attention matrix.
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks, serving as the base for specialized applications.
Frontier Model
The most capable and computationally expensive AI models at any given time, typically requiring billions of dollars in training compute and representing the cutting edge of AI capabilities.

G

Gemini
Google's family of multimodal AI models and consumer assistant products, competing directly with ChatGPT and Claude in the AI assistant market.
Goodhart's Law
The principle that when a measure becomes a target, it ceases to be a good measure, widely observed in AI benchmark gaming.
Google DeepMind
Google's primary AI research laboratory, formed from the 2023 merger of DeepMind and Google Brain, responsible for developing Gemini and other frontier AI systems.
GPT-2
OpenAI's 2019 language model that demonstrated emergent text generation capabilities, initially withheld from release over misuse concerns and now used as a standard training benchmark.
GPT-5
OpenAI's flagship frontier model released in 2025, capable of complex reasoning, agentic tasks, and multimodal understanding.
Gradient Descent
An optimization algorithm that iteratively adjusts model parameters by moving in the direction that most reduces the error function.
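A one-dimensional sketch of the update rule (real training computes gradients over millions of parameters via backpropagation):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient to descend toward a minimum.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3): converges to x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))
```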
Grouped-Query Attention
An attention mechanism variant that shares key-value heads across multiple query heads, reducing KV cache memory requirements while maintaining model quality.
GRPO
Group Relative Policy Optimization, DeepSeek's RL training method that eliminates the critic model by computing advantages relative to group averages of sampled responses.

H

Hallucination
When an AI model generates confident, plausible-sounding information that is factually incorrect, fabricated, or not grounded in its training data or provided context.
Homogenization
The concentration of AI applications on a few foundation models, creating powerful leverage but also correlated failure modes across the ecosystem.
Hybrid Search
A retrieval technique that combines keyword matching (BM25) with semantic embeddings, using Reciprocal Rank Fusion to merge results and catch matches that pure vector search misses.
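Reciprocal Rank Fusion itself is simple enough to sketch; k=60 is the conventional constant from the original RRF paper, and the document names here are hypothetical:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document earns 1/(k + rank) per ranking; scores sum across rankings,
    # so documents that rank well in either list rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_a", "doc_b", "doc_c"]       # keyword ranking
vector_results = ["doc_c", "doc_a", "doc_d"]     # semantic ranking
fused = reciprocal_rank_fusion([bm25_results, vector_results])
print(fused)
```

doc_a tops the fused list because it ranks highly in both lists, while doc_d (found only by vector search) still survives into the merged results.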

I

I-JEPA
Image Joint Embedding Predictive Architecture, a self-supervised learning method that predicts abstract representations of masked image regions rather than reconstructing raw pixels.
Inference-Time Compute
A scaling approach that allocates additional compute during model inference rather than training, allowing models to "think longer" on difficult problems.
Interpretability
The degree to which humans can understand and trace an AI system's reasoning process and the factors that led to its outputs.
Intruder Dimensions
Novel high-ranking singular vectors that emerge during LoRA training but are absent in full fine-tuning, potentially interfering with pre-trained model capabilities.

K

Knowledge Distillation
A training technique where a smaller student model learns to replicate the outputs of a larger teacher model, transferring capabilities at reduced computational cost.
KV Cache
A memory structure that stores computed key and value tensors from previous tokens during LLM inference, eliminating redundant computation in autoregressive generation.

L

Lab-in-the-Loop
An AI architecture pattern where robotic lab systems run physical experiments and feed results directly back into the AI's decision-making process.
LLM Inference
The process of generating outputs from a trained large language model, consisting of prefill (processing the input) and decode (generating tokens one at a time) phases.
LLM-as-Judge
An evaluation technique where one language model scores the outputs of another, achieving human-level agreement on many tasks when properly calibrated.
Long-Term Benefit Trust
Anthropic's governance structure that selects board members and advises leadership, designed to ensure the company prioritizes its public benefit mission over purely commercial interests.
LoRA
Low-Rank Adaptation, a parameter-efficient fine-tuning method that updates only a small fraction of model weights by learning low-rank decomposition matrices.
Loss Function
A mathematical function that quantifies how wrong a model's predictions are, providing the signal that drives learning.

M

Mean Squared Error
A loss function that measures prediction error by squaring the difference between predicted and actual values, heavily penalizing large errors.
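A minimal sketch showing how the squaring makes one large error outweigh several small ones:

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences; large errors dominate the total.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# An error of 3 contributes 9 to the sum; an error of 1 contributes only 1.
mse = mean_squared_error([0, 0], [1, 3])
print(mse)
```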
Mechanistic Interpretability
A research approach that reverse-engineers neural networks by identifying interpretable features and tracing how they combine to produce outputs.
Microsoft Copilot
Microsoft's family of AI assistants integrated across its product ecosystem, including GitHub Copilot for coding and Microsoft 365 Copilot for productivity applications like Teams, Outlook, and Word.
Mixture of Experts
A neural network architecture that routes inputs to specialized subnetworks (experts), activating only a fraction of total parameters per inference to reduce compute costs.
Model Card
A standardized documentation format for ML models covering intended use, performance metrics by demographic group, and limitations.
Model Context Protocol
An open standard for connecting AI models to external tools, data sources, and application contexts, enabling agents to interact with IDEs, databases, and other systems through a unified interface.
Moonshot AI
A Chinese AI company building large language models, known for the Kimi family and its Agent Swarm parallel coordination architecture.
Multi-Agent System
An architecture where multiple specialized AI agents coordinate to accomplish complex tasks, with each agent handling distinct responsibilities.
Multi-Head Attention
A technique that runs multiple self-attention operations in parallel, each learning different types of relationships between tokens.
Multi-Query Attention
An attention mechanism variant where all query heads share a single key-value pair, providing maximum KV cache reduction at the cost of some model quality.

N

NeRF
Neural Radiance Fields, a technique for synthesizing novel 3D views from 2D images by training neural networks to represent scenes as continuous volumetric functions.
Neural Network
A computing system inspired by biological neurons that learns patterns from data by adjusting connection weights through iterative training.
NISQ
Noisy Intermediate-Scale Quantum computing, the current era of quantum hardware characterized by limited qubits and high error rates.

O

Open-Weight Model
An AI model whose trained parameters are publicly released, allowing anyone to download, run, and modify the model without API access restrictions.
OpenAI
An AI research company founded in 2015 that develops the GPT series of large language models and the ChatGPT consumer product, pioneering many foundational techniques in modern AI.
OpenAI Frontier
OpenAI's enterprise platform for building, deploying, and governing AI agents at scale, providing shared business context, permissions management, and integration with enterprise systems.

P

PagedAttention
A memory management technique for LLM serving that handles KV cache allocation like an operating system manages virtual memory, eliminating fragmentation and improving GPU utilization.
PARL
Parallel-Agent Reinforcement Learning, Moonshot's training method that teaches orchestrator models to decompose problems and dispatch sub-agents in parallel.
PEFT
Parameter-Efficient Fine-Tuning, a family of techniques that adapt large models by updating only a small subset of parameters.
Physical AI
AI systems that perceive the physical world and act on it in real time, encompassing robots, autonomous vehicles, and other embodied agents.
Physical Intelligence
A robotics AI company building vision-language-action foundation models, valued at $5.6 billion after raising $600 million in Series B funding from investors including Jeff Bezos, OpenAI, and Sequoia.
Polysemanticity
The phenomenon where individual neurons in neural networks activate for multiple unrelated concepts, making single neurons uninterpretable.
Positional Encoding
A technique that injects sequence position information into transformer inputs, compensating for the architecture's lack of inherent ordering.
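The original sinusoidal scheme from the Transformer paper can be sketched per position (a simplified scalar version; real implementations vectorize over the whole sequence, and many modern models use learned or rotary encodings instead):

```python
import math

def positional_encoding(pos, d_model):
    # Even indices get sine, odd indices get cosine, at geometrically
    # spaced frequencies, so each position maps to a unique vector.
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

print(positional_encoding(0, 4))  # position 0: sines are 0, cosines are 1
```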
Pre-training
The initial compute-intensive training phase where a model learns general patterns from massive unlabeled datasets using self-supervision.
Predictive Parity
A fairness metric requiring that positive predictions have equal accuracy across demographic groups.
Prefill
The first phase of LLM inference where all input tokens are processed simultaneously through matrix-matrix multiplication, saturating GPU compute capacity.
Prompt Injection
An attack technique where malicious instructions are inserted into content processed by an LLM, exploiting the model's inability to distinguish trusted commands from untrusted data.
Pydantic AI
A Python framework for building AI agents, developed by the team behind the Pydantic validation library, featuring Code Mode for LLM-generated script execution.

Q

QLoRA
A memory-efficient fine-tuning method that combines LoRA adapters with 4-bit quantization of base model weights, trading longer training time for roughly 33% memory savings.
Quantization
A compression technique that reduces model precision from 32-bit floats to 8-bit or 4-bit integers, shrinking memory requirements and accelerating inference.
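Symmetric per-tensor int8 quantization in miniature (absmax scaling; production quantizers typically use per-channel scales and calibrated clipping):

```python
def quantize_int8(weights):
    # Map floats onto the int8 range [-127, 127] using a single scale factor.
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    # Recover approximate floats; the residual is the quantization error.
    return [v * scale for v in quantized]

q, scale = quantize_int8([0.1, -0.5, 2.54])
restored = dequantize(q, scale)
print(q)
```

Each restored value differs from the original by at most half the scale step, which is the precision traded away for a 4x smaller memory footprint versus 32-bit floats.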
Quantum Advantage
A milestone where quantum computers outperform classical computers on a specific task, though definitions vary on whether the task must be practically useful.
Qubit
The fundamental unit of quantum information, analogous to a classical bit but capable of existing in superposition of 0 and 1 states.

R

RAG
Retrieval-Augmented Generation: a pattern that retrieves relevant documents via embeddings and feeds them to an LLM as context for generation.
Regularization
Additional loss terms that penalize model complexity, typically by adding weight magnitude penalties to prevent overfitting.
Reranking
A two-stage retrieval pattern where fast initial search retrieves candidates, then more expensive models like cross-encoders re-score them by actual query relevance.
Reward Model
A neural network trained on human preference comparisons to score language model outputs, providing the optimization signal for RLHF.
RLHF
Reinforcement Learning from Human Feedback, a training technique that fine-tunes language models on human preference data, optimizing outputs through reinforcement learning against a reward model trained on human comparisons.

S

Sam Altman
CEO of OpenAI and one of the most prominent figures in commercial AI development, known for leading the company through ChatGPT's launch and rapid growth.
Scaling Laws
Empirical power-law relationships that predict how model performance improves as compute, data, and parameters increase, showing logarithmic capability gains for exponential resource investments.
Self-Attention
A mechanism where each token in a sequence computes attention weights over all other tokens, enabling the model to capture dependencies regardless of distance.
Self-Play Debate
A multi-agent technique where AI agents argue for and against hypotheses competitively, surfacing weaknesses through adversarial reasoning.
Small Language Model
A language model with 1-4 billion parameters designed for efficient deployment on edge devices, offering lower costs and faster inference than frontier LLMs.
Sparse Autoencoder
A neural network trained to decompose model activations into millions of interpretable features, each corresponding to a human-readable concept.
Special Purpose Vehicle
A separate legal entity created by investors to make a specific investment outside their main fund structure, often used for large or unusual deals.
State Space Model
A class of sequence models that process tokens in O(n) time by maintaining a fixed-size hidden state, offering an efficient alternative to quadratic attention.
Structured Output
A capability that constrains model responses to valid JSON matching a specified schema, guaranteeing format compliance but not factual accuracy.
Supervised Fine-tuning
A training stage that transforms a pre-trained language model into a conversational assistant by training on curated examples of ideal responses.
SynthID
Google DeepMind's watermarking technology that embeds invisible statistical signatures into AI-generated content during the generation process.

T

Time to First Token
A latency metric measuring how long a user waits from sending a prompt until the first response token appears, driven by the compute-bound prefill phase.
Tokenization
The process of converting text into integer sequences that language models actually compute over, determining API costs and encoding efficiency across languages.
Tool Calling
An agentic pattern where LLMs invoke external functions or APIs sequentially, receiving results back through the context window to inform subsequent reasoning steps.
Transformer
A neural network architecture based on self-attention that processes all tokens in parallel, replacing sequential RNNs and enabling the current generation of large language models.

V

Vision-Language-Action Model
A neural network architecture that unifies visual perception, language understanding, and motor control into a single model, enabling robots to process camera feeds and voice commands to generate physical movements.

W

Wafer Scale Engine
Cerebras' massive AI processor that uses nearly an entire silicon wafer as a single chip, measuring 8.5 inches per side with trillions of transistors.
Windsurf
An AI code editor with automatic context indexing that writes code to disk before approval, enabling real-time preview of AI-generated changes.
World Labs
Fei-Fei Li's AI company building spatial intelligence systems, first to ship a commercial world model product with their Marble platform.
World Model
An AI system that maintains internal representations of physical reality, predicting how world states evolve through space and time rather than generating text tokens.

X

xAI
Elon Musk's AI company founded in 2023, developing the Grok family of large language models and now merged with SpaceX.

Z

Zhipu AI
A Chinese AI company developing the GLM family of large language models, notable for training frontier models entirely on domestic Huawei Ascend chips.