Glossary
Key terms and definitions
A
- Agent Card
- A JSON manifest that describes an A2A-compatible agent's capabilities, accepted inputs, and endpoint location, enabling discovery without a central registry.
- Agent Swarm
- Moonshot AI's parallel multi-agent architecture that coordinates up to 100 sub-agents executing thousands of tool calls simultaneously.
- Agent2Agent Protocol
- Google's open standard for AI agent interoperability, enabling autonomous systems to discover each other, negotiate tasks, and collaborate while keeping proprietary logic opaque.
- Agentic AI Foundation
- A Linux Foundation directed fund governing open standards for AI agents, co-founded by Anthropic, OpenAI, and Block to ensure vendor-neutral infrastructure development.
- AI Agent
- An autonomous AI system that can perceive its environment, make decisions, and take actions to accomplish goals with minimal human intervention.
- AI Benchmark
- A standardized test used to measure and compare AI model performance on specific tasks, increasingly criticized for failing to predict real-world capabilities.
- AI Co-Scientist
- A multi-agent AI system that generates scientific hypotheses, designs experiments, and integrates with robotic labs to run physical validation autonomously.
- AI Coding Agent
- An autonomous AI system that can reason across entire codebases, plan multi-step programming tasks, modify files, and iterate on solutions with minimal human intervention.
- AI Coding Assistant
- Software tools that use large language models to help developers write, review, and debug code through suggestions and completions.
- AI Content Detection
- Tools and techniques that attempt to distinguish AI-generated text or media from human-created content, using statistical analysis, watermarking, or provenance tracking.
- Alignment Tax
- The measurable performance degradation on capabilities like reasoning, translation, and factual accuracy that occurs when models undergo RLHF alignment training.
- AMI Labs
- Yann LeCun's world model startup using I-JEPA architecture, seeking €500 million at a €3 billion pre-product valuation to build AI systems grounded in physical reality.
- Anthropic
- An AI safety company founded in 2021 that develops the Claude family of large language models, emphasizing research into AI alignment and responsible deployment.
- Attribution Graph
- A technique that traces the computational path from input tokens through intermediate features to model outputs, revealing the reasoning steps inside neural networks.
B
- Backpropagation
- The algorithm that trains neural networks by calculating how each weight contributes to prediction error and adjusting weights to reduce that error.
- Byte Pair Encoding
- A compression algorithm that iteratively merges frequent character pairs to build a vocabulary, used by most major LLMs to convert text into tokens.
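The merge loop behind BPE can be sketched in a few lines. This toy trainer (the function name and corpus are illustrative, not any library's API) repeatedly fuses the most frequent adjacent symbol pair across a list of words:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Each word starts as a tuple of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        corpus = merged
    return merges
```

Production tokenizers add byte-level fallbacks and pre-tokenization rules, but the core loop is this pair-counting and merging.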
C
- C2PA
- The Coalition for Content Provenance and Authenticity, an industry standard that creates cryptographic chains of custody for digital media from creation through distribution.
- Cerebras Systems
- An AI chip company that builds wafer-scale processors, using nearly entire silicon wafers to create single massive chips designed to accelerate AI training and inference.
- Chain-of-Thought Prompting
- A prompting technique that asks the model to show reasoning steps before the final answer, improving performance on arithmetic, logic, and commonsense tasks in sufficiently large models.
- ChatGPT
- OpenAI's consumer AI assistant product, available in free and paid tiers, that brought large language models to mainstream adoption.
- Chinchilla Scaling
- DeepMind's 2022 finding that compute-optimal LLM training should scale data and parameters roughly equally, overturning earlier assumptions that favored growing model size far faster than training data.
- Chunking
- The process of splitting documents into smaller segments for embedding and retrieval, where chunk size and overlap determine what information retrieval can find.
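A minimal sliding-window chunker shows how size and overlap interact. Character-based windows and the default parameters here are illustrative assumptions; real pipelines often split on sentences or tokens instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows for embedding."""
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end of the text
    return chunks
```

The overlap means a fact falling on a chunk boundary still appears whole in at least one window.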
- Claude
- Anthropic's family of large language models, designed with a focus on safety, helpfulness, and honesty, available through API and consumer products.
- Claude Agent SDK
- Anthropic's framework for building autonomous AI agents that can reason across complex tasks, spawn subagents, and execute multi-step workflows without continuous human intervention.
- Claude Artifacts
- A Claude feature that enables users to create and share interactive web applications, documents, visualizations, and other standalone outputs directly within conversations.
- Claude Code
- Anthropic's command-line coding agent that operates from the terminal, capable of reading codebases, executing commands, and making multi-file changes through an agentic workflow.
- Claude Cowork
- Anthropic's agentic AI product for knowledge workers, offering Claude Code's file system access and autonomous task execution through a non-technical interface with enterprise plugins.
- Claude Opus
- Anthropic's most capable model tier in the Claude family, designed for complex reasoning, extended agentic tasks, and supervising multi-agent workflows.
- Claude's Constitution
- Anthropic's public document outlining the principles and values that guide how Claude is trained to behave, emphasizing helpfulness, honesty, and harmlessness.
- Codex CLI
- OpenAI's lightweight terminal-based coding agent built in Rust, supporting both ChatGPT authentication and API keys with Model Context Protocol extensibility.
- Constitutional AI
- An AI training methodology developed by Anthropic that uses a set of principles to guide model self-improvement, generating synthetic preference data and critiques based on constitutional values.
- Context Window
- The maximum number of tokens a language model can process in a single forward pass, determining how much text the model can "see" at once.
- Continuous Batching
- A serving technique that immediately ejects finished sequences from a batch and slots in new requests, rather than waiting for all requests in a batch to complete.
- Cosine Similarity
- A metric that measures the angle between two vectors, treating direction as meaning and ignoring magnitude.
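The formula is cos(theta) = (u . v) / (|u| |v|); a plain-Python sketch makes the magnitude-invariance concrete:

```python
import math

def cosine_similarity(u, v):
    """1.0 = same direction, 0.0 = orthogonal; vector length is ignored."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Scaling a vector by any positive constant leaves its similarity to every other vector unchanged, which is why embedding pipelines can normalize freely.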
- Cross-Entropy Loss
- A classification loss function that measures how surprised the model is by the true label, heavily penalizing confident wrong predictions.
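For a single example the loss is just the negative log probability assigned to the true class, which is what makes confident wrong answers so expensive:

```python
import math

def cross_entropy(probs, true_index):
    """-log p(true class): near 0 when confident and right, explodes when confident and wrong."""
    return -math.log(probs[true_index])
```

Compare a confident correct prediction (`cross_entropy([0.9, 0.05, 0.05], 0)` is about 0.105) with a confident wrong one, where the same formula yields a loss roughly thirty times larger.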
- Cursor
- An AI-powered code editor built as a VS Code fork, featuring deep AI integration for code suggestions, refactors, and multi-file editing with manual context selection.
D
- Data Contamination
- When a model's training data overlaps with test sets, causing inflated benchmark scores that reflect memorization rather than genuine capability.
- Decode
- The second phase of LLM inference where output tokens are generated one at a time, each requiring a full forward pass that is bottlenecked by memory bandwidth rather than compute.
- DeepSeek R1
- DeepSeek's open-weight reasoning model that matches OpenAI o1 benchmarks, trained using GRPO on a 671B Mixture of Experts architecture for approximately $6 million.
- Defense-in-Depth
- A security strategy that layers multiple overlapping controls, so that if one defense fails, others still limit the impact of an attack.
- Demographic Parity
- A fairness metric requiring that selection rates be equal across demographic groups, regardless of qualification differences.
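Checking the metric amounts to comparing per-group selection rates; a minimal sketch with illustrative function and variable names:

```python
from collections import defaultdict

def selection_rates(decisions, groups):
    """Per-group selection rate; demographic parity holds when rates are equal."""
    selected, total = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        total[group] += 1
        selected[group] += int(decision)
    return {g: selected[g] / total[g] for g in total}
```

Note the definition deliberately ignores qualifications: a perfectly accurate model can still fail demographic parity if base rates differ between groups.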
- Digital Twin
- A virtual replica of a physical system, service, or process that mirrors its behavior for testing, simulation, or analysis without affecting the original.
- Direct Preference Optimization
- An alternative to RLHF that trains language models on preference data using a single supervised loss, eliminating the need for a separate reward model.
- Dual-Use AI
- AI capabilities that can serve both beneficial and harmful purposes, requiring careful access controls and governance decisions.
E
- Embedding
- A numerical vector representation of data (text, images, etc.) in high-dimensional space, where semantic similarity corresponds to geometric proximity.
- Embedding Drift
- The gradual divergence between document and query embeddings over time, causing retrieval quality to silently degrade.
- Emergence
- Capabilities that appear at scale but were never explicitly trained for, such as arithmetic or multi-step reasoning emerging from text prediction.
- Equal Opportunity
- A relaxed fairness metric requiring only equal true positive rates across demographic groups.
- Equalized Odds
- A fairness metric requiring equal false positive and false negative rates across demographic groups.
F
- Fault-Tolerant Quantum Computing
- Quantum computing with error correction sufficient to run arbitrary-length algorithms without accumulated errors destroying the computation.
- Fertility
- A metric measuring how many tokens a tokenizer produces per word or semantic unit, with higher fertility indicating less efficient encoding and higher costs.
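Given any tokenizer function, fertility is a simple ratio of tokens emitted to words processed; the character-level "tokenizer" in the comment is purely illustrative:

```python
def fertility(tokenize, texts):
    """Average tokens per whitespace-delimited word; higher means costlier encoding."""
    total_tokens = sum(len(tokenize(t)) for t in texts)
    total_words = sum(len(t.split()) for t in texts)
    return total_tokens / total_words

# A character-level "tokenizer" makes fertility equal the average word length:
# fertility(lambda s: list(s.replace(" ", "")), ["hi there"])  -> 3.5
```

Languages underrepresented in tokenizer training data often show fertility two to three times higher than English, which translates directly into higher API costs.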
- Few-Shot Prompting
- A technique that includes example input-output pairs in the prompt to demonstrate the desired format and style, where format consistency matters more than label accuracy.
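The mechanics reduce to string assembly: examples first, then the real query in the same format. This builder and its `Input:`/`Output:` labels are an illustrative convention, not a required format:

```python
def few_shot_prompt(examples, query):
    """Build a prompt whose examples demonstrate the desired input/output format."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")  # model completes this last block
    return "\n\n".join(blocks)
```

Because the model imitates the pattern, keeping the example format rigidly consistent typically matters more than whether every example label is correct.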
- Fine-tuning
- A training technique that adapts a pre-trained model to specific tasks using smaller, task-specific datasets at dramatically lower cost than training from scratch.
- Flash Attention
- A memory-efficient attention algorithm that reduces GPU memory usage and speeds up transformer training and inference by avoiding materialization of the full attention matrix.
- Foundation Model
- A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks, serving as the base for specialized applications.
- Frontier Model
- The most capable and computationally expensive AI models at any given time, typically requiring billions of dollars in training compute and representing the cutting edge of AI capabilities.
G
- Gemini
- Google's family of multimodal AI models and consumer assistant products, competing directly with ChatGPT and Claude in the AI assistant market.
- Goodhart's Law
- The principle that when a measure becomes a target, it ceases to be a good measure, widely observed in AI benchmark gaming.
- Google DeepMind
- Google's primary AI research laboratory, formed from the 2023 merger of DeepMind and Google Brain, responsible for developing Gemini and other frontier AI systems.
- GPT-2
- OpenAI's 2019 language model that demonstrated emergent text generation capabilities, initially withheld from release over misuse concerns and now used as a standard training benchmark.
- GPT-5
- OpenAI's flagship frontier model released in 2025, capable of complex reasoning, agentic tasks, and multimodal understanding.
- Gradient Descent
- An optimization algorithm that iteratively adjusts model parameters by moving in the direction that most reduces the error function.
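The update rule is x := x - lr * grad(x); a minimal sketch on a one-dimensional function (the learning rate and step count are illustrative defaults):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient to reduce the error function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3):
# the iterates converge toward the minimum at x = 3.
```

Training a neural network applies the same update simultaneously to millions or billions of parameters, with backpropagation supplying each parameter's gradient.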
- Grouped-Query Attention
- An attention mechanism variant that shares key-value heads across multiple query heads, reducing KV cache memory requirements while maintaining model quality.
- GRPO
- Group Relative Policy Optimization, DeepSeek's RL training method that eliminates the critic model by computing advantages relative to group averages of sampled responses.
H
- Hallucination
- When an AI model generates confident, plausible-sounding information that is factually incorrect, fabricated, or not grounded in its training data or provided context.
- Homogenization
- The concentration of AI applications on a few foundation models, creating powerful leverage but also correlated failure modes across the ecosystem.
- Hybrid Search
- A retrieval technique that combines keyword matching (BM25) with semantic embeddings, using Reciprocal Rank Fusion to merge results and catch matches that pure vector search misses.
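The Reciprocal Rank Fusion step mentioned above is simple enough to sketch directly; each document's score is the sum of 1/(k + rank) over every ranked list it appears in (k = 60 is the commonly used constant):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists; documents near the top of any list score highest."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both BM25 and vector search outranks one that only a single retriever found, without requiring the two score scales to be comparable.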
I
- I-JEPA
- Joint Embedding Predictive Architecture, a self-supervised learning method that predicts abstract representations of masked image regions rather than reconstructing raw pixels.
- Inference-Time Compute
- A scaling approach that allocates additional compute during model inference rather than training, allowing models to "think longer" on difficult problems.
- Interpretability
- The degree to which humans can understand and trace an AI system's reasoning process and the factors that led to its outputs.
- Intruder Dimensions
- Novel high-ranking singular vectors that emerge during LoRA training but are absent in full fine-tuning, potentially interfering with pre-trained model capabilities.
K
- Knowledge Distillation
- A training technique where a smaller student model learns to replicate the outputs of a larger teacher model, transferring capabilities at reduced computational cost.
- KV Cache
- A memory structure that stores computed key and value tensors from previous tokens during LLM inference, eliminating redundant computation in autoregressive generation.
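Conceptually the cache is an append-only store of per-token key/value tensors; this class is a minimal sketch (real servers preallocate contiguous or paged GPU buffers rather than Python lists):

```python
import numpy as np

class KVCache:
    """Append-only store of per-token key/value tensors reused at every decode step."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        # Each new token contributes exactly one key and one value vector.
        self.keys.append(k)
        self.values.append(v)

    def stacked(self):
        # Attention for the next token reads the full cached history.
        return np.stack(self.keys), np.stack(self.values)
```

Without the cache, generating token n would recompute keys and values for all n-1 previous tokens; with it, each decode step computes only the newest token's pair.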
L
- Lab-in-the-Loop
- An AI architecture pattern where robotic lab systems run physical experiments and feed results directly back into the AI's decision-making process.
- LLM Inference
- The process of generating outputs from a trained large language model, consisting of prefill (processing the input) and decode (generating tokens one at a time) phases.
- LLM-as-Judge
- An evaluation technique where one language model scores the outputs of another, achieving human-level agreement on many tasks when properly calibrated.
- Long-Term Benefit Trust
- Anthropic's governance structure that selects board members and advises leadership, designed to ensure the company prioritizes its public benefit mission over purely commercial interests.
- LoRA
- Low-Rank Adaptation, a parameter-efficient fine-tuning method that updates only a small fraction of model weights by learning low-rank decomposition matrices.
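The core idea fits in a few lines: freeze the pre-trained weight W and learn a delta factored as A @ B with inner rank r much smaller than the hidden size. Dimensions and initialization scale here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # hidden size and low rank, r << d

W = rng.normal(size=(d, d))          # frozen pre-trained weight
A = rng.normal(size=(d, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W + A @ B, but the full-rank delta is never materialized.
    return x @ W + (x @ A) @ B

# Trainable parameters: 2*d*r = 8,192 vs d*d = 262,144 for full fine-tuning (~3%).
```

Zero-initializing B means the adapter starts as an exact no-op, so training begins from the unmodified pre-trained behavior.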
- Loss Function
- A mathematical function that quantifies how wrong a model's predictions are, providing the signal that drives learning.
M
- Mean Squared Error
- A loss function that measures prediction error by squaring the difference between predicted and actual values, heavily penalizing large errors.
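The squaring is what produces the heavy penalty: doubling an error quadruples its contribution. A one-line sketch:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals; a 2x larger error costs 4x more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```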
- Mechanistic Interpretability
- A research approach that reverse-engineers neural networks by identifying interpretable features and tracing how they combine to produce outputs.
- Microsoft Copilot
- Microsoft's family of AI assistants integrated across its product ecosystem, including GitHub Copilot for coding and Microsoft 365 Copilot for productivity applications like Teams, Outlook, and Word.
- Mixture of Experts
- A neural network architecture that routes inputs to specialized subnetworks (experts), activating only a fraction of total parameters per inference to reduce compute costs.
- Model Card
- A standardized documentation format for ML models covering intended use, performance metrics by demographic group, and limitations.
- Model Context Protocol
- An open standard for connecting AI models to external tools, data sources, and application contexts, enabling agents to interact with IDEs, databases, and other systems through a unified interface.
- Moonshot AI
- A Chinese AI company building large language models, known for the Kimi family and its Agent Swarm parallel coordination architecture.
- Multi-Agent System
- An architecture where multiple specialized AI agents coordinate to accomplish complex tasks, with each agent handling distinct responsibilities.
- Multi-Head Attention
- A technique that runs multiple self-attention operations in parallel, each learning different types of relationships between tokens.
- Multi-Query Attention
- An attention mechanism variant where all query heads share a single key-value pair, providing maximum KV cache reduction at the cost of some model quality.
N
- NeRF
- Neural Radiance Fields, a technique for synthesizing novel 3D views from 2D images by training neural networks to represent scenes as continuous volumetric functions.
- Neural Network
- A computing system inspired by biological neurons that learns patterns from data by adjusting connection weights through iterative training.
- NISQ
- Noisy Intermediate-Scale Quantum computing, the current era of quantum hardware characterized by limited qubits and high error rates.
O
- Open-Weight Model
- An AI model whose trained parameters are publicly released, allowing anyone to download, run, and modify the model without API access restrictions.
- OpenAI
- An AI research company founded in 2015 that develops the GPT series of large language models and the ChatGPT consumer product, pioneering many foundational techniques in modern AI.
- OpenAI Frontier
- OpenAI's enterprise platform for building, deploying, and governing AI agents at scale, providing shared business context, permissions management, and integration with enterprise systems.
P
- PagedAttention
- A memory management technique for LLM serving that handles KV cache allocation like an operating system manages virtual memory, eliminating fragmentation and improving GPU utilization.
- PARL
- Parallel-Agent Reinforcement Learning, Moonshot's training method that teaches orchestrator models to decompose problems and dispatch sub-agents in parallel.
- PEFT
- Parameter-Efficient Fine-Tuning, a family of techniques that adapt large models by updating only a small subset of parameters.
- Physical AI
- AI systems that perceive the physical world and act on it in real time, encompassing robots, autonomous vehicles, and other embodied agents.
- Physical Intelligence
- A robotics AI company building vision-language-action foundation models, valued at $5.6 billion after raising $600 million in Series B funding from investors including Jeff Bezos, OpenAI, and Sequoia.
- Polysemanticity
- The phenomenon where individual neurons in neural networks activate for multiple unrelated concepts, making single neurons uninterpretable.
- Positional Encoding
- A technique that injects sequence position information into transformer inputs, compensating for the architecture's lack of inherent ordering.
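The original transformer's sinusoidal variant can be sketched directly: even dimensions get sin(pos / 10000^(i/d)), odd dimensions the matching cosine, giving every position a unique fingerprint:

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Many current models instead learn position embeddings or use rotary encodings (RoPE), but all serve the same purpose: without some injection of order, self-attention treats its input as a bag of tokens.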
- Pre-training
- The initial compute-intensive training phase where a model learns general patterns from massive unlabeled datasets using self-supervision.
- Predictive Parity
- A fairness metric requiring that positive predictions have equal accuracy across demographic groups.
- Prefill
- The first phase of LLM inference where all input tokens are processed simultaneously through matrix-matrix multiplication, saturating GPU compute capacity.
- Prompt Injection
- An attack technique where malicious instructions are inserted into content processed by an LLM, exploiting the model's inability to distinguish trusted commands from untrusted data.
- Pydantic AI
- A Python framework for building AI agents, developed by the team behind the Pydantic validation library, featuring Code Mode for LLM-generated script execution.
Q
- QLoRA
- A memory-efficient fine-tuning method that combines LoRA adapters with 4-bit quantization of base model weights, trading longer training time for roughly 33% memory savings.
- Quantization
- A compression technique that reduces model precision from 32-bit floats to 8-bit or 4-bit integers, shrinking memory requirements and accelerating inference.
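A minimal symmetric int8 scheme illustrates the trade: map each float to an integer in [-127, 127] via a single per-tensor scale, accepting up to half a scale-step of rounding error per weight. Real quantizers refine this with per-channel scales and outlier handling:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization with one per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is bounded by scale / 2 per weight."""
    return [x * scale for x in q]
```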
- Quantum Advantage
- A milestone where quantum computers outperform classical computers on a specific task, though definitions vary on whether the task must be practically useful.
- Qubit
- The fundamental unit of quantum information, analogous to a classical bit but capable of existing in superposition of 0 and 1 states.
R
- RAG
- Retrieval-Augmented Generation, a pattern that retrieves relevant documents via embeddings and feeds them to an LLM as context for generation.
- Regularization
- Additional loss terms that penalize model complexity, typically by adding weight magnitude penalties to prevent overfitting.
- Reranking
- A two-stage retrieval pattern where fast initial search retrieves candidates, then more expensive models like cross-encoders re-score them by actual query relevance.
- Reward Model
- A neural network trained on human preference comparisons to score language model outputs, providing the optimization signal for RLHF.
- RLHF
- A training technique that fine-tunes language models using human preference data, optimizing outputs through reinforcement learning against a reward model trained on human comparisons.
S
- Sam Altman
- CEO of OpenAI and one of the most prominent figures in commercial AI development, known for leading the company through ChatGPT's launch and rapid growth.
- Scaling Laws
- Empirical power-law relationships that predict how model performance improves as compute, data, and parameters increase, showing logarithmic capability gains for exponential resource investments.
- Self-Attention
- A mechanism where each token in a sequence computes attention weights over all other tokens, enabling the model to capture dependencies regardless of distance.
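The computation is softmax(QK^T / sqrt(d_k)) V; a single-head numpy sketch with illustrative projection matrices (masking, multiple heads, and batching omitted):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # every token scores every other token
    # Numerically stable softmax: each row becomes a distribution over positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all value vectors
```

Because every token attends to every other in one matrix product, distance between tokens carries no cost, unlike in recurrent models.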
- Self-Play Debate
- A multi-agent technique where AI agents argue for and against hypotheses competitively, surfacing weaknesses through adversarial reasoning.
- Small Language Model
- A language model with 1-4 billion parameters designed for efficient deployment on edge devices, offering lower costs and faster inference than frontier LLMs.
- Sparse Autoencoder
- A neural network trained to decompose model activations into millions of interpretable features, each corresponding to a human-readable concept.
- Special Purpose Vehicle
- A separate legal entity created by investors to make a specific investment outside their main fund structure, often used for large or unusual deals.
- State Space Model
- A class of sequence models that process tokens in linear time O(n) by maintaining a fixed-size hidden state, offering an efficient alternative to quadratic attention.
- Structured Output
- A capability that constrains model responses to valid JSON matching a specified schema, guaranteeing format compliance but not factual accuracy.
- Supervised Fine-tuning
- A training stage that transforms a pre-trained language model into a conversational assistant by training on curated examples of ideal responses.
- SynthID
- Google DeepMind's watermarking technology that embeds invisible statistical signatures into AI-generated content during the generation process.
T
- Time to First Token
- A latency metric measuring how long a user waits from sending a prompt until the first response token appears, driven by the compute-bound prefill phase.
- Tokenization
- The process of converting text into integer sequences that language models actually compute over, determining API costs and encoding efficiency across languages.
- Tool Calling
- An agentic pattern where LLMs invoke external functions or APIs sequentially, receiving results back through the context window to inform subsequent reasoning steps.
- Transformer
- A neural network architecture based on self-attention that processes all tokens in parallel, replacing sequential RNNs and enabling the current generation of large language models.
V
- Vision-Language-Action Model
- A neural network architecture that unifies visual perception, language understanding, and motor control into a single model, enabling robots to process camera feeds and voice commands to generate physical movements.
W
- Wafer Scale Engine
- Cerebras' massive AI processor that uses nearly an entire silicon wafer as a single chip, measuring 8.5 inches per side with trillions of transistors.
- Windsurf
- An AI code editor with automatic context indexing that writes code to disk before approval, enabling real-time preview of AI-generated changes.
- World Labs
- Fei-Fei Li's AI company building spatial intelligence systems, the first to ship a commercial world model product with its Marble platform.
- World Model
- An AI system that maintains internal representations of physical reality, predicting how world states evolve through space and time rather than generating text tokens.
X
- xAI
- Elon Musk's AI company founded in 2023, developing the Grok family of large language models and now merged with SpaceX.
Z
- Zhipu AI
- A Chinese AI company developing the GLM family of large language models, notable for training frontier models entirely on domestic Huawei Ascend chips.