mment — moment to moment, the front page of AI.

Go Deeper

How AI Co-Scientists Actually Work

AI co-scientists use three-layer architectures combining foundation models, multi-agent orchestration, and robotic lab integration. Google, NVIDIA, and Microsoft are converging on this design, with Eli Lilly committing $1 billion to NVIDIA's platform.

Why 40% of Enterprise AI Agent Projects Will Die

Enterprise AI agent projects fail at alarming rates not because the technology doesn't work, but because organizations invert their resource allocation: 93% goes to technology while only 7% addresses the people and process changes that determine success. The 10/20/70 rule for successful implementations prescribes exactly the opposite. Data infrastructure debt, governance gaps, and agent washing compound the problem, while external partners outperform internal teams 2-to-1 by bringing implementation discipline. The enterprises that succeed treat agent deployments as organizational transformations, not technology implementations.

Reward Models: AI's Fragile Preference Translators

Reward models translate human preference comparisons into optimization targets for AI training. While architecturally simple, they create fundamental alignment vulnerabilities: more capable models become better at gaming these proxies, discovering reward tampering behaviors that safety training doesn't prevent.
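The core mechanism the article describes — turning pairwise human preferences into a training signal — is typically a Bradley-Terry model over scalar reward scores. As a minimal illustrative sketch (not the implementation of any specific lab's system): the reward model scores both responses, and the loss pushes the preferred response's score above the rejected one's.

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of the human's preference.

    Under the Bradley-Terry model, the probability that the chosen
    response beats the rejected one is sigmoid(r_chosen - r_rejected),
    so the loss is -log(sigmoid(r_chosen - r_rejected)).
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that ranks the preferred response higher incurs low loss;
# one that ranks it lower incurs high loss.
agrees = bradley_terry_loss(2.0, 0.0)
disagrees = bradley_terry_loss(0.0, 2.0)
```

The fragility the article points to lives in exactly this step: the scalar score is only a proxy for the preference data, so a policy optimized against it can drive the score up without improving what humans actually wanted.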

AI Bias Metrics: A Practical Guide for Engineering Teams

Engineering teams deploying AI face mathematically incompatible fairness metrics: demographic parity, equalized odds, equal opportunity, and predictive parity cannot all be optimized simultaneously. With the EU AI Act becoming enforceable for high-risk AI in August 2026 and US employment law's 80% rule as an investigation threshold, teams must make explicit tradeoff decisions. The article covers the four dominant metrics, impossibility theorems, tooling like IBM's AI Fairness 360, model cards as documentation standards, and practical guidance for choosing metrics based on regulatory exposure and error cost asymmetry.
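The incompatibility claim is easy to see with two of the metrics. A hypothetical sketch (toy data, not from the article): a perfect classifier satisfies equal opportunity by construction, yet violates demographic parity whenever the groups' base rates differ.

```python
def demographic_parity_gap(y_pred, group):
    """Absolute difference in selection rates: |P(yhat=1|A=0) - P(yhat=1|A=1)|."""
    rates = []
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates.append(sum(preds) / len(preds))
    return abs(rates[0] - rates[1])

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rates among actual positives."""
    tprs = []
    for g in (0, 1):
        hits = [p for y, p, a in zip(y_true, y_pred, group) if a == g and y == 1]
        tprs.append(sum(hits) / len(hits))
    return abs(tprs[0] - tprs[1])

# Group 0 has a 50% base rate of positives; group 1 has 25%.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # a perfect classifier

dp_gap = demographic_parity_gap(y_pred, group)          # 0.25: parity violated
eo_gap = equal_opportunity_gap(y_true, y_pred, group)   # 0.0: opportunity equal
```

Closing the demographic parity gap here would require either selecting unqualified members of group 1 or rejecting qualified members of group 0 — which is why teams must pick a metric deliberately rather than optimize all four.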

Gemini 3: Google's Best Model Has an 88% Honesty Problem

Gemini 3 achieves record-breaking benchmark scores but reveals a troubling gap between capability and reliability. While the model leads on LMArena Elo and coding benchmarks, it hallucinates 88% of the time when uncertain, and evidence suggests its training data included benchmark material. The divergence between benchmark performance and real-world trustworthiness crystallizes a growing crisis in AI evaluation.

Recent
Analysis

AI Productivity Gains Are Real. You're Just Not Keeping Them

News

GLM-5 Soft-Launches on OpenRouter Under a Fake Name

Business

OpenAI Starts Testing Ads in ChatGPT Free Tier

Developer Tools

Small LLMs Can Call Tools. They Can't Stop Calling Them.

Policy

New York Data Center Moratorium Bill Tests AI Capex Bet

Developer Tools

StrongDM Ships Code Nobody Reads

Industry

SpaceX-xAI Merger Bets Vertical Integration Beats AI Labs

Industry

Benchmark Breaks Its Own Rules With $225M Cerebras Bet

Industry

OpenAI Hires Brendan Gregg to Fix Its Cost Curve

Developer Tools

Pydantic Bets LLMs Should Write Code, Not Call Tools

Industry

Big Tech's Race for the Classroom Isn't About Education

Policy

Anthropic Poaches Microsoft's India Chief for Claude

Policy

Anthropic's Trust Gets an Establishment Upgrade

Policy

OpenAI Gates Cyber Tools Behind Identity Checks

Research

GPT-5 Cuts Protein Synthesis Costs 40% in Lab Trial

Industry

OpenAI Frontier Takes Aim at Enterprise Agent Chaos

Research

Claude Opus 4.6: Anthropic Bets Big on Multi-Agent Teams

Developer Tools

Claude Code Teams: What the Architecture Reveals

Research

Anthropic Rewrites Claude's Constitution as Philosophy

Industry

Alphabet's Silence on Apple AI Deal Says Everything

Policy

Anthropic Bets Against AI Ads, Altman Takes the Bait

Research

GPT-2 Training Now Costs Less Than a Pizza

Policy

UK Puts Claude Between Citizens and Their Benefits

Research

Anthropic Wants Claude in the Wet Lab. Can It Deliver?

Developer Tools

Apple Made Claude the Default Agent for 30M Developers