From language models as function spaces to autonomous agents as dynamical systems — rigorously grounded, practically oriented.
For a mathematician, the cleanest framing: a large language model is a learned conditional probability distribution over token sequences. Given a vocabulary 𝒱 and a context of length n, the model parameterises

pθ(xₙ₊₁ ∣ x₁, …, xₙ),  with each xᵢ ∈ 𝒱,

where θ ∈ ℝᵈ are billions of learned parameters (Claude Opus 4.6 ≈ several hundred billion), and the context window defines the support over which this conditional is computed — currently 1,000,000 tokens for Opus and Sonnet 4.6.
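Concretely, the final step that produces this conditional is a softmax over the vocabulary, which can be sketched in a few lines. The three-token vocabulary and the logit values below are hypothetical toys; a real model computes logits over ~100k tokens via a Transformer.

```python
import math
import random

def next_token_distribution(logits):
    """Softmax: turns raw scores into the conditional p_theta(x_{n+1} | x_1..x_n)."""
    m = max(logits.values())                       # subtract the max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample(dist, rng):
    """Draw one token from the conditional distribution (inverse-CDF sampling)."""
    r, acc = rng.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r < acc:
            return tok
    return tok                                     # guard against floating-point round-off

# Toy vocabulary with hypothetical context-dependent scores.
logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
dist = next_token_distribution(logits)
token = sample(dist, random.Random(0))
```

Generation is just this step iterated: sample a token, append it to the context, recompute the conditional.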
The architecture is a Transformer — a composition of self-attention layers implementing a form of differentiable database lookup, and feed-forward layers acting as associative memories. Training minimises cross-entropy loss (equivalently, maximises log-likelihood) over a massive corpus via stochastic gradient descent with adaptive optimisers.
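The equivalence between minimising cross-entropy and maximising log-likelihood is visible in a one-line sketch, assuming a one-hot target at a single position (the probabilities below are hypothetical):

```python
import math

def cross_entropy(predicted, target):
    """Cross-entropy against a one-hot target is -log p(target); summed over a
    corpus, minimising it is exactly maximising the data's log-likelihood."""
    return -math.log(predicted[target])

# Hypothetical model output at one position.
predicted = {"the": 0.7, "cat": 0.2, "sat": 0.1}
loss_good = cross_entropy(predicted, "the")   # high probability on the truth: small loss
loss_bad = cross_entropy(predicted, "sat")    # low probability on the truth: large loss
```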
Think of the context window as the working memory of the system — the domain over which the conditional distribution is defined. Claude's 1M token window corresponds to roughly 750,000 words, or an entire PhD thesis including all citations, held simultaneously in "attention." This is not merely quantitative — it qualitatively changes what classes of problems become tractable.
A 4K context window is like computing a matrix inverse using only a rolling window of 4 rows. A 1M window gives you the full matrix. Many tasks that were previously impossible (long document reasoning, full codebase analysis) become structurally different problems at 1M tokens.
Claude can be prompted into an extended thinking mode where it generates a private "scratchpad" of reasoning steps before producing a final answer. Mathematically, this is like adding explicit intermediate variables in a proof — reducing the depth of the final derivation by externalising intermediate state. Empirically, this dramatically improves performance on multi-step problems.
Like choosing between a theorem-prover, a calculator, and a lookup table — each optimal at a different point on the capability-latency-cost Pareto frontier.
| Model | Analogy | Context | Best For | Speed |
|---|---|---|---|---|
| Opus 4.6 | Full theorem prover — traces all branches | 1M tokens | Complex reasoning, legal/financial analysis, long-horizon tasks (14.5 hr) | Slower |
| Sonnet 4.6 | Symbolic CAS — fast, highly capable | 1M tokens | Everyday work, coding, agent workflows. Beats Opus 4.5 in 59% of tests | Balanced |
| Haiku 4.5 | Lookup table — O(1) recall, low cost | 200K tokens | High-volume API pipelines, sub-agent tasks, real-time responses | Fastest |
Haiku 4.5 has zero prompt injection protection. In agentic settings where the model processes untrusted external input (e.g., web pages, user-submitted text), use Sonnet or Opus instead — analogous to running an unverified proof through a formal checker rather than trusting it by default.
Claude is not a single product — it is a family of interfaces exposing the same underlying model to different interaction paradigms.
- **Claude.ai**: Web, mobile, desktop. Projects, memory, web search, Artifacts. Your primary research interface.
- **Claude Code (CLI)**: Terminal-native agentic coder. Reads codebases, edits files, runs commands, manages git — all from natural language.
- **Claude Code desktop app**: Visual diffs, parallel sessions, scheduled tasks, computer use. The orchestration layer for complex workflows.
- **Claude Code on the web**: Browser-based Claude Code. No local setup. Long-running tasks you can check asynchronously — like submitting a batch job.
- **Claude Cowork**: Desktop agent for knowledge workers. Reads local files, executes multi-step workflows, produces deliverables. No coding needed.
- **Anthropic API**: Programmatic access. Build your own agents, pipelines, and research tools on top of Claude's reasoning capabilities.
The term agentic AI has a precise meaning: a system that operates in a perceive → plan → act → observe loop, autonomously pursuing goals over extended time horizons. Formally, this is a Markov Decision Process (MDP) where Claude serves as the policy π:

aₜ ∼ π(· ∣ sₜ)

The state sₜ is the full context (conversation history + tool results + memory). The action aₜ is either a response to the user or a tool call (file edit, bash command, API call, web search). The agent iterates until a terminal condition — task completion, error, or human interrupt.
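The loop can be sketched as ordinary control flow. Everything here is a toy stand-in: `policy` plays the role of Claude, `environment` the role of the tool harness, and the terminal condition is a sentinel action.

```python
def agent_loop(policy, environment, initial_state, max_steps=50):
    """Minimal MDP-style agent loop: state -> action -> observation -> new state."""
    state = list(initial_state)
    for _ in range(max_steps):
        action = policy(state)                 # Claude as the policy pi(a_t | s_t)
        if action == "DONE":                   # terminal condition: task complete
            return state
        observation = environment(action)      # tool result, command output, etc.
        state.append((action, observation))    # the context accumulates full history
    return state                               # step budget hit (human-interrupt analogue)

# Hypothetical policy: keep running tests until the environment reports success.
def policy(state):
    return "DONE" if state and state[-1][1] == "ok" else "run_tests"

def environment(action):
    return "ok"                                # toy environment: tests pass first try

final = agent_loop(policy, environment, [])
```

Real agent loops differ only in scale: the state is the million-token context, and the environment is your filesystem, shell, and network.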
What changed in 2025–2026 is the depth of this loop. Early chatbots had depth 1 (one turn). Claude Code's agent loop can run for 14.5 hours autonomously, executing thousands of tool calls, with the context window maintaining coherent state throughout.
Claude Code now supports parallel sub-agents: a coordinating agent decomposes a task into independent sub-problems, spawns multiple Claude instances, and aggregates results. This is formally analogous to parallel proof search — exploring multiple branches of a proof tree simultaneously, then selecting the successful branch.
If task T decomposes as T = T₁ ∪ T₂ ∪ … ∪ Tₖ with Tᵢ ⊥ Tⱼ (independent subtasks), spawn k agents in parallel. Wall-clock time reduces from O(k·t) to O(t) — the same asymptotic argument as parallel algorithms in complexity theory. In practice: build frontend and backend simultaneously; run security scan while writing docs.
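The same argument can be sketched with Python threads, where `run_subtask` is a hypothetical stand-in for a spawned sub-agent working on one independent subtask:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_subtask(name, seconds=0.1):
    """Stand-in for one sub-agent working on an independent subtask of duration t."""
    time.sleep(seconds)                            # pretend-work
    return f"{name}: done"

subtasks = ["frontend", "backend", "security-scan"]   # hypothetical decomposition of T

with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    # All k subtasks run concurrently: wall-clock ~ max t_i, not sum t_i.
    results = list(pool.map(run_subtask, subtasks))
```

The independence condition matters: if Tᵢ writes state that Tⱼ reads, the subtasks must be sequenced, and the O(t) bound no longer holds.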
Claude Code's "auto mode" uses two classifiers: an input-layer prompt-injection probe, and an output-layer transcript classifier. Safe actions proceed automatically; risky ones are blocked. Anthropic reports users approve 93% of prompts manually — auto mode eliminates that overhead while preserving safety boundaries. Think of it as a prior over action safety that is learned from human approval patterns.
Claude Code can run recurring jobs on Anthropic-managed cloud infrastructure even while your laptop is off — like submitting a cron job to a cluster. Each run starts from a fresh clone, executes its agent loop, and produces a pull request for your review. Use cases: weekly security audits, nightly dependency checks, post-merge documentation sync.
The Model Context Protocol (MCP) is Anthropic's open standard for connecting Claude to external tools, data sources, and services. Mathematically, it is a typed interface specification: each tool is declared with its signature (name, inputs, outputs, side-effects), and Claude's planner reasons about which tools to compose to achieve a goal.
MCP standardises how these functions are declared and invoked. Claude's planner builds a composition graph — selecting and chaining tools in sequence or parallel to satisfy the user's goal. The 75+ available connectors include Google Drive, Gmail, Slack, GitHub, Notion, databases, and domain APIs.
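The shape of such a declaration can be sketched as a typed registry plus a validating dispatcher. This is an illustrative Python mock, not the MCP wire format (real MCP declares tool inputs in JSON Schema and invokes them over JSON-RPC); the tool name and handler are hypothetical.

```python
# Hypothetical MCP-style tool registry: each tool declares a name, description,
# and typed input schema, so a planner can reason about what it may compose.
TOOLS = {
    "search_files": {
        "description": "Search file contents for a pattern",
        "input_schema": {"pattern": "string", "path": "string"},
        "handler": lambda pattern, path: [f"{path}/notes.md"],   # stub handler
    },
}

def invoke(tool_name, **kwargs):
    """Validate arguments against the declared schema, then call the handler."""
    tool = TOOLS[tool_name]
    missing = set(tool["input_schema"]) - set(kwargs)
    if missing:
        raise TypeError(f"missing inputs: {missing}")
    return tool["handler"](**kwargs)

hits = invoke("search_files", pattern="theorem", path="/papers")
```

The declared signature is what makes composition plannable: the model never inspects handler code, only the typed interface.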
- Legal reasoning: contract review, regulatory compliance. 90.2% on BigLaw Bench — highest of any model.
- Financial analysis: earnings transcripts, market research, due-diligence synthesis.
- Code review automation, CI/CD pipeline integration, PR analysis — fully agentic.
- No-code app generation and cloud dev environments — from spec to deployed app.
| Task | Claude.ai | Claude Code | Cowork | API |
|---|---|---|---|---|
| Proof verification | ★★★ | ★★ | — | ★★ |
| Literature review | ★★★ | — | ★★★ | ★★ |
| Code implementation | ★★ | ★★★ | — | ★★★ |
| Data analysis | — | ★★★ | — | ★★★ |
| Grant writing | ★★★ | — | ★★★ | — |
| File management | — | ★★★ | ★★★ | — |
| Teaching/tutoring | ★★★ | — | — | ★★ |
With 1M context, load an entire research project — all papers, code, and notes — into a single conversation.
A structured ramp-up path from "proficient user" to "agentic power user" — tailored for someone with mathematical depth and a research context.
1. Create a Project for your research area. Upload your papers, notes, and reference materials. Use Opus 4.6 with extended thinking for deep literature review and proof-checking. Practice structured prompting: give context, specify format, request step-by-step reasoning.
2. Connect Google Drive and Gmail via MCP connectors. Ask Claude to draft grant summaries, format citations, extract theorems from PDFs. Build one small Python script with Claude Code — let it write, test, and debug autonomously.
3. Install Claude Code CLI. Run a real agentic task: have Claude refactor a simulation script, run tests, fix failures, and commit. Explore /schedule for recurring literature monitoring. Learn CLAUDE.md to give persistent context about your codebase and preferences.
4. Use the Anthropic API + Agent SDK to build custom research tools: automated paper clustering, theorem-extraction pipelines, dataset documentation agents. Combine Haiku (fast, high-volume) for subagent tasks with Opus (deep reasoning) for synthesis.
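One way to sketch that Haiku/Opus split is a routing rule in the pipeline driver. The model labels, task kinds, and dispatch rule below are illustrative, not API identifiers:

```python
# Hypothetical router: cheap, fast model for high-volume subtasks,
# deep-reasoning model for synthesis and verification steps.
def pick_model(task):
    heavy = {"synthesis", "proof-check", "long-horizon"}
    return "opus" if task["kind"] in heavy else "haiku"

pipeline = [
    {"kind": "extract-citations"},    # high-volume, parallelisable
    {"kind": "cluster-abstracts"},    # high-volume, parallelisable
    {"kind": "synthesis"},            # needs deep reasoning
]
assignments = [pick_model(t) for t in pipeline]
# assignments == ["haiku", "haiku", "opus"]
```

The design point: cost and latency scale with the volume-heavy steps, so routing those to the cheapest adequate model dominates the total spend.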
The most powerful use of agentic AI is not "asking better questions" — it is closing feedback loops. Write code → run it → observe failure → rewrite → retest. A human doing this manually takes hours. An agent does it in minutes, autonomously. Your comparative advantage shifts from execution to problem formulation and verification — precisely the skills your PhD has honed.
In the agentic era, your role resembles that of a proof architect rather than a proof writer. You define the theorem (goal), specify the axioms (constraints and tools), and verify the proof structure (agent output). The agent handles the lemma-by-lemma execution. The mathematician who learns to operate at this level of abstraction gains a compounding advantage — each well-specified task trains your intuition for what Claude can and cannot verify autonomously.