From language models as function spaces to autonomous agents as dynamical systems — rigorously grounded, practically oriented.
For a mathematician, the cleanest framing: a large language model is a learned conditional probability distribution over token sequences. Given a vocabulary 𝒱 and a context of length n, the model parameterises

pθ(xₙ₊₁ ∣ x₁, …, xₙ),  with each xᵢ ∈ 𝒱,

where θ ∈ ℝᵈ are billions of learned parameters (Claude Opus 4.6 ≈ several hundred billion), and the context window defines the support over which this conditional is computed — currently 1,000,000 tokens for Opus and Sonnet 4.6.
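Concretely, the final step that produces this conditional is a softmax over the vocabulary, which can be sketched in a few lines. The three-token vocabulary and the logit values below are hypothetical toys; a real model computes logits over ~100k tokens via a Transformer.

```python
import math
import random

def next_token_distribution(logits):
    """Softmax: turns raw scores into the conditional p_theta(x_{n+1} | x_1..x_n)."""
    m = max(logits.values())                       # subtract the max for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample(dist, rng):
    """Draw one token from the conditional distribution (inverse-CDF sampling)."""
    r, acc = rng.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r < acc:
            return tok
    return tok                                     # guard against floating-point round-off

# Toy vocabulary with hypothetical context-dependent scores.
logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
dist = next_token_distribution(logits)
token = sample(dist, random.Random(0))
```

Generation is just this step iterated: sample a token, append it to the context, recompute the conditional.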
The architecture is a Transformer — a composition of self-attention layers implementing a form of differentiable database lookup, and feed-forward layers acting as associative memories. Training minimises cross-entropy loss (equivalently, maximises log-likelihood) over a massive corpus via stochastic gradient descent with adaptive optimisers.
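The equivalence between minimising cross-entropy and maximising log-likelihood is visible in a one-line sketch, assuming a one-hot target at a single position (the probabilities below are hypothetical):

```python
import math

def cross_entropy(predicted, target):
    """Cross-entropy against a one-hot target is -log p(target); summed over a
    corpus, minimising it is exactly maximising the data's log-likelihood."""
    return -math.log(predicted[target])

# Hypothetical model output at one position.
predicted = {"the": 0.7, "cat": 0.2, "sat": 0.1}
loss_good = cross_entropy(predicted, "the")   # high probability on the truth: small loss
loss_bad = cross_entropy(predicted, "sat")    # low probability on the truth: large loss
```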
Think of the context window as the working memory of the system — the domain over which the conditional distribution is defined. Claude's 1M token window corresponds to roughly 750,000 words, or an entire PhD thesis including all citations, held simultaneously in "attention." This is not merely quantitative — it qualitatively changes what classes of problems become tractable.
A 4K context window is like computing a matrix inverse using only a rolling window of 4 rows. A 1M window gives you the full matrix. Many tasks that were previously impossible (long document reasoning, full codebase analysis) become structurally different problems at 1M tokens.
Claude can be prompted into an extended thinking mode where it generates a private "scratchpad" of reasoning steps before producing a final answer. Mathematically, this is like adding explicit intermediate variables in a proof — reducing the depth of the final derivation by externalising intermediate state. Empirically, this dramatically improves performance on multi-step problems.
Like choosing between a theorem-prover, a calculator, and a lookup table — each optimal at a different point on the capability-latency-cost Pareto frontier.
| Model | Analogy | Context | Best For | Speed |
|---|---|---|---|---|
| Opus 4.6 | Full theorem prover — traces all branches | 1M tokens | Complex reasoning, legal/financial analysis, long-horizon tasks (14.5 hr) | Slower |
| Sonnet 4.6 | Symbolic CAS — fast, highly capable | 1M tokens | Everyday work, coding, agent workflows. Beats Opus 4.5 in 59% of tests | Balanced |
| Haiku 4.5 | Lookup table — O(1) recall, low cost | 200K tokens | High-volume API pipelines, sub-agent tasks, real-time responses | Fastest |
Haiku 4.5 has zero prompt injection protection. In agentic settings where the model processes untrusted external input (e.g., web pages, user-submitted text), use Sonnet or Opus instead — analogous to running an unverified proof through a formal checker rather than trusting it by default.
Claude is not a single product — it is a family of interfaces exposing the same underlying model to different interaction paradigms.
- **Claude.ai**: Web, mobile, desktop. Projects, memory, web search, Artifacts. Your primary research interface.
- **Claude Code (CLI)**: Terminal-native agentic coder. Reads codebases, edits files, runs commands, manages git — all from natural language.
- **Claude Code desktop app**: Visual diffs, parallel sessions, scheduled tasks, computer use. The orchestration layer for complex workflows.
- **Claude Code on the web**: Browser-based Claude Code. No local setup. Long-running tasks you can check asynchronously — like submitting a batch job.
- **Claude Cowork**: Desktop agent for knowledge workers. Reads local files, executes multi-step workflows, produces deliverables. No coding needed.
- **Anthropic API**: Programmatic access. Build your own agents, pipelines, and research tools on top of Claude's reasoning capabilities.
The term agentic AI has a precise meaning: a system that operates in a perceive → plan → act → observe loop, autonomously pursuing goals over extended time horizons. Formally, this is a Markov Decision Process (MDP) where Claude serves as the policy π:

aₜ ∼ π(· ∣ sₜ)

The state sₜ is the full context (conversation history + tool results + memory). The action aₜ is either a response to the user or a tool call (file edit, bash command, API call, web search). The agent iterates until a terminal condition — task completion, error, or human interrupt.
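The loop can be sketched as ordinary control flow. Everything here is a toy stand-in: `policy` plays the role of Claude, `environment` the role of the tool harness, and the terminal condition is a sentinel action.

```python
def agent_loop(policy, environment, initial_state, max_steps=50):
    """Minimal MDP-style agent loop: state -> action -> observation -> new state."""
    state = list(initial_state)
    for _ in range(max_steps):
        action = policy(state)                 # Claude as the policy pi(a_t | s_t)
        if action == "DONE":                   # terminal condition: task complete
            return state
        observation = environment(action)      # tool result, command output, etc.
        state.append((action, observation))    # the context accumulates full history
    return state                               # step budget hit (human-interrupt analogue)

# Hypothetical policy: keep running tests until the environment reports success.
def policy(state):
    return "DONE" if state and state[-1][1] == "ok" else "run_tests"

def environment(action):
    return "ok"                                # toy environment: tests pass first try

final = agent_loop(policy, environment, [])
```

Real agent loops differ only in scale: the state is the million-token context, and the environment is your filesystem, shell, and network.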
What changed in 2025–2026 is the depth of this loop. Early chatbots had depth 1 (one turn). Claude Code's agent loop can run for 14.5 hours autonomously, executing thousands of tool calls, with the context window maintaining coherent state throughout.
Claude Code now supports parallel sub-agents: a coordinating agent decomposes a task into independent sub-problems, spawns multiple Claude instances, and aggregates results. This is formally analogous to parallel proof search — exploring multiple branches of a proof tree simultaneously, then selecting the successful branch.
If task T decomposes as T = T₁ ∪ T₂ ∪ … ∪ Tₖ with Tᵢ ⊥ Tⱼ (independent subtasks), spawn k agents in parallel. Wall-clock time reduces from O(k·t) to O(t) — the same asymptotic argument as parallel algorithms in complexity theory. In practice: build frontend and backend simultaneously; run security scan while writing docs.
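The same argument can be sketched with Python threads, where `run_subtask` is a hypothetical stand-in for a spawned sub-agent working on one independent subtask:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_subtask(name, seconds=0.1):
    """Stand-in for one sub-agent working on an independent subtask of duration t."""
    time.sleep(seconds)                            # pretend-work
    return f"{name}: done"

subtasks = ["frontend", "backend", "security-scan"]   # hypothetical decomposition of T

with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    # All k subtasks run concurrently: wall-clock ~ max t_i, not sum t_i.
    results = list(pool.map(run_subtask, subtasks))
```

The independence condition matters: if Tᵢ writes state that Tⱼ reads, the subtasks must be sequenced, and the O(t) bound no longer holds.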
Claude Code's "auto mode" uses two classifiers: an input-layer prompt-injection probe, and an output-layer transcript classifier. Safe actions proceed automatically; risky ones are blocked. Anthropic reports users approve 93% of prompts manually — auto mode eliminates that overhead while preserving safety boundaries. Think of it as a prior over action safety that is learned from human approval patterns.
Claude Code can run recurring jobs on Anthropic-managed cloud infrastructure even while your laptop is off — like submitting a cron job to a cluster. Each run starts from a fresh clone, executes its agent loop, and produces a pull request for your review. Use cases: weekly security audits, nightly dependency checks, post-merge documentation sync.
The Model Context Protocol (MCP) is Anthropic's open standard for connecting Claude to external tools, data sources, and services. Mathematically, it is a typed interface specification: each tool is declared with its signature (name, inputs, outputs, side-effects), and Claude's planner reasons about which tools to compose to achieve a goal.
MCP standardises how these functions are declared and invoked. Claude's planner builds a composition graph — selecting and chaining tools in sequence or parallel to satisfy the user's goal. The 75+ available connectors include Google Drive, Gmail, Slack, GitHub, Notion, databases, and domain APIs.
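The shape of such a declaration can be sketched as a typed registry plus a validating dispatcher. This is an illustrative Python mock, not the MCP wire format (real MCP declares tool inputs in JSON Schema and invokes them over JSON-RPC); the tool name and handler are hypothetical.

```python
# Hypothetical MCP-style tool registry: each tool declares a name, description,
# and typed input schema, so a planner can reason about what it may compose.
TOOLS = {
    "search_files": {
        "description": "Search file contents for a pattern",
        "input_schema": {"pattern": "string", "path": "string"},
        "handler": lambda pattern, path: [f"{path}/notes.md"],   # stub handler
    },
}

def invoke(tool_name, **kwargs):
    """Validate arguments against the declared schema, then call the handler."""
    tool = TOOLS[tool_name]
    missing = set(tool["input_schema"]) - set(kwargs)
    if missing:
        raise TypeError(f"missing inputs: {missing}")
    return tool["handler"](**kwargs)

hits = invoke("search_files", pattern="theorem", path="/papers")
```

The declared signature is what makes composition plannable: the model never inspects handler code, only the typed interface.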
- Legal reasoning: contract review, regulatory compliance. 90.2% on BigLaw Bench — highest of any model.
- Financial analysis: earnings transcripts, market research, due-diligence synthesis.
- Code review automation, CI/CD pipeline integration, PR analysis — fully agentic.
- No-code app generation and cloud dev environments — from spec to deployed app.
| Task | Claude.ai | Claude Code | Cowork | API |
|---|---|---|---|---|
| Proof verification | ★★★ | ★★ | — | ★★ |
| Literature review | ★★★ | — | ★★★ | ★★ |
| Code implementation | ★★ | ★★★ | — | ★★★ |
| Data analysis | — | ★★★ | — | ★★★ |
| Grant writing | ★★★ | — | ★★★ | — |
| File management | — | ★★★ | ★★★ | — |
| Teaching/tutoring | ★★★ | — | — | ★★ |
With 1M context, load an entire research project — all papers, code, and notes — into a single conversation.
A structured ramp-up path from "proficient user" to "agentic power user" — tailored for someone with mathematical depth and a research context.
1. Create a Project for your research area. Upload your papers, notes, and reference materials. Use Opus 4.6 with extended thinking for deep literature review and proof-checking. Practice structured prompting: give context, specify format, request step-by-step reasoning.
2. Connect Google Drive and Gmail via MCP connectors. Ask Claude to draft grant summaries, format citations, extract theorems from PDFs. Build one small Python script with Claude Code — let it write, test, and debug autonomously.
3. Install Claude Code CLI. Run a real agentic task: have Claude refactor a simulation script, run tests, fix failures, and commit. Explore /schedule for recurring literature monitoring. Learn CLAUDE.md to give persistent context about your codebase and preferences.
4. Use the Anthropic API + Agent SDK to build custom research tools: automated paper clustering, theorem-extraction pipelines, dataset documentation agents. Combine Haiku (fast, high-volume) for subagent tasks with Opus (deep reasoning) for synthesis.
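One way to sketch that Haiku/Opus split is a routing rule in the pipeline driver. The model labels, task kinds, and dispatch rule below are illustrative, not API identifiers:

```python
# Hypothetical router: cheap, fast model for high-volume subtasks,
# deep-reasoning model for synthesis and verification steps.
def pick_model(task):
    heavy = {"synthesis", "proof-check", "long-horizon"}
    return "opus" if task["kind"] in heavy else "haiku"

pipeline = [
    {"kind": "extract-citations"},    # high-volume, parallelisable
    {"kind": "cluster-abstracts"},    # high-volume, parallelisable
    {"kind": "synthesis"},            # needs deep reasoning
]
assignments = [pick_model(t) for t in pipeline]
# assignments == ["haiku", "haiku", "opus"]
```

The design point: cost and latency scale with the volume-heavy steps, so routing those to the cheapest adequate model dominates the total spend.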
The most powerful use of agentic AI is not "asking better questions" — it is closing feedback loops. Write code → run it → observe failure → rewrite → retest. A human doing this manually takes hours. An agent does it in minutes, autonomously. Your comparative advantage shifts from execution to problem formulation and verification — precisely the skills your PhD has honed.
In the agentic era, your role resembles that of a proof architect rather than a proof writer. You define the theorem (goal), specify the axioms (constraints and tools), and verify the proof structure (agent output). The agent handles the lemma-by-lemma execution. The mathematician who learns to operate at this level of abstraction gains a compounding advantage — each well-specified task trains your intuition for what Claude can and cannot verify autonomously.