In practice, GPT-5.3-Codex’s usable context is determined less by a single published “max tokens” number and more by the product surface (IDE, app, CLI, API), the tooling around it, and how much of that context is actually relevant. Long context is expensive: the more tokens you send, the more compute and latency you pay, and the harder it becomes for any model to keep attention on the right details. That’s why production-grade coding workflows almost always rely on selective context (the few files and docs that matter) plus a validation loop (tests/build). In other words, even if the model can technically ingest a lot, you’ll get better outcomes by sending less, but better, context.
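Here is a minimal sketch of that pattern: a small, budgeted context pack plus a test-driven validation loop. The `call_model` callable, the patch format, and the file-selection step are assumptions, not a specific Codex API; the point is the shape of the workflow, not the exact calls.

```python
import subprocess
from pathlib import Path

def build_context_pack(files: list[str], max_chars: int = 20_000) -> str:
    """Concatenate only the files that matter, capped at a fixed budget."""
    parts = []
    for path in files:
        text = Path(path).read_text(encoding="utf-8")
        parts.append(f"### {path}\n{text}")
    return "\n\n".join(parts)[:max_chars]

def validated_edit(task: str, relevant_files: list[str], call_model) -> bool:
    """Send a compact prompt, apply the suggested patch, and let the tests decide."""
    prompt = f"{task}\n\nContext:\n{build_context_pack(relevant_files)}"
    patch = call_model(prompt)  # assumed to return a unified diff (placeholder)
    subprocess.run(["git", "apply", "-"], input=patch.encode(), check=True)
    result = subprocess.run(["pytest", "-q"])  # the validation loop: tests, not vibes
    return result.returncode == 0
```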
OpenAI’s own materials emphasize long-running tasks and “compaction” as a way to sustain progress across long horizons. In the GPT-5.3-Codex system documentation, compaction is described as being used to prevent the context window from growing too large in long agentic evaluations (compaction triggered every 100K tokens in a particular eval harness) and as enabling sustained coherent progress across long horizons ([GPT-5.3-Codex System Card](https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf)). The implication for developers is important: long tasks should be structured as a sequence of steps where the system periodically summarizes and preserves only the essential state. You shouldn’t assume “just keep dumping more chat history” will work indefinitely; design your workflow so the agent carries forward a compact project state (plan, decisions, open TODOs, current diffs) rather than raw transcripts.
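A rough sketch of what carrying forward compact state can look like, assuming a hypothetical `summarize` model call and using the 100K-token figure only as an illustrative threshold (it comes from the system card’s eval harness, not a required setting):

```python
from dataclasses import dataclass, field

@dataclass
class ProjectState:
    """The compact state the agent carries forward instead of raw transcripts."""
    plan: str = ""
    decisions: list[str] = field(default_factory=list)
    open_todos: list[str] = field(default_factory=list)
    current_diff: str = ""

def maybe_compact(state: ProjectState, transcript: str, token_count: int,
                  summarize, threshold: int = 100_000) -> str:
    """Once the transcript grows past the threshold, fold it into state and drop it."""
    if token_count < threshold:
        return transcript                    # keep accumulating for now
    state.decisions.append(summarize(transcript))  # model-produced digest (assumption)
    return ""                                # raw history is discarded; state persists
```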
The practical solution is to treat “context” as a data pipeline. For code tasks, build a context pack: relevant file excerpts, symbol definitions, build config, and failing logs—kept short and structured. For knowledge tasks, use retrieval rather than giant prompts. Store docs, runbooks, and code patterns in Milvus or managed Zilliz Cloud and retrieve only the top-k most relevant chunks per request, optionally filtered by repo, module, language, and version. This gives you predictable token budgets, keeps latency reasonable, and improves accuracy because the model is reading the right pages instead of scanning everything. If you need to support truly massive repos, combine retrieval with hierarchical summarization: summarize modules into “capsules” stored in the vector DB, retrieve capsules first, then retrieve the deeper file snippets only when needed.
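As a concrete illustration, the sketch below retrieves a scoped, top-k context pack from Milvus with the `MilvusClient` API. The collection name (`code_chunks`), the field names (`text`, `repo`, `module`), and the `embed` function are assumptions for this example, not a prescribed schema.

```python
from pymilvus import MilvusClient

# Connect to a local Milvus instance (swap the URI for a Zilliz Cloud endpoint).
client = MilvusClient(uri="http://localhost:19530")

def retrieve_context(query: str, embed, repo: str, module: str, k: int = 5) -> str:
    """Fetch only the top-k most relevant chunks, filtered to one repo and module."""
    hits = client.search(
        collection_name="code_chunks",
        data=[embed(query)],                                   # query embedding
        limit=k,                                               # predictable token budget
        filter=f'repo == "{repo}" and module == "{module}"',   # metadata scoping
        output_fields=["text"],
    )
    return "\n\n".join(hit["entity"]["text"] for hit in hits[0])
```

For very large repos, the same call works for the hierarchical variant: search a “capsules” collection of module summaries first, then issue a second, narrower search for file-level snippets only in the modules the capsules point to.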
