Anthropic’s “what’s new” documentation describes Claude Opus 4.6 primarily in terms of long context, large outputs, extended thinking, and existing Claude API features. Whether you can pass images depends on the specific Claude API feature set and the model capabilities enabled in your account and integration flow. In practice, many teams treat Opus 4.6 as a text-first engine and rely on upstream components (OCR or vision models) when their inputs are screenshots or scanned documents.
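As a rough sketch of that text-first pattern, the snippet below sends pre-extracted text to the Messages API using the official Anthropic Python SDK; the model identifier is a placeholder (use whatever Opus 4.6 id your account exposes), and the extracted text is invented for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Text produced by an upstream OCR / extraction step, not an image payload.
extracted_text = "Page: Billing settings\nError: PAYMENT_DECLINED (code 402)"

message = client.messages.create(
    model="claude-opus-4-6",  # placeholder model id for this sketch
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Explain this error to a support agent:\n\n{extracted_text}",
    }],
)
print(message.content[0].text)
```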
A robust way to support image-heavy workflows is to convert images to structured text before calling Opus 4.6. For screenshots, run OCR, extract UI text and error codes, and format them into a consistent schema (page title, visible strings, log snippets). For PDFs, parse the text directly if it’s selectable, or run OCR if the document is scanned. This keeps behavior auditable because you can log exactly what text was provided to the model.
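One way to implement that preprocessing, sketched below under the assumption that pytesseract/Pillow handle OCR and pypdf handles selectable-text PDFs (none of these libraries are required, and the schema field names are hypothetical):

```python
import pytesseract             # assumed OCR dependency (needs Tesseract installed)
from PIL import Image          # assumed image-loading dependency
from pypdf import PdfReader    # assumed PDF-parsing dependency

def screenshot_to_record(path: str, page_title: str) -> dict:
    """OCR a screenshot and wrap the result in a consistent, loggable schema."""
    raw_text = pytesseract.image_to_string(Image.open(path))
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return {
        "page_title": page_title,                         # hypothetical schema fields
        "visible_strings": lines,
        "log_snippets": [ln for ln in lines if "error" in ln.lower()],
    }

def pdf_to_text(path: str) -> str:
    """Extract selectable text from a PDF; scanned pages come back empty
    and should be routed through OCR instead."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Whatever you send to the model is then just the serialized record, which is also exactly what you write to your logs.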
If you’re building a knowledge assistant that references diagrams or images, store image descriptions alongside the document text in a vector database like Milvus or managed Zilliz Cloud. Retrieve both the relevant text chunks and the “diagram description” chunks for a query, then ask Opus 4.6 to answer using those retrieved passages. This approach keeps your system consistent even when the model input is text-only.
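A minimal retrieval sketch follows, assuming pymilvus’s MilvusClient quick-start API; the collection name, vector dimension, placeholder embedding function, and sample chunks are all invented for illustration:

```python
import hashlib
from pymilvus import MilvusClient

def embed(text: str) -> list[float]:
    # Placeholder: deterministic 768-dim pseudo-embedding from a hash.
    # Swap in a real embedding model in practice.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest] * 24  # 32 bytes * 24 = 768 dims

client = MilvusClient(uri="http://localhost:19530")  # or a Zilliz Cloud URI + token
client.create_collection(collection_name="kb_chunks", dimension=768)

# Store document text and diagram descriptions side by side, tagged by type.
client.insert(collection_name="kb_chunks", data=[
    {"id": 1, "vector": embed("The deploy pipeline has three stages: build, test, release."),
     "text": "The deploy pipeline has three stages: build, test, release.",
     "chunk_type": "text"},
    {"id": 2, "vector": embed("Diagram: boxes for build -> test -> release, arrows left to right."),
     "text": "Diagram: boxes for build -> test -> release, arrows left to right.",
     "chunk_type": "diagram_description"},
])

# Retrieve both kinds of chunks for a query, then pass the text to Opus 4.6.
hits = client.search(
    collection_name="kb_chunks",
    data=[embed("How does the deploy pipeline work?")],
    limit=4,
    output_fields=["text", "chunk_type"],
)
context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])
```

The retrieved context string is what goes into the prompt, so the model only ever sees text, regardless of whether the underlying source was prose or a diagram.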
