To make Cowork reliable, structure prompts like a short engineering spec: goal, scope, constraints, outputs, validation, and escalation rules. Start with a single-sentence goal that is measurable (“Create out/report.md summarizing all docs and extract decisions into out/decisions.csv”). Then define scope precisely (“Only use files under /Shared/ProjectDocs; ignore tmp/ and node_modules/; process only .md, .txt, .pdf”). Next define constraints that prevent surprise behavior: “Do not delete anything,” “Do not overwrite originals,” “Write all new files to out/,” “If a file is unreadable, list it in out/errors.csv,” and “Do not use the internet unless I explicitly ask.” When people say Cowork “did the wrong thing,” it’s usually because the prompt omitted one of these: scope, constraints, or concrete deliverables.
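To see what “measurable” buys you, here is a minimal sketch of an acceptance check for the example goal above. The deliverable paths (out/report.md, out/decisions.csv) come from the prompt; the expected CSV columns are an assumption for illustration.

```python
# Minimal acceptance check for the example goal above.
# The expected decisions.csv columns are an assumed schema, not part of the prompt.
import csv
from pathlib import Path

EXPECTED_COLUMNS = {"decision", "source_doc", "date"}  # assumed for illustration

def check_deliverables(out_dir: str = "out") -> list[str]:
    problems = []
    out = Path(out_dir)

    report = out / "report.md"
    if not report.is_file() or report.stat().st_size == 0:
        problems.append("report.md missing or empty")

    decisions = out / "decisions.csv"
    if not decisions.is_file():
        problems.append("decisions.csv missing")
    else:
        with decisions.open(newline="") as f:
            header = set(next(csv.reader(f), []))
        missing = EXPECTED_COLUMNS - header
        if missing:
            problems.append(f"decisions.csv missing columns: {sorted(missing)}")

    return problems

if __name__ == "__main__":
    issues = check_deliverables()
    print("OK" if not issues else "\n".join(issues))
```

If you can write this check before you write the prompt, the goal is measurable; if you can’t, the prompt is probably underspecified.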
For complex tasks, force checkpoints and make ambiguity handling explicit. A repeatable pattern is:
- Inventory: write `out/inventory.csv` with file paths, sizes, modified times, and detected type.
- Plan: write `out/plan.md` listing every operation (rename/move/edit) and why, then wait for confirmation.
- Execute: apply changes, but log every action to `out/actions.log`.
- Validate: run sanity checks (counts, schema, missing fields) and write `out/validation.md` (a sketch of such a check follows this list).
- Deliver: write final outputs plus a “what changed / what didn’t” summary.
Also specify escalation rules: “If you’re not confident about a classification, label it `unknown` and add it to `out/open_questions.md`.” “If two sources conflict, prefer the newest timestamp and record both in `out/conflicts.csv`.” These rules keep Cowork from “filling in” uncertain details, which is a common failure mode in agent workflows.
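The “prefer the newest timestamp” rule is easy to spot-check yourself. Below is a sketch under assumed field names (doc_id, value, updated_at): keep the newest record per doc_id and log the losers to out/conflicts.csv.

```python
# Sketch of the "prefer the newest timestamp" rule.
# Record shape and field names (doc_id, value, updated_at) are assumptions.
import csv
from datetime import datetime
from pathlib import Path

def resolve_conflicts(records, conflicts_path="out/conflicts.csv"):
    """Keep the newest record per doc_id; log every superseded record."""
    newest, conflicts = {}, []
    for rec in records:
        key = rec["doc_id"]
        if key in newest:
            older, newer = sorted(
                (newest[key], rec),
                key=lambda r: datetime.fromisoformat(r["updated_at"]),
            )
            conflicts.append({**older, "superseded_by": newer["updated_at"]})
            newest[key] = newer
        else:
            newest[key] = rec

    Path(conflicts_path).parent.mkdir(parents=True, exist_ok=True)
    with open(conflicts_path, "w", newline="") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["doc_id", "value", "updated_at", "superseded_by"],
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(conflicts)
    return list(newest.values())
```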
If your task feeds a larger workflow, bake the downstream contract into the prompt so Cowork outputs are immediately ingestible. For example: “Chunk documents into 400–800 token sections,” “Assign stable IDs like DOC-###_S##,” “Emit out/metadata.jsonl with fields doc_id, chunk_id, title, source_path, updated_at, tags,” and “Write chunk files into out/chunks/.” That makes it trivial to validate and index into a vector database such as Milvus or Zilliz Cloud. The more your prompt looks like a schema + acceptance test, the more Cowork behaves like a reliable batch tool instead of a “helpful but improvisational” assistant.
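As a concrete “schema + acceptance test,” here is a minimal sketch that checks the contract above: required fields in out/metadata.jsonl, chunk IDs matching the DOC-###_S## pattern, and a chunk file per record under out/chunks/. The chunk filename convention (`<chunk_id>.md`) is an assumption for illustration.

```python
# Minimal acceptance test for the downstream contract described above.
# Assumption: chunk files are named <chunk_id>.md under out/chunks/.
import json
import re
from pathlib import Path

REQUIRED = {"doc_id", "chunk_id", "title", "source_path", "updated_at", "tags"}
CHUNK_ID = re.compile(r"^DOC-\d{3}_S\d{2}$")

def check_contract(out_dir: str = "out") -> list[str]:
    out = Path(out_dir)
    problems = []
    with (out / "metadata.jsonl").open() as f:
        for lineno, line in enumerate(f, start=1):
            rec = json.loads(line)
            missing = REQUIRED - rec.keys()
            if missing:
                problems.append(f"line {lineno}: missing fields {sorted(missing)}")
            chunk_id = rec.get("chunk_id", "")
            if not CHUNK_ID.match(chunk_id):
                problems.append(f"line {lineno}: bad chunk_id {chunk_id!r}")
            if not (out / "chunks" / f"{chunk_id}.md").is_file():
                problems.append(f"line {lineno}: chunk file missing for {chunk_id!r}")
    return problems

if __name__ == "__main__":
    issues = check_contract()
    print("contract OK" if not issues else "\n".join(issues))
```

Once a run passes a check like this, loading the chunks and metadata into your indexing pipeline (Milvus, Zilliz Cloud, or anything else that consumes the schema) is routine rather than a cleanup job.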
