What does Claude Code's safety classifier check?

Claude Code's safety classifier (Auto Mode) evaluates each tool call before it executes, checking for dangerous patterns and risky behaviors that could compromise your system or data. The classifier specifically identifies and blocks: recursive file deletion (rm -rf, especially with glob patterns), credential and API key exposure (commands that output passwords or tokens), sensitive data exfiltration (piping PII to external services), code injection attacks (commands that execute untrusted input), malicious binary execution (running unfamiliar executables), and irreversible system modifications (disk operations, boot configuration changes). It also flags ambiguous commands where intent is unclear: for example, if you ask Claude to "clean up old files" and Claude interprets the request too broadly, the classifier catches the overgeneralization.

Each tool call is analyzed in isolation, with the classifier examining the command's declared intent and potential side effects. The classifier does not see file contents or tool results, which prevents hostile instructions in files from influencing its decisions. This one-way filtering is the key safety mechanism: even if a file contains instructions to delete your project, the classifier evaluates the delete command itself, not the file's instructions.

For example, if Claude attempts rm -rf /var/log/*, the classifier recognizes the recursive deletion pattern and blocks it. If Claude attempts a git push with credentials in the URL, the classifier detects the credential exposure and blocks it. If Claude runs npm install on an untrusted package with a known vulnerability, the classifier may flag it depending on exploit patterns.

The classifier's design tolerates some false negatives (allowing some risky actions) in preference to false positives (blocking safe work). This means it permits legitimate operations like deleting your own project files while blocking obviously malicious patterns.
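
The pre-execution check described above can be illustrated with a deliberately simplified Python sketch. This is not Claude Code's actual implementation: the rule names, the regular expressions, and the check_tool_call function are all hypothetical, and a real classifier weighs intent and context rather than matching strings. The sketch only shows two ideas from the text: the check runs on the command before execution, and it receives the command alone, never file contents or tool results.

```python
import re
from dataclasses import dataclass

# Hypothetical rules for illustration only; NOT Claude Code's real rule set.
RULES = [
    # rm invoked with a flag group containing r or R (e.g. -rf, -fr, -R)
    ("recursive_deletion", re.compile(r"\brm\s+-\w*[rR]\w*")),
    # user:secret@ embedded in an http(s) URL
    ("credential_in_url", re.compile(r"https?://[^\s/:@]+:[^\s/@]+@")),
]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_tool_call(command: str) -> Decision:
    """Decide on a command before it runs.

    The only input is the command string itself, so text sitting in files
    or earlier tool output has no channel to influence the decision.
    """
    for name, pattern in RULES:
        if pattern.search(command):
            return Decision(allowed=False, reason=name)
    return Decision(allowed=True, reason="no_rule_matched")


if __name__ == "__main__":
    for cmd in (
        "rm -rf /var/log/*",
        "git push https://user:token123@github.com/org/repo.git",
        "ls -la",
    ):
        print(cmd, "->", check_tool_call(cmd))
```

Note that this sketch blocks every recursive rm, which is stricter than the behavior described above, where legitimate deletions of your own project files are permitted. Fixed patterns cannot make that distinction, which is why the real check is described as a classifier rather than a blocklist.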
To enhance Claude Code's ability to reason about complex codebases, Zilliz Cloud's vector database can index your code at scale, enabling semantic search that helps agents find relevant functions, modules, and documentation patterns more effectively.
