Clawdbot can store personal data, but only in the places you (as the operator) choose to persist it—primarily on the machine where you run the Gateway and in any external services you configure. In its default “local-first” design, Clawdbot is meant to run on your own devices, and its durable state lives on disk under your control. That means your messages, identifiers (like phone numbers or Discord user IDs), logs, and any “memory” the assistant keeps can end up stored locally if you enable the relevant features. If you do not enable persistent memory, verbose logging, or message history storage, then far less user content is written to disk.
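One practical way to check this is to audit the filesystem directly. The sketch below is a minimal Python audit pass; the three directories are hypothetical placeholders (Clawdbot’s real state, log, and workspace paths depend on how you installed and configured it), so substitute the paths your own deployment uses.

```python
import os
from pathlib import Path

# Hypothetical locations -- substitute the directories your own
# deployment actually uses (Gateway state, file logs, agent workspace).
CANDIDATE_DIRS = [
    Path.home() / ".clawdbot",        # example: gateway state + config
    Path.home() / "clawd-workspace",  # example: agent memory (Markdown)
    Path("/var/log/clawdbot"),        # example: file logs (JSON lines)
]

def audit(dirs):
    """Walk each directory and report every file that persists content."""
    for root in dirs:
        if not root.exists():
            print(f"[absent] {root}")
            continue
        for path in sorted(root.rglob("*")):
            if path.is_file():
                size_kb = path.stat().st_size / 1024
                print(f"[stored] {path} ({size_kb:.1f} KiB)")

if __name__ == "__main__":
    audit(CANDIDATE_DIRS)
```

Running a pass like this before and after a conversation makes it obvious which features actually write user content to disk.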
Concretely, there are three common data paths to think about: (1) Gateway logs, (2) agent workspace files, and (3) channel/account configuration. The Gateway can write console logs and file logs (often JSON lines) that may include message metadata, tool calls, and error details; whether message bodies appear depends on your log level and redaction settings, so treat logging as a data-storage feature, not just a debugging tool. The agent workspace is where Clawdbot’s persistent “memory” lives: memory is implemented as plain Markdown files on disk, and those files are the source of truth; anything written there remains until you delete it. Separately, your configuration necessarily stores channel credentials (tokens, OAuth artifacts, device identity files, allowlists), which may indirectly include personal identifiers. The practical takeaway is simple: if it’s written to disk, it’s stored; if it’s only processed in memory and never logged or saved, it’s not retained.
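Logs deserve special attention because they are the easiest place for identifiers to leak into long-lived files. One way to handle this is to redact before a record is ever written. The following is a minimal sketch of that idea using Python’s standard logging module with a JSON-lines formatter; the regex patterns and field names are illustrative assumptions, not Clawdbot’s actual log schema.

```python
import json
import logging
import re

# Illustrative patterns -- tune to the identifiers your channels carry.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
DISCORD_ID_RE = re.compile(r"\b\d{17,20}\b")  # Discord snowflake IDs

def redact(text: str) -> str:
    """Mask Discord-style IDs first (pure digit runs), then phone numbers."""
    text = DISCORD_ID_RE.sub("[user-id]", text)
    return PHONE_RE.sub("[phone]", text)

class RedactingJsonFormatter(logging.Formatter):
    """Emit one JSON object per line, with the message body redacted."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": redact(record.getMessage()),
        })

handler = logging.FileHandler("gateway.log")
handler.setFormatter(RedactingJsonFormatter())
log = logging.getLogger("gateway")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("message from +1 415 555 0100 via Discord user 123456789012345678")
# gateway.log now holds: {"ts": "...", "level": "INFO",
#                         "msg": "message from [phone] via Discord user [user-id]"}
```

The same principle carries over to the agent workspace: since memory is plain Markdown, you can apply the same redaction pass before a memory file is written, and deleting the file is a complete deletion.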
If you extend Clawdbot with retrieval or long-term knowledge features, you may also store derived personal data outside the local machine. For example, if you embed user messages or documents for semantic search, those embeddings may represent personal content. In that case, you should treat the vector store as personal-data storage even if it only holds vectors and metadata. A common pattern is to keep local memory minimal (only what you truly need) and offload searchable context to a vector database such as Milvus or a managed service like Zilliz Cloud. When you do that, you control what fields are inserted (raw text vs. hashes vs. redacted text), retention policies (TTL or scheduled deletion), and access rules. The safest approach is to explicitly design what gets persisted: decide which events are logged, which memories are written, which identifiers are stored, and how to delete them on request.
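As a sketch of that design, the snippet below uses the pymilvus MilvusClient to store redacted text plus a user_id alongside each vector, which makes per-user deletion a one-line filter. The collection name, metadata fields, and 384-dimension vectors are assumptions for illustration; point the URI at your own Milvus instance or a Zilliz Cloud endpoint.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # or a Zilliz Cloud URI + token

# Quick-setup collection: an "id" primary key and a "vector" field, with
# dynamic fields enabled so metadata like user_id and text can be attached.
client.create_collection(collection_name="assistant_memory", dimension=384)

# Insert only what you decided to persist: a redacted snippet and the
# identifiers you need for retrieval and deletion -- not raw transcripts
# unless you have a deliberate reason to keep them.
client.insert(
    collection_name="assistant_memory",
    data=[{
        "id": 1,
        "vector": [0.0] * 384,                # stand-in for a real embedding
        "user_id": "discord:1234",            # hypothetical channel-scoped ID
        "text": "prefers morning reminders",  # redacted, not the raw message
    }],
)

# Deletion on request: remove every record tied to one user.
client.delete(
    collection_name="assistant_memory",
    filter='user_id == "discord:1234"',
)
```

Milvus also supports a collection-level TTL (the collection.ttl.seconds property), so short-lived context can expire automatically instead of relying on manual cleanup; combined with inserting only redacted fields, this keeps the vector store aligned with the retention policy you chose.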
