Openclaw Data Privacy: GDPR and SOC 2 Compliance Considerations

Self-hosting Openclaw gives you something rare in the AI agent space: you control the infrastructure, the data, and the perimeter. But control is not the same as compliance. The Dutch Data Protection Authority issued a formal warning in early 2026 against running AI agents like Openclaw on systems with sensitive data, and SecurityScorecard found over 40,000 Openclaw instances exposed to the internet with no access controls. The gap between “self-hosted” and “compliant” is where most teams get stuck.

This guide maps Openclaw’s architecture to the specific regulatory controls that CTOs and compliance officers need to evaluate. By the end, you will know exactly what data Openclaw processes, where it goes, which GDPR articles apply, how SOC 2 Trust Service Criteria map to a self-hosted deployment, and what configuration changes close the compliance gaps.

What Data Openclaw Processes

Before you can assess compliance, you need a clear inventory of the data Openclaw touches. There are four categories.

Session data includes every prompt you send and every response the agent returns. By default, Openclaw stores sessions as plaintext files in ~/.openclaw/sessions/ with no expiration. A single month of active use generates thousands of session files containing the full text of every conversation.

Memory data is curated context the agent retains between sessions. The MEMORY.md file and SQLite index accumulate facts, preferences, and working context. If your team uses Openclaw for client work, memory files will contain client names, project details, and potentially sensitive business information.

Credential and configuration data lives in .env files and config.yaml. This includes API keys for LLM providers, OAuth tokens for connected services (Slack, email, CRM), and integration credentials. These are stored in plaintext by default.

Tool execution logs record every action Openclaw performs: file reads, API calls, shell commands, and their outputs. Logs land in /tmp/openclaw/ and can contain any data the agent accessed during execution.
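A quick inventory makes this concrete. The Python sketch below counts files, total bytes, and the age of the oldest file in the two directories named above (the default sessions path and /tmp/openclaw/). It does not assume where memory or configuration files live, so add those paths for your installation. File modification time is used as a rough proxy for data age.

```python
import time
from pathlib import Path
from typing import Optional

# The two directories named in this guide. Memory (MEMORY.md plus the SQLite
# index) and configuration (.env, config.yaml) live wherever your installation
# puts them; add those paths here, they are not assumed.
DATA_LOCATIONS = {
    "sessions": Path.home() / ".openclaw" / "sessions",
    "tool_logs": Path("/tmp/openclaw"),
}

def summarize(root: Path, now: Optional[float] = None) -> dict:
    """Count files, total bytes and the age in days of the oldest file under root."""
    now = time.time() if now is None else now
    count, total, oldest = 0, 0, None
    if root.is_dir():
        for path in root.rglob("*"):
            if path.is_file():
                st = path.stat()
                count += 1
                total += st.st_size
                oldest = st.st_mtime if oldest is None else min(oldest, st.st_mtime)
    age_days = 0.0 if oldest is None else (now - oldest) / 86400
    return {"files": count, "bytes": total, "oldest_days": round(age_days, 1)}

if __name__ == "__main__":
    for category, root in DATA_LOCATIONS.items():
        print(f"{category:<10} {root}: {summarize(root)}")
```

A steadily growing oldest_days value for the sessions directory is the clearest sign that no retention policy is in place.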

For GDPR purposes, if any of these categories contain personal data (and they almost certainly will if Openclaw interacts with customers, employees, or client information), you are processing personal data under the regulation.

Data Flow Analysis: What Leaves Your Perimeter

The compliance-critical question is not what Openclaw stores locally but what it sends externally. Here is the breakdown.

Cloud LLM Routing (Default Configuration)

When Openclaw sends a prompt to Anthropic, OpenAI, or Google, the full conversation context travels to that provider’s infrastructure. This includes the user prompt, relevant memory context, tool results from the current session, and system instructions. The response travels back and is stored locally.
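To see why a single call can carry far more than the user typed, the sketch below models one outbound request. The OutboundRequest class and its field names are hypothetical and do not reflect Openclaw’s actual wire format; the point is that system instructions, memory, and tool results are concatenated with the prompt.

```python
from dataclasses import dataclass, field

@dataclass
class OutboundRequest:
    """Hypothetical model of what one cloud LLM call carries off the host.
    Field names are illustrative, not Openclaw's actual request format."""
    system_instructions: str
    memory_context: list[str] = field(default_factory=list)
    tool_results: list[str] = field(default_factory=list)
    user_prompt: str = ""

    def payload_text(self) -> str:
        # Every non-empty part ends up in the text the provider receives.
        parts = [self.system_instructions, *self.memory_context,
                 *self.tool_results, self.user_prompt]
        return "\n\n".join(p for p in parts if p)

    def leaves_perimeter(self, needle: str) -> bool:
        """True if `needle` (e.g. a client name) would be sent to the provider."""
        return needle.lower() in self.payload_text().lower()
```

For example, a request whose user prompt is only "Summarize the project status." still sends a client name and email address if they sit in memory context, which is why memory contents belong in your data flow mapping.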

This means your EU-hosted Openclaw instance on a Hetzner server in Falkenstein can still be transmitting data to US-based infrastructure on every API call, unless you configure a local model or a provider endpoint in an EU region.

Local LLM Routing (Zero External Data)

Running Openclaw with Ollama and a local model like Llama 4 Scout or Qwen 3 eliminates all external data transmission. Every prompt and response stays within your network boundary. For organizations handling regulated data, this is the only configuration that achieves true data sovereignty.

You pay for that privacy with model capability. Local models lag behind Claude Opus 4.6 and GPT-5.4 on complex reasoning tasks. For regulated environments, a hybrid approach works well: local models for tasks involving sensitive data, cloud models for general research and content work where no personal data enters the prompt.
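One way to implement the hybrid approach is a routing check in front of the model call. This sketch assumes two hypothetical backend labels and uses a few regular expressions (email, phone number, IBAN) as a coarse personal-data detector. Regex detection misses plain names and free-text context, so it is paired with an explicit sensitive_task flag set by whoever defines the task rather than relying on detection alone.

```python
import re

# Coarse, illustrative patterns. They miss personal data such as plain names,
# so treat them as a first filter, not a guarantee.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Hypothetical backend labels; map them to your actual Ollama and cloud config.
LOCAL_BACKEND = "ollama-local"
CLOUD_BACKEND = "cloud-api"

def detect_pii(text: str) -> list[str]:
    """Names of the patterns that match somewhere in `text`."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def route(prompt: str, sensitive_task: bool = False) -> str:
    """Send flagged tasks, or prompts with detected personal data, to the local model."""
    if sensitive_task or detect_pii(prompt):
        return LOCAL_BACKEND
    return CLOUD_BACKEND
```

Defaulting to the local backend whenever there is doubt keeps failures on the safe side: a false positive costs some model quality, while a false negative sends personal data off-host.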

Connected Service Data

When Openclaw connects to Slack, email, CRM, or file storage through skills, it pulls data from those services into its local context. An email integration skill reading your inbox means email content flows into Openclaw’s session logs and potentially into LLM prompts. Map each connected service as a data source in your processing register.
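A processing register entry can be kept as structured data so it is easy to review and export. The sketch below is based on the record fields listed in GDPR Article 30(1) (purpose, categories of data subjects and personal data, recipients, third-country transfers, retention, security measures); the class and field names are illustrative, not a prescribed format.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ProcessingActivity:
    """One row of an Article 30 record of processing activities (illustrative)."""
    data_source: str                  # e.g. the connected service or skill
    purpose: str
    data_subjects: tuple[str, ...]
    personal_data: tuple[str, ...]
    recipients: tuple[str, ...]       # processors the data can reach, such as LLM providers
    third_country_transfer: bool      # True if data leaves the EU/EEA
    retention: str
    security_measures: str

def register_to_json(activities: list[ProcessingActivity]) -> str:
    """Export the register for review or for handing to an auditor."""
    return json.dumps([asdict(a) for a in activities], indent=2)

def needs_transfer_safeguards(activities: list[ProcessingActivity]) -> list[str]:
    """Sources whose data crosses borders and needs SCCs or a similar mechanism."""
    return [a.data_source for a in activities if a.third_country_transfer]
```

One entry per connected service keeps the register aligned with how data actually enters Openclaw.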

GDPR Requirements for Openclaw

If you operate in the EU or process personal data of EU residents, GDPR applies regardless of where your Openclaw instance runs. Here are the specific obligations.

You Are the Data Controller

Openclaw is a tool. You are the data controller under GDPR Article 4(7) because you determine the purposes and means of processing. This is true whether you self-host or use a managed platform. The distinction matters because controllers bear the heaviest compliance obligations: lawful basis documentation, data subject rights fulfillment, breach notification, and Data Protection Impact Assessments.

Data Processor Relationships

Every cloud LLM provider that receives personal data through Openclaw acts as a data processor under GDPR. You need a Data Processing Agreement (DPA) with each one (Article 28).

Provider	DPA Available	API Data Retention	Training on API Data	EU Endpoint
Anthropic	Yes	Up to 30 days (standard); zero with ZDR agreement	No (API tier)	No dedicated EU endpoint
OpenAI	Yes	30 days (standard)	No (API tier)	Available via Azure
Google (Gemini)	Yes	Varies by tier	No (paid API)	EU endpoints available
Local (Ollama)	N/A	Zero (your hardware)	N/A	N/A

Anthropic’s Zero Data Retention (ZDR) agreement means prompts are processed for real-time abuse detection and then immediately discarded. This is the strongest privacy position among cloud providers. OpenAI’s standard 30-day retention period creates a wider compliance surface.
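When you later need to confirm that a provider-side copy is gone (for example during an erasure request), the retention windows above reduce to date arithmetic. The mapping in this sketch restates the figures in this guide under hypothetical keys and should be checked against your own contract; providers with tier-dependent retention, such as Gemini, return None so they are handled manually.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows as described in this guide (hypothetical keys). Confirm
# them against your contract: ZDR and enterprise terms are negotiated.
PROVIDER_RETENTION_DAYS = {
    "anthropic_zdr": 0,
    "anthropic_standard": 30,   # "up to 30 days", so 30 is the worst case
    "openai_standard": 30,
}

def provider_copy_expires(provider: str, sent_at: datetime) -> Optional[datetime]:
    """Latest time a provider-side copy should still exist, or None if unknown."""
    days = PROVIDER_RETENTION_DAYS.get(provider)
    if days is None:
        return None  # retention varies or is not recorded: check the contract
    return sent_at + timedelta(days=days)

def provider_copy_expired(provider: str, sent_at: datetime,
                          now: Optional[datetime] = None) -> Optional[bool]:
    """True once the retention window has passed, None if the window is unknown."""
    expires = provider_copy_expires(provider, sent_at)
    if expires is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now >= expires
```

Using the upper bound of each window keeps the check conservative: it never reports a copy as gone earlier than the provider's stated maximum.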

Right to Erasure (Article 17)

Openclaw has no built-in right-to-erasure workflow. When a data subject requests deletion under GDPR, you must manually locate and delete their data across four locations: session files, memory entries, tool execution logs, and any LLM provider that retained the data.

For session files, grep the sessions directory for the subject’s name or identifiers. For memory, audit MEMORY.md and the SQLite index. For logs, check /tmp/openclaw/ and any log aggregation you have configured. For LLM providers, confirm their retention period has expired or submit a deletion request through the provider’s documented privacy channel.
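The manual search can be scripted. This Python sketch does a case-insensitive text search across the directories you pass in (for example the sessions directory and /tmp/openclaw/), removes matching lines from MEMORY.md, and deletes matched files only when dry_run is False. It is a starting point, not a complete workflow: it does not touch the SQLite index, whose schema is not covered here, and deleting a whole session file also removes other people’s data from that conversation, so review matches before running it for real.

```python
from pathlib import Path
from typing import Iterable

def _needles(identifiers: Iterable[str]) -> list[str]:
    # Case-insensitive matching; empty identifiers are ignored.
    return [i.lower() for i in identifiers if i]

def find_subject_files(roots: Iterable[Path], identifiers: Iterable[str]) -> list[Path]:
    """Return files under `roots` whose text mentions any identifier."""
    needles = _needles(identifiers)
    hits = []
    for root in roots:
        if not root.is_dir():
            continue
        for path in sorted(root.rglob("*")):
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="ignore").lower()
                if any(n in text for n in needles):
                    hits.append(path)
    return hits

def scrub_memory_file(memory_md: Path, identifiers: Iterable[str]) -> int:
    """Remove MEMORY.md lines that mention an identifier; return the count removed."""
    needles = _needles(identifiers)
    lines = memory_md.read_text(encoding="utf-8").splitlines(keepends=True)
    kept = [line for line in lines if not any(n in line.lower() for n in needles)]
    memory_md.write_text("".join(kept), encoding="utf-8")
    return len(lines) - len(kept)

def erase(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Delete the given files; with dry_run=True, only report what would go."""
    if not dry_run:
        for path in paths:
            path.unlink(missing_ok=True)
    return paths
```

Keep the list returned by erase as evidence of what was deleted and when; regulators may ask how a request was fulfilled.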

This is operationally painful. If you expect regular erasure requests, build automation around it now rather than handling each one manually. Our data retention policies guide covers the scripting approach.

Data Minimization (Article 5(1)(c))

Openclaw’s defaults violate data minimization. Sessions accumulate indefinitely, memory grows without pruning, and there is no time-to-live on any stored data. To comply, configure retention policies:

Set a session retention window (30 days is a reasonable starting point)
Enable memory compaction to prune stale entries
Configure log rotation with defined retention periods
Document your retention rationale in your processing register
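A session retention window can be enforced with a scheduled sweep. The sketch below assumes file modification time is an acceptable proxy for session age and uses the 30-day starting point suggested above; it defaults to a dry run so you can review the list before anything is deleted.

```python
import time
from pathlib import Path
from typing import Optional

SESSION_DIR = Path.home() / ".openclaw" / "sessions"  # default location named above
SESSION_RETENTION_DAYS = 30

def expired_files(root: Path, retention_days: int, now: Optional[float] = None) -> list[Path]:
    """Files under root last modified more than retention_days ago."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff)

def enforce_retention(root: Path = SESSION_DIR,
                      retention_days: int = SESSION_RETENTION_DAYS,
                      dry_run: bool = True,
                      now: Optional[float] = None) -> list[Path]:
    """Delete expired files, or just list them when dry_run is True."""
    expired = expired_files(root, retention_days, now)
    if not dry_run:
        for path in expired:
            path.unlink(missing_ok=True)
    return expired

if __name__ == "__main__":
    for path in enforce_retention(dry_run=True):
        print("would delete:", path)
```

Run it daily from cron or a systemd timer, and point the same function at your log directory with its own retention period.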

SOC 2 Controls for Self-Hosted Openclaw

SOC 2 compliance is not a certification you obtain for software. It is an audit of your organization’s controls against five Trust Service Criteria. Here is how each maps to a self-hosted Openclaw deployment.

Security (CC6)

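This is the broadest category: in the AICPA Trust Services Criteria, Security is made up of the Common Criteria (CC1 through CC9), and the controls below fall mainly under CC6, logical and physical access controls. For Openclaw, it covers: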

Access controls: Restrict who can access the Openclaw instance. Use SSH key authentication, VPN-only access, or a reverse proxy with authentication. CVE-2026-25253 demonstrated that exposed WebSocket endpoints allow remote code execution. Never expose Openclaw’s ports to the public internet.
Network isolation: Run Openclaw in a private subnet. Use Docker networking or firewall rules to restrict outbound traffic to only approved LLM endpoints.
Encryption at rest: Openclaw stores data in plaintext by default. Implement filesystem-level encryption (LUKS on Linux, FileVault on macOS) or place Docker volumes on encrypted storage.
Encryption in transit: Calls to cloud LLM APIs go over HTTPS, so they are TLS-encrypted. Ensure internal connections (reverse proxy to Openclaw) also use TLS.
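Firewall rules are the real enforcement point for outbound traffic, but an application-level allowlist is a useful second check and documents intent for auditors. The hosts below are the public API endpoints of the three providers discussed in this guide plus a local Ollama server on its default port 11434; if you use Azure OpenAI or regional endpoints, add those hosts instead.

```python
from urllib.parse import urlsplit

# Example allowlist; trim it to the providers you actually use.
ALLOWED_HTTPS_HOSTS = {
    "api.anthropic.com",
    "api.openai.com",
    "generativelanguage.googleapis.com",
}
# Plain HTTP is tolerated only for a local Ollama server on its default port.
ALLOWED_LOCAL = {("localhost", 11434), ("127.0.0.1", 11434)}

def egress_allowed(url: str) -> bool:
    """Allow HTTPS to approved LLM hosts, or HTTP only to the local Ollama port."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == "https" and host in ALLOWED_HTTPS_HOSTS:
        return True
    return parts.scheme == "http" and (host, parts.port) in ALLOWED_LOCAL
```

Matching exact hostnames, rather than prefixes or substrings, avoids lookalike domains such as api.anthropic.com.evil.example slipping through.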

Availability (A1)

If your team relies on Openclaw for business operations, document your availability targets and implement monitoring. This includes health checks, automatic restart (systemd or Docker restart policies), and backup procedures. Our backup and restore guide covers the specifics.
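The simplest health check is a TCP probe against the port your Openclaw instance listens on. The port is deployment-specific and is left as a parameter here, and the alert callback is a placeholder for whatever paging or chat integration you use. An HTTP health endpoint, if your setup exposes one, would give a stronger signal than an open port.

```python
import socket
from typing import Callable

def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Minimal liveness probe: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_and_alert(host: str, port: int,
                    alert: Callable[[str], None] = print) -> bool:
    """Run the probe and call `alert` with a message if it fails."""
    ok = port_is_listening(host, port)
    if not ok:
        alert(f"Openclaw not reachable on {host}:{port}")
    return ok
```

Pair the probe with a systemd or Docker restart policy, so the alert marks a restart that failed rather than every transient crash.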

Processing Integrity (PI1)

Ensure Openclaw produces reliable outputs. This means version-pinning your Openclaw installation (do not auto-update in production), validating skill outputs before acting on them, and maintaining audit trails of agent actions. The audit logging guide covers how to configure immutable action logs.
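Audit trails are more useful when tampering is detectable. A common technique is a hash chain: each entry stores the SHA-256 of the previous entry’s hash plus its own content, so editing, reordering, or removing an entry from the middle breaks verification. This sketch is tamper-evident rather than immutable and is not Openclaw’s own logging format; real immutability still needs write-once storage or shipping entries to a separate system.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(log: list[dict], action: str, detail: dict[str, Any]) -> dict:
    """Append an action record whose hash also covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"seq": len(log), "action": action, "detail": detail}
    entry = {**record, "prev": prev, "hash": _digest(prev, record)}
    log.append(entry)
    return entry

def verify(log: list[dict]) -> bool:
    """Recompute the chain. Truncating the tail is NOT detected, so copy the
    latest hash off the host periodically."""
    prev = GENESIS
    for i, entry in enumerate(log):
        record = {k: entry[k] for k in ("seq", "action", "detail")}
        if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _digest(prev, record):
            return False
        prev = entry["hash"]
    return True
```

Each agent action (file read, API call, shell command) becomes one append_entry call; verify can run on a schedule or before an audit.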

Confidentiality (C1)

Classify the data Openclaw processes. If it handles confidential client data, enforce access restrictions, use local models for sensitive tasks, and ensure credentials are stored in encrypted secret managers rather than plaintext .env files.
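Before moving credentials into a secret manager, it helps to know which ones are sitting in plaintext. This sketch scans .env content for keys whose names look like secrets (KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL) and that carry a literal value. The name pattern is a heuristic, so review the output rather than treating it as a complete inventory.

```python
import re
from pathlib import Path

# Key-name fragments that usually indicate a secret; extend for your integrations.
SECRET_KEY_PATTERN = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)

def plaintext_secrets(env_text: str) -> list[str]:
    """Names of .env keys that look like secrets and have a non-empty literal value."""
    found = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.removeprefix("export ").strip()
        value = value.strip().strip("'\"")
        if SECRET_KEY_PATTERN.search(key) and value:
            found.append(key)
    return found

def scan_env_file(path: Path) -> list[str]:
    """Convenience wrapper for a .env file on disk."""
    return plaintext_secrets(path.read_text(encoding="utf-8"))
```

Each key it reports is a candidate to move into the secret manager and inject as an environment variable at start-up, after which the .env entry can be deleted.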

Privacy (P1)

This overlaps significantly with GDPR. Document what personal data Openclaw processes, establish retention policies, implement deletion procedures, and maintain a processing register. The GDPR controls above satisfy most SOC 2 privacy criteria.

Model Provider Data Policies

Choosing your LLM provider is a compliance decision. Here is what each provider does with your API data.

Anthropic offers the strongest privacy position for API customers. With Zero Data Retention enabled, prompts are not stored after processing. Anthropic does not train on API data. The limitation: there is no dedicated EU data center, so data transits through US infrastructure. For GDPR purposes, you need the Anthropic DPA and must rely on Standard Contractual Clauses for the EU-US transfer.

OpenAI retains API data for 30 days by default for abuse monitoring. It does not train on API-tier data. Enterprise customers can negotiate shorter retention. OpenAI models are also available through Microsoft’s Azure OpenAI Service in EU regions, which simplifies GDPR data residency requirements.

Google Gemini paid API tiers do not use data for training. Retention policies vary by service tier. Google offers EU-based endpoints through Google Cloud regions.

Local models via Ollama eliminate all provider concerns. Running Llama 4 Scout, Mistral, or Qwen locally means zero data leaves your infrastructure. The compliance surface collapses to your own operational controls.

For regulated deployments, pair Anthropic with ZDR, the strongest cloud privacy position, with Ollama as a zero-transmission option for sensitive tasks.

Compliance Implementation Checklist

Use this checklist as a starting point for your compliance assessment. Map each item to your organization’s specific regulatory requirements.

GDPR Essentials:

Document your lawful basis for processing personal data through Openclaw (Article 6)
Complete a Data Protection Impact Assessment if processing at scale (Article 35)
Execute DPAs with every cloud LLM provider you use (Article 28)
Implement session retention policies with defined time-to-live
Build a right-to-erasure workflow covering all four data locations
Configure memory compaction to enforce data minimization
Document data flows in your processing register (Article 30)

SOC 2 Essentials:

Restrict Openclaw access to authorized users only (never expose publicly)
Encrypt data at rest (filesystem or volume encryption)
Encrypt data in transit (TLS for all connections)
Implement audit logging for all agent actions
Set up monitoring and alerting for the Openclaw process
Version-pin Openclaw and review updates before deploying
Classify data processed by Openclaw and apply appropriate controls

Architecture Decisions:

Decide cloud vs local models based on data sensitivity classification
Configure hybrid routing if using both (sensitive tasks to local, general tasks to cloud)
Isolate Openclaw in its own network segment or container
Store credentials in a secret manager, not plaintext .env

Frequently Asked Questions

Does self-hosting Openclaw make it GDPR compliant?

No. Self-hosting gives you control over data storage and processing location, but GDPR compliance requires documented lawful basis, data subject rights workflows, retention policies, DPAs with processors, and breach notification procedures. The infrastructure is one piece of a larger compliance framework.

What data does Openclaw send to cloud LLM providers?

The full conversation context: your prompt, relevant memory entries, tool execution results from the current session, and system instructions. Attached files or referenced documents are not sent unless the agent reads their contents into the conversation. The provider’s response travels back over the same TLS-encrypted connection.

Can you run Openclaw with zero external data transmission?

Yes. Configure Openclaw to use Ollama with a locally hosted model. All processing stays on your hardware. You lose some model capability compared to Claude Opus 4.6 or GPT-5.4. A hybrid approach, using local models for sensitive data and cloud models for general tasks, balances privacy with capability.

How do you handle right-to-erasure requests?

Manually, unless you build automation. Search session files for the data subject’s identifiers, audit memory entries, check tool execution logs, and confirm LLM provider retention has expired. There is no built-in Openclaw feature for this. Automate with scripts that grep sessions and memory for known identifiers, then delete matching files.

Is Openclaw SOC 2 compliant?

SOC 2 is not a software certification. It is an audit of your organization’s controls. A self-hosted Openclaw deployment can be part of a SOC 2-compliant environment if you implement the appropriate controls: access management, encryption, audit logging, monitoring, and documented procedures. The software itself is neither compliant nor non-compliant.

Which LLM providers are best suited to GDPR compliance?

Anthropic offers Zero Data Retention for API customers (no data stored after processing). OpenAI models are available in EU regions through Microsoft’s Azure OpenAI Service, and OpenAI offers enterprise retention controls. Google Gemini provides EU-based Cloud endpoints. For maximum GDPR alignment, use a locally hosted model or an EU-based provider like Mistral AI.

How long do Anthropic and OpenAI retain API data?

Anthropic with ZDR: zero retention, data processed and discarded immediately. Anthropic without ZDR: up to 30 days. OpenAI standard API: 30 days for abuse monitoring. OpenAI Enterprise: negotiable retention windows. These retention periods apply to the prompts and responses, not to model training (neither provider trains on API data).

What should a DPIA cover for Openclaw?

Your Data Protection Impact Assessment should document: the categories of personal data processed, the purpose and lawful basis, data flows including LLM provider transfers, retention periods for each data category, technical and organizational security measures, the risk to data subjects, and measures to mitigate those risks. The EU-US data transfer to cloud LLM providers is a high-risk element that requires specific justification.

Key Takeaways

Self-hosting Openclaw gives you data sovereignty but does not give you compliance. You must build the regulatory framework around it.
Every cloud LLM API call sends conversation context outside your perimeter. Map these data flows and execute DPAs with each provider.
Anthropic’s Zero Data Retention agreement offers the strongest cloud privacy position. Local models via Ollama offer the only zero-transmission option.
SOC 2 is about your controls, not the software. Implement access management, encryption, audit logging, and monitoring around your Openclaw deployment.
Openclaw’s defaults (indefinite session storage, plaintext credentials, no retention policies) fail both GDPR data minimization and SOC 2 confidentiality requirements. Configuration changes are mandatory.

If your team needs help navigating Openclaw compliance for a regulated deployment, SFAI Labs works with enterprise teams on self-hosted AI agent architecture, data flow mapping, and compliance implementation.
