
OpenClaw GKE Module — Configuration Guide

OpenClaw is a multi-tenant AI agent gateway that provides WebSocket-enabled conversational AI agents with persistent, GCS-backed workspace storage. This module deploys OpenClaw on GKE Autopilot as a Kubernetes Deployment, using the GCS Fuse CSI driver for a durable agent workspace and Secret Manager for credential management.

OpenClaw GKE is a wrapper module built on top of App GKE. It delegates all GCP infrastructure provisioning to App GKE (GKE cluster, networking, GCS, Secret Manager, CI/CD) and adds OpenClaw-specific application configuration on top via the OpenClaw Common sub-module.

Note: Variables marked as platform-managed are set and maintained by the platform. You do not normally need to change them.


How This Guide Is Structured

This guide documents the variables that are unique to OpenClaw GKE or whose OpenClaw-specific defaults differ from the App GKE base module. For all other variables (project identity, runtime scaling, backend configuration, CI/CD, networking, IAP, Cloud Armor, and VPC Service Controls), refer directly to the App GKE Configuration Guide.

| Configuration Area | OpenClaw-Specific Notes |
|---|---|
| Module Metadata (Group 0) | Different module_description and module_documentation defaults. |
| Project & Identity (Group 1) | Identical. deployment_region is exposed as a fallback. |
| Application Identity (Group 2) | application_name defaults to "openclaw". |
| Runtime & Scaling (Group 3) | container_resources defaults to cpu_limit = "2000m", memory_limit = "2Gi"; min_instance_count = 1, max_instance_count = 3. |
| Environment Variables (Group 4) | Module-managed variables are always injected by OpenClaw Common. See Platform-Managed Behaviours. |
| GKE Backend Config (Group 5) | workload_type defaults to "Deployment", session_affinity to "ClientIP", and service_type to "ClusterIP". |
| StatefulSet Config (Group 6) | Available for sticky pod identity; not required for standard GCS-backed deployments. |
| Resource Quota (Group 7) | Identical. |
| Reliability Policies (Group 8) | enable_pod_disruption_budget = true and pdb_min_available = "1" by default. |
| Observability (Group 9) | Probes target /health. See Health Probes. |
| Workload Automation (Group 10) | No default initialization jobs. |
| CI/CD (Group 11) | Identical. |
| NFS (Group 12) | enable_nfs = false by default. OpenClaw uses GCS Fuse for state. |
| Cloud Storage (Group 13) | The module-managed workspace bucket, mounted at /data, is always provisioned. |
| OpenClaw Config (Group 14) | Skills repository, AI provider credentials, and messaging platform integration. See OpenClaw Configuration. |
| Backup (Group 16) | Present for interface compatibility. OpenClaw state is natively durable in GCS. |
| Custom Domain (Group 18) | Standard. enable_custom_domain is required for IAP. |
| IAP (Group 19) | Requires iap_oauth_client_id, iap_oauth_client_secret, and iap_support_email. See Identity-Aware Proxy (GKE-specific). |
| Cloud Armor (Group 20) | Standard. |
| VPC Service Controls (Group 21) | Identical. |

Platform-Managed Behaviours

The following behaviours are applied automatically by OpenClaw GKE (via the OpenClaw Common sub-module) regardless of variable values in your tfvars file.

| Behaviour | Detail |
|---|---|
| No database | database_type = "NONE" and enable_redis = false are hard-coded in main.tf. Cloud SQL and Redis are never provisioned. |
| Custom image build | OpenClaw Common sets image_source = "custom". A custom image that layers entrypoint.sh onto ghcr.io/openclaw/openclaw:<application_version> is always built; the BASE_IMAGE build arg is set at Cloud Build time. |
| GCS workspace at /data | OpenClaw Common always appends an openclaw-data GCS Fuse volume at /data with uid=1000,gid=1000 mount options. The <prefix>-storage bucket is always provisioned. |
| State directory on local disk | OPENCLAW_STATE_DIR=/tmp/openclaw and XDG_CONFIG_HOME=/tmp/openclaw are always injected. This prevents npm staging failures caused by GCS Fuse's lack of hard-link support. The persistent agent workspace and agent state remain on /data. |
| Fixed environment variables | NODE_ENV=production, NODE_OPTIONS=--max-old-space-size=1536, and NPM_CONFIG_CACHE=/tmp/.npm are always set. |
| Skills sync on startup | If SKILLS_REPO_URL is set, entrypoint.sh clones or updates the repository into /data/workspace/skill-library on every pod startup. The sync is non-fatal: the gateway starts even if the clone fails. |
| Config regenerated on startup | entrypoint.sh always overwrites openclaw.json in $OPENCLAW_STATE_DIR, so Terraform-managed environment variables take precedence over stale values persisted in GCS. |
| Session affinity defaults to ClientIP | Routes requests from the same client IP to the same pod, so a given user's WebSocket sessions stay on one pod when multiple replicas are deployed. |
| Anthropic secret always created | OpenClaw Common creates the <prefix>-anthropic-api-key Secret Manager secret unconditionally. A secret version is written only when anthropic_api_key is non-empty. |

OpenClaw Application Identity

| Variable | Default | Description |
|---|---|---|
| application_name | "openclaw" | Internal identifier used as the base name for the Kubernetes service, GCS bucket, and related resources. Do not change after initial deployment. |
| application_display_name | "OpenClaw Gateway" | Human-readable name shown in the platform UI and monitoring dashboards. Can be updated freely. |
| description | "OpenClaw AI Gateway - Multi-tenant AI agent gateway on GKE Autopilot" | Brief description written to Kubernetes annotations. |
| application_version | "latest" | OpenClaw image tag used to form the BASE_IMAGE Docker build arg. Pin to a specific version (e.g. "1.2.3") for reproducible builds. |
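A minimal sketch of how application_version feeds the custom image build, assuming the reference format given under Platform-Managed Behaviours (ghcr.io/openclaw/openclaw:<application_version>). base_image_ref() and uses_latest_tag() are hypothetical helpers for illustration, not part of the module.

```python
BASE_REPOSITORY = "ghcr.io/openclaw/openclaw"  # from the custom image build behaviour


def base_image_ref(application_version: str) -> str:
    """Hypothetical helper: the BASE_IMAGE value the build layers entrypoint.sh onto."""
    if not application_version:
        raise ValueError("application_version must not be empty")
    return f"{BASE_REPOSITORY}:{application_version}"


def uses_latest_tag(application_version: str) -> bool:
    """Flags only the floating "latest" tag; other mutable tags are not detected here."""
    return application_version == "latest"


for version in ("latest", "1.2.3"):
    note = "not reproducible" if uses_latest_tag(version) else "pinned"
    print(base_image_ref(version), "->", note)
```

Because "latest" can resolve to a different upstream image on each Cloud Build run, two builds from the same Terraform configuration may differ; a pinned tag avoids that.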

Runtime & Scaling

OpenClaw GKE uses a single structured container_resources object (as required by App GKE) rather than separate cpu_limit and memory_limit top-level variables.

| Variable | Group | Default | Description |
|---|---|---|---|
| container_resources | 3 | { cpu_limit = "2000m", memory_limit = "2Gi" } | CPU and memory limits. Optional fields: cpu_request, mem_request, ephemeral_storage_limit, ephemeral_storage_request. At least 2 vCPU / 2 GiB is recommended for agent workloads. |
| min_instance_count | 3 | 1 | Minimum pod replicas. A value of 1 avoids cold starts for agent sessions. |
| max_instance_count | 3 | 3 | Maximum pod replicas. OpenClaw is stateful, and per-tenant deployments typically use 1. Increase only with sticky session routing (session_affinity = "ClientIP"). |
| container_port | 3 | 8080 | TCP port the OpenClaw gateway listens on. Must match the PORT environment variable. |
| timeout_seconds | 3 | 3600 | Request timeout in seconds. Agent sessions can be long-running. |
| enable_image_mirroring | 3 | true | Mirror the built image to Artifact Registry. |
| enable_vertical_pod_autoscaling | 3 | false | Enable the Vertical Pod Autoscaler (VPA). When enabled, CPU/memory-based HPA is disabled to avoid conflicts. |
| container_protocol | 3 | "http1" | Service protocol. Options: "http1", "h2c". |
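The sizing guidance above can be expressed as a quick check. This is a minimal sketch, assuming a Python dict that mirrors container_resources; check_resources() and its quantity helpers are hypothetical and only understand the suffixes used in this guide ("m" for CPU, "Mi"/"Gi" for memory). The 1536 MiB figure is the fixed NODE_OPTIONS=--max-old-space-size=1536 setting described under Platform-Managed Behaviours.

```python
HEAP_LIMIT_MIB = 1536  # from NODE_OPTIONS=--max-old-space-size=1536


def cpu_cores(quantity: str) -> float:
    """'2000m' -> 2.0, '2' -> 2.0 (millicores or whole cores only)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_mib(quantity: str) -> int:
    """'2Gi' -> 2048, '512Mi' -> 512 (only Mi/Gi handled in this sketch)."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def check_resources(resources: dict) -> list[str]:
    """Hypothetical pre-flight check against the recommendations in this guide."""
    warnings = []
    if cpu_cores(resources["cpu_limit"]) < 2:
        warnings.append("cpu_limit below the recommended 2 vCPU")
    mem = memory_mib(resources["memory_limit"])
    if mem < 2048:
        warnings.append("memory_limit below the recommended 2 GiB")
    # The V8 old-space cap is only part of the Node.js footprint,
    # so the container limit must sit above it with some margin.
    if mem <= HEAP_LIMIT_MIB:
        warnings.append("memory_limit leaves no room beyond the 1536 MiB V8 heap")
    return warnings


print(check_resources({"cpu_limit": "2000m", "memory_limit": "2Gi"}))  # []
print(check_resources({"cpu_limit": "1000m", "memory_limit": "512Mi"}))
```

With the defaults, the 2 GiB limit leaves about 512 MiB above the V8 heap cap for memory Node.js uses outside the heap; lowering memory_limit toward 1536 MiB risks the container being OOM-killed before V8 reports heap exhaustion.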

Key defaults compared with App GKE:

| Variable | App GKE default | OpenClaw GKE default | Reason |
|---|---|---|---|
| container_resources.cpu_limit | "1000m" | "2000m" | The OpenClaw Node.js gateway benefits from at least 2 vCPU. |
| container_resources.memory_limit | "512Mi" | "2Gi" | Agent state and plugin staging require more memory. |
| min_instance_count | 1 | 1 | Unchanged; keeps a pod warm to avoid cold-start latency for agent sessions. |
| session_affinity | "None" | "ClientIP" | WebSocket stickiness: routes a user's requests to the same pod. |
| service_type | "ClusterIP" | "ClusterIP" | Unchanged; internal-only by default. External traffic flows through a router or enable_custom_domain. |

GKE Backend Configuration

| Variable | Group | Default | Description |
|---|---|---|---|
| gke_cluster_name | 5 | "" | GKE Autopilot cluster name. When empty, the Services GCP-managed cluster is auto-discovered. |
| namespace_name | 5 | "" | Kubernetes namespace. Auto-generated from the resource prefix when empty. |
| workload_type | 5 | "Deployment" | "Deployment" for stateless replicas with GCS-backed state, or "StatefulSet" for sticky pod identity. |
| service_type | 5 | "ClusterIP" | "ClusterIP" for internal-only access; "LoadBalancer" for direct external access. |
| session_affinity | 5 | "ClientIP" | ClientIP affinity provides WebSocket stickiness. Set to "None" only for stateless replicas. |
| termination_grace_period_seconds | 5 | 60 | Time allowed for active agent sessions to complete before the pod is terminated. Valid range: 0–3600. |
| configure_service_mesh | 5 | false | Enable Istio sidecar injection for this namespace. |
| enable_network_segmentation | 5 | false | Enable Kubernetes NetworkPolicies that restrict pod-to-pod traffic to the same namespace. |
| deployment_timeout | 5 | 1800 | Timeout in seconds to wait for the deployment to complete. |
| network_tags | 5 | [] | Network tags applied to workload pods. |

StatefulSet Configuration

When workload_type = "StatefulSet", these variables control StatefulSet behavior. OpenClaw normally keeps state on GCS Fuse; a PVC is only needed when local disk performance is required.

| Variable | Group | Default | Description |
|---|---|---|---|
| stateful_pvc_enabled | 6 | false | Enable a PVC for the StatefulSet. Use GCS Fuse instead unless local disk I/O is required. |
| stateful_pvc_size | 6 | "10Gi" | PVC size. |
| stateful_pvc_mount_path | 6 | "/pvc-data" | Container mount path for the PVC. |
| stateful_pvc_storage_class | 6 | "standard-rwo" | Storage class for the PVC. |
| stateful_headless_service | 6 | true | Create a headless service for stable network identities. |
| stateful_pod_management_policy | 6 | "OrderedReady" | Pod management policy: "OrderedReady" or "Parallel". |
| stateful_update_strategy | 6 | "RollingUpdate" | Update strategy: "RollingUpdate" or "OnDelete". |
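The interaction between workload_type and stateful_pvc_enabled described in the pitfalls table (enabling the PVC auto-resolves to a StatefulSet, while an explicit "Deployment" plus a PVC fails at plan time) can be sketched as a small resolution rule. This is a hypothetical Python restatement, not the module's code; because the documented default is itself "Deployment", it assumes "explicitly setting" workload_type means passing a value, modelled here as anything other than None.

```python
from typing import Optional

VALID_TYPES = (None, "Deployment", "StatefulSet")


def resolve_workload_type(workload_type: Optional[str], stateful_pvc_enabled: bool) -> str:
    """Hypothetical resolution rule; None stands for 'not explicitly set'."""
    if workload_type not in VALID_TYPES:
        raise ValueError(f"unknown workload_type: {workload_type!r}")
    if stateful_pvc_enabled:
        if workload_type == "Deployment":
            # Mirrors the documented plan-time failure.
            raise ValueError("stateful_pvc_enabled = true cannot be combined with a Deployment")
        return "StatefulSet"
    # Without a PVC, an unset value falls back to the documented default.
    return workload_type or "Deployment"


print(resolve_workload_type(None, False))  # Deployment
print(resolve_workload_type(None, True))   # StatefulSet
```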

Reliability Policies

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_pod_disruption_budget | 8 | true | Enable a PodDisruptionBudget to maintain minimum availability during voluntary disruptions. |
| pdb_min_available | 8 | "1" | Minimum number of pods that must remain available. "1" keeps at least one pod running during voluntary disruptions. |
| enable_topology_spread | 8 | false | Distribute pods across zones. |
| topology_spread_strict | 8 | false | Use DoNotSchedule when the topology spread constraint cannot be satisfied. |
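The effect of pdb_min_available follows from the rule Kubernetes applies to an integer minAvailable: the number of pods that may be voluntarily evicted is the count of healthy pods minus minAvailable, floored at zero. The sketch below is illustrative (disruptions_allowed() is a hypothetical name, and percentage values are not handled). With the defaults (min_instance_count = 1, pdb_min_available = "1"), a single healthy replica yields zero allowed disruptions, so the eviction API refuses to evict that pod, for example during a node drain, until another replica is healthy.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Pods a voluntary disruption (such as a node drain) may evict right now."""
    return max(0, healthy_pods - min_available)


for healthy in (1, 2, 3):
    print(f"{healthy} healthy, minAvailable=1 -> {disruptions_allowed(healthy, 1)} evictable")
```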

Health Probes

OpenClaw exposes /health on port 8080. All probes target this path.

OpenClaw GKE exposes a dual probe system:

startup_probe / liveness_probe — passed to OpenClaw Common to configure the application-level probes.

startup_probe_config / health_check_config — passed directly to App GKE for Kubernetes probe configuration.

| Variable | Group | Default | Description |
|---|---|---|---|
| startup_probe | 9 | { enabled=true, type="HTTP", path="/health", initial_delay_seconds=10, timeout_seconds=5, period_seconds=5, failure_threshold=36 } | Passed to OpenClaw Common. 10 s initial delay + 36 × 5 s = ~190 s, giving npm headroom to stage 35+ bundled plugin packages before the gateway starts. |
| liveness_probe | 9 | { enabled=true, type="HTTP", path="/health", initial_delay_seconds=30, timeout_seconds=5, period_seconds=30, failure_threshold=3 } | Passed to OpenClaw Common. |
| startup_probe_config | 9 | { enabled=true, path="/health", initial_delay_seconds=10, failure_threshold=36, period_seconds=5 } | Kubernetes startup probe. The 36-attempt threshold gives roughly 3 minutes (~190 s) for gateway startup. |
| health_check_config | 9 | { enabled=true, path="/health", initial_delay_seconds=30, failure_threshold=3, period_seconds=30 } | Kubernetes liveness probe. |
| uptime_check_config | 9 | { enabled=false, path="/health" } | Cloud Monitoring uptime check. Disabled by default for GKE because ClusterIP services are not externally reachable. |
| alert_policies | 9 | [] | Cloud Monitoring metric alert policies. |
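The probe timing arithmetic above can be checked with a small sketch. It follows the convention the Kubernetes documentation uses for startup budgets (initial delay plus failureThreshold × periodSeconds) and ignores probe timeouts and scheduling jitter, so the figures are approximate upper bounds. ProbeConfig and the two functions are hypothetical names, not module inputs.

```python
from dataclasses import dataclass


@dataclass
class ProbeConfig:
    """Hypothetical stand-in for the probe objects documented above."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int


def startup_budget_seconds(p: ProbeConfig) -> int:
    # Approximate time a slow-starting container gets before the startup
    # probe gives up and the kubelet restarts it.
    return p.initial_delay_seconds + p.failure_threshold * p.period_seconds


def liveness_restart_seconds(p: ProbeConfig) -> int:
    # Approximate time from the last successful check until a hung
    # container is restarted by the liveness probe.
    return p.failure_threshold * p.period_seconds


startup = ProbeConfig(initial_delay_seconds=10, period_seconds=5, failure_threshold=36)
liveness = ProbeConfig(initial_delay_seconds=30, period_seconds=30, failure_threshold=3)
print(startup_budget_seconds(startup))     # 190
print(liveness_restart_seconds(liveness))  # 90
```

Liveness checks only begin once the startup probe succeeds, so raising the startup failure_threshold (keeping startup_probe and startup_probe_config in step) buys more time for plugin staging without slowing hang detection after the gateway is up.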

Workload Automation

OpenClaw has no default initialization job — no database setup is required.

| Variable | Group | Default | Description |
|---|---|---|---|
| initialization_jobs | 10 | [] | Kubernetes Jobs executed once during deployment, for example for custom workspace seeding. No default job. |
| cron_jobs | 10 | [] | Recurring Kubernetes CronJobs. |
| additional_services | 10 | [] | Additional sidecar or helper services deployed alongside the OpenClaw gateway. Useful for deploying an OpenClaw router as a companion service. |

OpenClaw Configuration

Skills Repository

| Variable | Group | Default | Description |
|---|---|---|---|
| skills_repo_url | 14 | "" | GitHub URL of a shared OpenClaw skills repository. Cloned (or updated) into /data/workspace/skill-library on every pod startup. Leave empty to skip skill syncing. |
| skills_repo_ref | 14 | "main" | Git ref (branch, tag, or SHA) to check out. |
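The clone-or-update behaviour, and its non-fatal failure mode, can be sketched as follows. This is a hypothetical Python rendering, not the contents of entrypoint.sh: the git subcommands are standard, but the function names and the fetch-then-checkout strategy are assumptions. plan_skill_sync() is pure; sync_skills() actually invokes git.

```python
import os
import subprocess

SKILLS_DIR = "/data/workspace/skill-library"  # documented clone target


def plan_skill_sync(repo_url: str, ref: str, already_cloned: bool) -> list[list[str]]:
    """Return the git commands to run; an empty list means syncing is skipped."""
    if not repo_url:
        return []
    commands = []
    if not already_cloned:
        commands.append(["git", "clone", repo_url, SKILLS_DIR])
    # Fetch the requested ref by name, then check out whatever was fetched.
    # Works for branches and tags; fetching a bare SHA depends on the git host.
    commands.append(["git", "-C", SKILLS_DIR, "fetch", "origin", ref])
    commands.append(["git", "-C", SKILLS_DIR, "checkout", "--force", "FETCH_HEAD"])
    return commands


def sync_skills(repo_url: str, ref: str = "main") -> bool:
    """Run the plan; report failures instead of raising so startup can continue."""
    already_cloned = os.path.isdir(os.path.join(SKILLS_DIR, ".git"))
    for cmd in plan_skill_sync(repo_url, ref, already_cloned):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError as exc:  # e.g. git is not installed
            print(f"skills sync skipped: {exc}")
            return False
        if result.returncode != 0:
            print(f"skills sync failed at {' '.join(cmd)}: {result.stderr.strip()}")
            return False
    return True
```

In a startup script, the gateway would be launched after sync_skills() returns regardless of its result, which matches the documented non-fatal behaviour.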

AI Provider & Messaging Credentials

All credentials are stored in Secret Manager; those the agent uses are injected at pod startup (router-only secrets are not). Plaintext values are never written to Terraform state after the initial secret version is created.

| Variable | Group | Default | Description |
|---|---|---|---|
| anthropic_api_key | 14 | "" | Anthropic API key. Stored in Secret Manager and injected as ANTHROPIC_API_KEY. Required on initial deployment; omit on updates to retain the stored value. Sensitive. |
| enable_telegram | 14 | false | Provision Telegram secrets. Requires both telegram_bot_token and telegram_webhook_secret. |
| telegram_bot_token | 14 | "" | Telegram bot token from @BotFather. Injected as TELEGRAM_BOT_TOKEN. Sensitive. |
| telegram_webhook_secret | 14 | "" | Webhook validation secret used by the router (not the agent). Stored in Secret Manager; not injected into the agent container. Generate with: openssl rand -hex 32. Sensitive. |
| enable_slack | 14 | false | Provision Slack secrets. Requires both slack_bot_token and slack_signing_secret. |
| slack_bot_token | 14 | "" | Slack bot token (xoxb-...). Injected as SLACK_BOT_TOKEN. Sensitive. |
| slack_signing_secret | 14 | "" | Slack signing secret used by the router (not the agent). Stored in Secret Manager; not injected into the agent container. Sensitive. |
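Where openssl is not available, Python's standard secrets module produces an equivalent telegram_webhook_secret value: 32 random bytes from the operating system's cryptographically secure generator, rendered as 64 hexadecimal characters. new_webhook_secret() is a hypothetical helper name.

```python
import secrets


def new_webhook_secret(num_bytes: int = 32) -> str:
    # Equivalent to `openssl rand -hex 32`: num_bytes random bytes, hex-encoded
    # (two characters per byte, so 64 characters for the default).
    return secrets.token_hex(num_bytes)


print(new_webhook_secret())
```

Supplying the value through an environment variable such as TF_VAR_telegram_webhook_secret keeps it out of tfvars files under version control.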

Validation: enable_slack = true requires both slack_bot_token and slack_signing_secret. enable_telegram = true requires both telegram_bot_token and telegram_webhook_secret. Violations are caught by validation.tf preconditions before apply.
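The preconditions can be mirrored in a short pre-flight script, for instance to fail fast in CI before terraform plan runs. This is a hypothetical Python restatement of the two rules above, not the contents of validation.tf; messaging_errors() takes a dict of the module's input values.

```python
# Each enable flag maps to the inputs it requires (from the validation rules above).
RULES = {
    "enable_telegram": ("telegram_bot_token", "telegram_webhook_secret"),
    "enable_slack": ("slack_bot_token", "slack_signing_secret"),
}


def messaging_errors(cfg: dict) -> list[str]:
    """Return one message per enabled integration that is missing inputs."""
    errors = []
    for flag, required in RULES.items():
        if cfg.get(flag, False):
            missing = [name for name in required if not cfg.get(name)]
            if missing:
                errors.append(f"{flag} = true requires: {', '.join(missing)}")
    return errors


print(messaging_errors({"enable_slack": True, "slack_bot_token": "xoxb-example"}))
```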


Backup & Maintenance

OpenClaw state is natively durable in GCS. These variables are present for interface compatibility with App GKE.

| Variable | Group | Default | Description |
|---|---|---|---|
| backup_schedule | 16 | "0 2 * * *" | Cron expression for automated workspace backups. |
| backup_retention_days | 16 | 7 | Days to retain backup files in GCS. |
| enable_backup_import | 16 | false | Triggers a one-time workspace import on apply. |
| backup_source | 16 | "gcs" | Import source: "gcs" or "gdrive". |
| backup_uri | 16 | "" | GCS path or Google Drive file ID of the backup to import. Maps to backup_file in App GKE. |
| backup_format | 16 | "tar" | Import format: tar, gz, tgz, tar.gz, or zip. |
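A minimal sketch of what backup_retention_days and the import options imply, assuming each backup is identified by its creation time and that backup_uri must be non-empty whenever an import is requested (the latter is an assumption, not a documented rule). expired() and import_errors() are hypothetical helpers; they do not describe how the module actually prunes or imports objects.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_SOURCES = {"gcs", "gdrive"}                   # from the backup_source row
ALLOWED_FORMATS = {"tar", "gz", "tgz", "tar.gz", "zip"}  # from the backup_format row


def expired(created: list[datetime], now: datetime, retention_days: int = 7) -> list[datetime]:
    """Backups strictly older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [ts for ts in created if ts < cutoff]


def import_errors(source: str, uri: str, fmt: str) -> list[str]:
    """Hypothetical checks for an import requested with enable_backup_import = true."""
    errors = []
    if source not in ALLOWED_SOURCES:
        errors.append(f"backup_source must be one of {sorted(ALLOWED_SOURCES)}")
    if fmt not in ALLOWED_FORMATS:
        errors.append(f"backup_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not uri:
        errors.append("backup_uri is empty")
    return errors


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
backups = [now - timedelta(days=d) for d in (1, 7, 8)]
print(expired(backups, now))  # only the 8-day-old backup
```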

Identity-Aware Proxy (GKE-specific)

OpenClaw GKE exposes three IAP variables required when enable_iap = true and enable_custom_domain = true.

| Variable | Group | Default | Description |
|---|---|---|---|
| iap_oauth_client_id | 19 | "" | OAuth client ID. Create it in Google Cloud Console > APIs & Services > Credentials. Sensitive. |
| iap_oauth_client_secret | 19 | "" | OAuth client secret. Sensitive. |
| iap_support_email | 19 | "" | Support email shown on the OAuth consent screen. Must be a valid email address (validated by regex). |

Note: IAP on GKE requires enable_custom_domain = true. A custom domain with a reserved static IP is used by the Kubernetes Gateway API to provision the IAP-protected ingress.
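The IAP requirements above can be collected into a single pre-flight check. This is a sketch under stated assumptions: iap_errors() is hypothetical, and EMAIL_RE is a deliberately loose pattern because the module's own regex is not documented here.

```python
import re

# Hypothetical, intentionally simple pattern: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def iap_errors(cfg: dict) -> list[str]:
    """Return problems with the IAP inputs; empty when IAP is off or complete."""
    if not cfg.get("enable_iap"):
        return []
    errors = []
    if not cfg.get("enable_custom_domain"):
        errors.append("IAP on GKE requires enable_custom_domain = true")
    for name in ("iap_oauth_client_id", "iap_oauth_client_secret"):
        if not cfg.get(name):
            errors.append(f"{name} is required when enable_iap = true")
    if not EMAIL_RE.match(cfg.get("iap_support_email", "")):
        errors.append("iap_support_email must be a valid email address")
    return errors


print(iap_errors({"enable_iap": True}))
```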


Custom Domain (Group 18)

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_custom_domain | 18 | false | Enable a custom domain via the Kubernetes Gateway API with SSL certificates. A static IP is provisioned automatically. |
| application_domains | 18 | [] | Custom domains for the application (e.g. ["agent.example.com"]). |
| reserve_static_ip | 18 | true | Reserve a static external IP for predictable endpoint configuration. Recommended for production. |
| static_ip_name | 18 | "" | Name for the reserved static IP. Auto-generated from the resource prefix when empty. |

Storage (Group 13)

| Variable | Group | Default | Description |
|---|---|---|---|
| create_cloud_storage | 13 | true | Provision the additional GCS buckets defined in storage_buckets. The workspace bucket is always created. |
| storage_buckets | 13 | [] | Additional GCS buckets beyond the auto-provisioned workspace bucket. |
| gcs_volumes | 13 | [] | Additional GCS Fuse volumes mounted via the CSI driver. The openclaw-data workspace bucket is always mounted at /data. |
| manage_storage_kms_iam | 13 | false | Creates a CMEK KMS key for GCS encryption. |
| enable_artifact_registry_cmek | 13 | false | CMEK encryption for Artifact Registry. |

NFS (Group 12)

OpenClaw uses GCS Fuse for state. NFS is disabled by default and not required.

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_nfs | 12 | false | Provision a Cloud Filestore NFS instance. Not required for OpenClaw. |
| nfs_mount_path | 12 | "/mnt/nfs" | NFS mount path. Only used when enable_nfs = true. |
| nfs_instance_name | 12 | "" | Name of an existing NFS GCE VM. Auto-discovered when empty. |
| nfs_instance_base_name | 12 | "app-nfs" | Base name for the inline NFS GCE VM. |

Resource Quota (Group 7)

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_resource_quota | 7 | false | Enable a ResourceQuota to limit namespace resources. |
| quota_cpu_requests | 7 | "" | Total CPU requests allowed. |
| quota_cpu_limits | 7 | "" | Total CPU limits allowed. |
| quota_memory_requests | 7 | "" | Total memory requests allowed. |
| quota_memory_limits | 7 | "" | Total memory limits allowed. |
| quota_max_pods | 7 | "" | Maximum pods allowed. |
| quota_max_services | 7 | "" | Maximum services allowed. |
| quota_max_pvcs | 7 | "" | Maximum PVCs allowed. |

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach); High (service unavailable or significant degradation); Medium (degraded function or increased cost); Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
|---|---|---|---|
| anthropic_api_key | Required on first deploy | Critical | Must be provided on initial deployment. Without a valid key, the agent starts but all AI requests fail with 401 errors. The key is stored in Secret Manager, so later updates can be made there without a Terraform redeploy; because the key is injected at pod startup, running pods pick up a new value only after they restart. |
| gateway_token | Auto-generated 64-character hex token | High | An empty value generates a secure token automatically and stores it in Secret Manager. Supplying a weak or guessable value allows unauthorised access to the OpenClaw gateway API. Retrieve the auto-generated token from Secret Manager before configuring clients. |
| OPENCLAW_GATEWAY_TOKEN consistency | Must match gateway_token | Critical | If the token is rotated in Secret Manager and no pod restart (such as a rolling update) follows, the in-memory token and the new Secret Manager value diverge, and all client requests are rejected with 401 until pods are recycled. |
| enable_telegram | false | Medium | Setting to true without providing both telegram_bot_token and telegram_webhook_secret creates Secret Manager secrets with empty values. The Telegram bot then fails to authenticate with the API and cannot send or receive any messages. |
| telegram_bot_token | "" | High | Required when enable_telegram = true. An empty token causes the Telegram integration to fail with API authentication errors, and all incoming Telegram messages are dropped. |
| telegram_webhook_secret | "" | High | Required when enable_telegram = true. Validates incoming Telegram webhook payloads in the router. An empty value disables webhook secret verification, allowing any HTTP client to inject fake Telegram events into the agent. Generate with openssl rand -hex 32. |
| enable_slack | false | Medium | Setting to true without slack_bot_token and slack_signing_secret creates empty Secret Manager secrets. The Slack integration fails all API calls silently and cannot verify incoming webhook signatures. |
| slack_bot_token | "" | High | Required when enable_slack = true. Must be a valid xoxb-... token. An invalid token causes all Slack API calls to fail with authentication errors. |
| slack_signing_secret | "" | High | Required when enable_slack = true. Verifies Slack request signatures in the router. An empty value disables verification, allowing any HTTP client to inject fake Slack events. |
| skills_repo_url | "" (no skill sync) | Medium | When set, the skills repository is cloned at container startup. An unreachable or incorrect URL makes the clone fail; because the sync is non-fatal, the gateway still starts, but without the shared skill library. Use a private repository URL only when the GKE pod has network access to the git host. |
| skills_repo_ref | "main" | Medium | A non-existent branch or tag causes the git clone to fail at startup, so the agent does not load skills. Verify that the ref exists before deploying. |
| stateful_pvc_enabled | false | Medium | OpenClaw uses GCS for workspace storage. Enabling StatefulSet PVCs adds unused local disk, wastes resources, and can block pod rescheduling when GKE Autopilot cannot provision the requested disk type. |
| workload_type | "Deployment" | Medium | Setting stateful_pvc_enabled = true auto-resolves to StatefulSet. Explicitly setting workload_type = "Deployment" alongside stateful_pvc_enabled = true fails at plan time. |
| min_instance_count | 1 | High | Unlike the Cloud Run variant (which defaults to 0), GKE defaults to 1. Reducing it to 0 lets HPA scale down to zero, and Telegram/Slack webhook events are dropped during the pod cold-start window (typically 30–60 seconds for GKE pod initialization plus the skills clone). |
| quota_memory_requests / quota_memory_limits | "4Gi" / "8Gi" | Critical | Must include a unit suffix (e.g. "4Gi", "8192Mi"). Bare integers are interpreted by Kubernetes as bytes, creating a near-zero memory quota that immediately blocks all pod scheduling. |
| enable_pod_disruption_budget | true | Medium | Disabling the PDB allows GKE node upgrades to evict all OpenClaw pods simultaneously, making the agent unreachable during maintenance and dropping any in-flight conversations. |
| backup_schedule | "0 2 * * *" | Medium | An empty string disables automated workspace backups. OpenClaw stores conversation history, agent state, and workspace data in GCS; without backups, an accidental bucket purge permanently destroys all agent state. |
| enable_vpc_sc | false | High | Requires an explicit organization_id. Without it, VPC Service Controls are silently skipped, leaving the Anthropic API key secret and other credentials without perimeter protection. |
| enable_iap | false | Medium | Enabling IAP without iap_oauth_client_id, iap_oauth_client_secret, and iap_support_email results in a partial IAP configuration that may block all traffic or leave the gateway unprotected. Telegram/Slack webhook endpoints must not be protected by IAP, because those services cannot authenticate with a Google identity. |
| enable_auto_password_rotation | false | Medium | Only relevant if a SQL backend is used. Rotating the database password without restarting the pod leaves the agent on the old (now invalid) credentials until connections fail and the pod enters CrashLoopBackOff. |
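The memory quota pitfall follows from how Kubernetes reads resource quantities: a number without a suffix is a byte count. The sketch below implements a subset of that parsing (whole numbers with binary Ki to Ei or decimal k to E suffixes; fractional and exponent forms such as "1.5Gi" or "1e3" are not handled) to show how many 2Gi pods fit under a quota. memory_bytes() and pods_that_fit() are hypothetical helpers.

```python
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def memory_bytes(quantity: str) -> int:
    """Parse a whole-number Kubernetes memory quantity into bytes."""
    # Binary suffixes are checked first so "4Gi" is not mistaken for "4G" + "i".
    for suffix, factor in BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is a raw byte count


def pods_that_fit(quota: str, pod_request: str) -> int:
    return memory_bytes(quota) // memory_bytes(pod_request)


print(pods_that_fit("4Gi", "2Gi"))   # 2
print(pods_that_fit("4096", "2Gi"))  # 0: "4096" means 4096 bytes, not 4096 MiB
```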

Outputs

| Output | Description |
|---|---|
| service_name | Kubernetes service name. |
| namespace | Kubernetes namespace. |
| service_cluster_ip | ClusterIP of the Kubernetes service. |
| stage_service_cluster_ips | Map of ClusterIPs for stage-specific services (Cloud Deploy). |
| service_external_ip | External LoadBalancer IP when a static IP is reserved. |
| service_url | Service URL (ClusterIP internal URL or custom domain). |
| storage_buckets | All provisioned GCS buckets, including the workspace bucket. |
| network_name | VPC network name. |
| network_exists | Whether the VPC network exists. |
| regions | Available regions in the VPC. |
| nfs_server_ip | NFS server internal IP (sensitive). Empty when enable_nfs = false. |
| nfs_mount_path | NFS mount path in containers. |
| nfs_share_path | NFS share path on the server. |
| container_image | Container image URI used for the deployment. |
| container_registry | Artifact Registry repository name. |
| monitoring_enabled | Whether Cloud Monitoring is configured. |
| monitoring_notification_channels | Monitoring notification channel names. |
| deployment_id | Unique deployment identifier. |
| tenant_id | Tenant identifier. |
| resource_prefix | Resource naming prefix (app<name><tenant><id>). |
| project_id | GCP project ID. |
| project_number | GCP project number. |
| initialization_jobs | Created initialization job names. |
| cron_jobs | Created cron job names. |
| statefulset_name | StatefulSet name (when workload_type = "StatefulSet"). |
| nfs_setup_job | NFS setup job name. |
| db_import_job | Database import job name. |
| deployment_summary | Summary of the deployment configuration. |
| cicd_enabled | Whether the CI/CD pipeline is enabled. |
| github_repository_url | GitHub repository URL connected for CI/CD. |
| github_repository_owner | GitHub repository owner/organization. |
| github_repository_name | GitHub repository name. |
| artifact_registry_repository | Artifact Registry repository. |
| cloudbuild_trigger_name | Cloud Build trigger name. |
| cloudbuild_trigger_id | Cloud Build trigger ID. |
| cicd_configuration | CI/CD pipeline configuration details. |
| kubernetes_ready | true when the GKE cluster endpoint is available and all Kubernetes resources are deployed. false on the first apply of a new inline cluster, in which case a second apply is required to complete deployment. |
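Because kubernetes_ready is false after the first apply of a new inline cluster, automation can read it to decide whether to apply again. terraform output -json prints an object that maps each output name to its sensitive, type, and value fields; needs_second_apply() and read_outputs() are hypothetical helpers built on that format, and read_outputs() needs the terraform CLI on PATH.

```python
import json
import subprocess


def needs_second_apply(outputs_json: str) -> bool:
    """True unless the kubernetes_ready output is present and true."""
    outputs = json.loads(outputs_json)
    ready = outputs.get("kubernetes_ready", {}).get("value", False)
    return not ready


def read_outputs(workdir: str = ".") -> str:
    """Run `terraform output -json` in workdir and return its stdout."""
    return subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

For example, a pipeline step could call needs_second_apply(read_outputs("path/to/root")) after the first apply and run terraform apply once more when it returns True.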

Platform-Specific Comparison

| Aspect | OpenClaw CloudRun | OpenClaw GKE |
|---|---|---|
| Compute | Cloud Run v2 (serverless) | GKE Autopilot (Kubernetes) |
| min_instance_count default | 0 (scale-to-zero) | 1 (always warm) |
| max_instance_count default | 1 | 3 |
| CPU always allocated | true (hard-coded) | Not applicable (Kubernetes always allocates) |
| Session affinity | Cloud Run-native session affinity | ClientIP (Kubernetes Service sessionAffinity) |
| Service endpoint | <service>.run.app URL | http://<service>.<ns>.svc.cluster.local |
| IAP mechanism | Cloud Run native IAP | Kubernetes Gateway API + IAP |
| GCS Fuse driver | Cloud Run GCS Fuse extension | GCS Fuse CSI driver |
| Scaling mechanism | Cloud Run autoscaler | Kubernetes HPA |
| StatefulSet support | Not available | Available via workload_type = "StatefulSet" |
| kubernetes_ready output | Not applicable | Gates all Kubernetes resource creation |
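The GKE service endpoint in the table follows the standard Kubernetes Service DNS form, and the service_name and namespace outputs supply its parts. A minimal sketch, assuming the cluster uses the default cluster.local domain; service_url() is a hypothetical helper, and the port is left optional because this guide does not state which port the Service exposes (the container itself listens on 8080). The example service and namespace names are also hypothetical.

```python
from typing import Optional


def service_url(service: str, namespace: str, port: Optional[int] = None, scheme: str = "http") -> str:
    """Build <scheme>://<service>.<namespace>.svc.cluster.local[:port]."""
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"{scheme}://{host}" if port is None else f"{scheme}://{host}:{port}"


print(service_url("openclaw", "team-a"))
print(service_url("openclaw", "team-a", port=8080, scheme="ws"))
```

This address resolves only from inside the cluster, which is why the uptime check is disabled by default for ClusterIP services.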