Chroma on Google Kubernetes Engine

This document provides a comprehensive reference for the modules/Chroma_GKE Terraform module. It covers architecture, IAM, configuration variables, Chroma-specific behaviours, and operational patterns for deploying Chroma on GKE Autopilot.


1. Module Overview

Chroma GKE is a wrapper module built on top of App GKE. It deploys Chroma — the AI-native open-source vector database — on GKE Autopilot with production-grade StatefulSet persistence, GCS FUSE storage, optional token authentication, Workload Identity, and horizontal auto-scaling.

Key Capabilities:

  • Compute: GKE Autopilot, 1 vCPU / 1 Gi by default. StatefulSet or Deployment workload type.
  • Data Persistence: StatefulSet PVC (recommended for production) or GCS FUSE-mounted Cloud Storage bucket. No Cloud SQL, no Redis.
  • Security: Optional auth token via Secret Manager injected as CHROMA_SERVER_AUTH_CREDENTIALS. Inherits Cloud Armor, IAP, and VPC-SC from App GKE.
  • CI/CD: Cloud Build image pipeline by default; Cloud Deploy progressive delivery optional.
  • Reliability: Health probes target /api/v2/heartbeat. PodDisruptionBudget enabled by default.

Project & Application Identity

| Variable | Group | Type | Default | Description |
| --- | --- | --- | --- | --- |
| project_id | 1 | string | (none) | GCP project ID. Required. |
| region | 1 | string | 'us-central1' | GCP region fallback |
| tenant_deployment_id | 2 | string | 'demo' | Short suffix appended to resource names |
| support_users | 2 | list(string) | [] | Email recipients for monitoring alerts |
| resource_labels | 2 | map(string) | {} | Labels applied to all resources |
| application_name | 3 | string | 'chroma' | Base resource name. Do not change after initial deployment. |
| application_display_name | 3 | string | 'Chroma Vector Database' | Human-readable name in the GCP Console |
| description | 3 | string | Chroma description | Workload description |
| application_version | 3 | string | 'latest' | Chroma image tag |
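
The identity variables above could be wired into a root module roughly as follows. This is a minimal sketch, not the module's canonical usage: the relative source path, the var.project_id input, the label, and the version tag are all assumptions chosen for illustration.

```hcl
# Hypothetical root-module input; only project_id is required by the module.
variable "project_id" {
  type        = string
  description = "GCP project that hosts the Chroma deployment."
}

module "chroma" {
  # Assumed path relative to the root module; adjust to your repository layout.
  source = "./modules/Chroma_GKE"

  project_id           = var.project_id
  region               = "us-central1"
  tenant_deployment_id = "demo"

  # Base resource name. Do not change it after the first apply.
  application_name = "chroma"

  # Placeholder tag: pin to a release you have tested instead of "latest".
  application_version = "1.0.0"

  resource_labels = {
    team = "search" # hypothetical label
  }
}
```

Every other variable keeps its documented default, so this produces the single-replica, 1 vCPU / 1 Gi deployment described in the overview.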

2. IAM & Access Control

Chroma GKE delegates all IAM provisioning to App GKE. Workload Identity is used — the Kubernetes service account is bound to a GCP service account with the minimum required roles (GCS read/write for the data bucket, Secret Manager accessor for the auth token).

Auth token: When enable_auth_token = true, Chroma Common generates a token and stores it in Secret Manager as <prefix>-auth-token. The token is injected as CHROMA_SERVER_AUTH_CREDENTIALS.
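
One way a caller might enable the token and read it back is sketched below. The prefix variable is hypothetical, because the actual <prefix> value depends on the module's naming scheme, which this document does not spell out. The google_secret_manager_secret_version data source belongs to the Google provider; the output name is illustrative.

```hcl
variable "project_id" {
  type = string
}

# Hypothetical input: the resource-name prefix used for this deployment.
variable "chroma_resource_prefix" {
  type = string
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  # Generates the token and stores it in Secret Manager.
  enable_auth_token = true
}

# Read the latest version of the generated secret after the module is applied.
data "google_secret_manager_secret_version" "chroma_auth" {
  project    = var.project_id
  secret     = "${var.chroma_resource_prefix}-auth-token"
  depends_on = [module.chroma]
}

# Marked sensitive so the token is not printed in plan or apply output.
output "chroma_auth_header" {
  value     = "Authorization: Bearer ${data.google_secret_manager_secret_version.chroma_auth.secret_data}"
  sensitive = true
}
```

Reading the secret through Terraform places the token in state, so an alternative is to have clients fetch it from Secret Manager directly.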


3. Core Service Configuration

A. Compute (GKE)

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| deploy_application | 4 | true | Set false for infrastructure-only deployment |
| cpu_limit | 4 | '1000m' | CPU limit per pod |
| memory_limit | 4 | '1Gi' | Memory limit per pod. Increase for large collections. |
| container_resources | 4 | { cpu_limit="1000m", memory_limit="1Gi" } | Structured resource spec. Provides cpu_request, mem_request, ephemeral_storage_* fields. |
| min_instance_count | 4 | 1 | Minimum pod replicas |
| max_instance_count | 4 | 1 | Maximum pod replicas. Keep at 1 for single-writer safety. |
| timeout_seconds | 4 | 300 | Request timeout (0–3600 s) |
| enable_image_mirroring | 4 | true | Mirror Chroma image to Artifact Registry |
| enable_vertical_pod_autoscaling | 4 | false | Enable VPA (disables CPU/memory HPA when true) |
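
As an illustration of the compute settings, the sketch below raises the limits to the production-oriented values suggested in the pitfalls table while keeping a single replica. The source path and var.project_id are assumptions, and the numbers are starting points rather than measured recommendations.

```hcl
variable "project_id" {
  type = string
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  # HNSW indexes are held in memory, so memory is usually the first limit to raise.
  cpu_limit    = "2000m"
  memory_limit = "4Gi"

  # Keep exactly one replica for single-writer safety.
  min_instance_count = 1
  max_instance_count = 1

  timeout_seconds = 300

  # container_resources is left at its default here; how it interacts with
  # cpu_limit and memory_limit is not covered by this document.
}
```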

B. Kubernetes Workload

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| gke_cluster_name | 6 | "" | Target GKE cluster. Auto-discovered when empty. |
| gke_cluster_selection_mode | 6 | 'primary' | 'explicit', 'round-robin', or 'primary' |
| namespace_name | 6 | "" | Kubernetes namespace. Auto-generated when empty. |
| workload_type | 6 | null | 'Deployment' or 'StatefulSet'. Auto-resolves to StatefulSet when stateful_pvc_enabled = true. |
| service_type | 6 | 'ClusterIP' | 'ClusterIP' (recommended for vector databases), 'LoadBalancer', or 'NodePort' |
| session_affinity | 6 | 'None' | 'None' or 'ClientIP' |
| termination_grace_period_seconds | 6 | 60 | Grace period for Chroma to flush writes |
| enable_network_segmentation | 6 | false | Enable Kubernetes NetworkPolicies |
| configure_service_mesh | 6 | false | Enable Istio service mesh injection |
| deployment_timeout | 6 | 1800 | Seconds Terraform waits for rollout |

C. StatefulSet Persistence

For production deployments, stateful_pvc_enabled = true is recommended over GCS FUSE. PVC-backed storage avoids GCS FUSE I/O overhead for large collections and eliminates GCS API latency for index reads.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| stateful_pvc_enabled | 7 | null | Enable PVC. Recommended for production. Auto-selects StatefulSet. |
| stateful_pvc_size | 7 | '20Gi' | Per-pod PVC size. Size to hold all collections plus index overhead. |
| stateful_pvc_mount_path | 7 | '/data' | Container path for the PVC |
| stateful_pvc_storage_class | 7 | 'standard-rwo' | 'standard-rwo' (Balanced PD) or 'premium-rwo' (higher IOPS) |
| stateful_headless_service | 7 | null | Create a headless service for stable network identities |
| stateful_pod_management_policy | 7 | null | 'OrderedReady' ensures safe sequential restarts |
| stateful_update_strategy | 7 | null | 'RollingUpdate' for zero-downtime updates |
| stateful_fs_group | 7 | 1000 | GID for PVC write access. Set to 0 to leave unset. |

PVC vs GCS FUSE double-mount prevention: When stateful_pvc_enabled = true, the wrapper passes enable_gcs_storage_volume = false to Chroma Common, which prevents the <prefix>-data GCS bucket from being mounted at /data alongside the PVC.
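
A PVC-backed configuration might look like the sketch below. The source path and var.project_id are assumptions, and the size and storage class are illustrative values, not sizing advice.

```hcl
variable "project_id" {
  type = string
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  # Enabling the PVC makes the module resolve workload_type to StatefulSet
  # and disables the GCS FUSE mount at /data.
  stateful_pvc_enabled       = true
  stateful_pvc_size          = "50Gi"        # illustrative; capacity cannot be reduced later
  stateful_pvc_mount_path    = "/data"       # keep aligned with Chroma's persist directory
  stateful_pvc_storage_class = "premium-rwo" # higher IOPS than standard-rwo
  stateful_fs_group          = 1000

  # workload_type is intentionally left unset (null) so it auto-resolves.

  # A single replica avoids concurrent writers on the volume.
  max_instance_count = 1
}
```

Leaving workload_type unset avoids the plan-time failure that the pitfalls table describes for workload_type = "Deployment" combined with stateful_pvc_enabled = true.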

D. Storage (GCS FUSE)

When stateful_pvc_enabled is not set, Chroma data is persisted via GCS FUSE:

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| create_cloud_storage | 14 | true | Provision GCS buckets |
| storage_buckets | 14 | [] | Additional GCS buckets |
| gcs_volumes | 14 | [] | Additional GCS FUSE volumes |
| manage_storage_kms_iam | 14 | false | CMEK for storage |
| enable_artifact_registry_cmek | 14 | false | CMEK for Artifact Registry |

4. Authentication & Access Control

A. Chroma Auth Token

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_auth_token | 3 | false | Generate auth token in Secret Manager. Injected as CHROMA_SERVER_AUTH_CREDENTIALS. |

B. Identity-Aware Proxy (IAP)

IAP is enabled through the Kubernetes Gateway API. It requires either enable_custom_domain or enable_cdn to be set to true.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_iap | 20 | false | Enable IAP via Kubernetes Gateway |
| iap_authorized_users | 20 | [] | IAP-authorized users |
| iap_authorized_groups | 20 | [] | IAP-authorized groups |
| iap_oauth_client_id | 20 | "" | OAuth client ID. Sensitive. |
| iap_oauth_client_secret | 20 | "" | OAuth client secret. Sensitive. |
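
The IAP prerequisites can be expressed together, as in the following sketch. The domain, the user entry, and the source path are placeholders. The exact entry format expected by iap_authorized_users (plain email versus a "user:" prefixed member) is an assumption to check against App GKE.

```hcl
variable "project_id" {
  type = string
}

# Supply OAuth credentials from a secure source, not from committed tfvars files.
variable "iap_oauth_client_id" {
  type      = string
  sensitive = true
}

variable "iap_oauth_client_secret" {
  type      = string
  sensitive = true
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  # IAP needs a custom domain (or CDN) in front of the workload.
  enable_custom_domain = true
  application_domains  = ["chroma.example.com"] # placeholder domain

  enable_iap              = true
  iap_authorized_users    = ["alice@example.com"] # entry format is an assumption
  iap_oauth_client_id     = var.iap_oauth_client_id
  iap_oauth_client_secret = var.iap_oauth_client_secret
}
```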

C. Cloud Armor

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_cloud_armor | 21 | false | Attach Cloud Armor to GKE Ingress |
| admin_ip_ranges | 21 | [] | CIDR ranges exempted from WAF rules |
| cloud_armor_policy_name | 21 | 'default-waf-policy' | Cloud Armor policy name |
| enable_cdn | 21 | false | Enable Cloud CDN via GCPBackendPolicy |

5. Observability & Health

A. Health Probes

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| startup_probe | 10 | { path="/api/v2/heartbeat", initial_delay=15, period=10, threshold=10 } | Startup probe |
| liveness_probe | 10 | { path="/api/v2/heartbeat", initial_delay=30, period=30, threshold=3 } | Liveness probe |
| startup_probe_config | 10 | { path="/api/v2/heartbeat" } | Alternative startup probe |
| health_check_config | 10 | { path="/api/v2/heartbeat" } | Alternative liveness probe |
| uptime_check_config | 10 | { enabled=true, path="/api/v2/heartbeat" } | Cloud Monitoring uptime check |
| alert_policies | 10 | [] | Metric alert policies |
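
For collections that take a long time to load, the startup probe could be relaxed as sketched below. This assumes the probe objects accept exactly the four fields shown in the defaults and that threshold maps to the Kubernetes failureThreshold. Both points should be confirmed against the module's variable definitions.

```hcl
variable "project_id" {
  type = string
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  # Allow roughly 15 + 30 * 10 = 315 seconds for Chroma to become ready
  # before the kubelet restarts the container.
  startup_probe = {
    path          = "/api/v2/heartbeat"
    initial_delay = 15
    period        = 10
    threshold     = 30
  }

  # Liveness settings repeated from the documented default for clarity.
  liveness_probe = {
    path          = "/api/v2/heartbeat"
    initial_delay = 30
    period        = 30
    threshold     = 3
  }
}
```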

B. Pod Disruption Budget & Resource Quotas

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_pod_disruption_budget | 9 | true | Create a PodDisruptionBudget |
| pdb_min_available | 9 | '1' | Minimum pods available during disruptions |
| enable_resource_quota | 8 | false | Create a Kubernetes ResourceQuota |
| quota_memory_requests | 8 | "" | Memory requests quota (must use binary suffix, e.g., '4Gi') |
| quota_memory_limits | 8 | "" | Memory limits quota (must use binary suffix) |

6. Platform-Managed Behaviours

| Behaviour | Implementation | Detail |
| --- | --- | --- |
| No SQL database | database_type = "NONE" fixed by Chroma Common | No Cloud SQL resources created |
| No Redis | Not used | Chroma has no caching dependency |
| Fixed env vars | Always injected by Chroma Common | ANONYMIZED_TELEMETRY=false, CHROMA_SERVER_HTTP_PORT=8000 |
| Health probe path | Hard-coded to /api/v2/heartbeat | Chroma provides no configurable health path |
| StatefulSet auto-select | stateful_pvc_enabled = true | Automatically resolves workload_type to "StatefulSet" |
| PVC prevents GCS double-mount | enable_gcs_storage_volume = false passed to Chroma Common | Prevents simultaneous PVC and GCS FUSE at /data |

7. Variable Reference

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| project_id | 1 | (none) | GCP project ID. Required. |
| region | 1 | 'us-central1' | Region fallback |
| tenant_deployment_id | 2 | 'demo' | Resource name suffix |
| support_users | 2 | [] | Monitoring alert recipients |
| resource_labels | 2 | {} | Resource labels |
| application_name | 3 | 'chroma' | Base resource name |
| application_display_name | 3 | 'Chroma Vector Database' | Display name |
| description | 3 | Chroma description | Workload description |
| application_version | 3 | 'latest' | Image tag |
| enable_auth_token | 3 | false | Generate auth token |
| deploy_application | 4 | true | Deploy workload |
| cpu_limit | 4 | '1000m' | CPU limit |
| memory_limit | 4 | '1Gi' | Memory limit |
| min_instance_count | 4 | 1 | Minimum replicas |
| max_instance_count | 4 | 1 | Maximum replicas |
| timeout_seconds | 4 | 300 | Request timeout |
| enable_image_mirroring | 4 | true | Mirror to Artifact Registry |
| environment_variables | 5 | {} | Plain-text env vars |
| secret_environment_variables | 5 | {} | Secret Manager references |
| secret_propagation_delay | 5 | 30 | Post-creation wait |
| secret_rotation_period | 5 | '2592000s' | Rotation period |
| gke_cluster_name | 6 | "" | GKE cluster name |
| namespace_name | 6 | "" | Kubernetes namespace |
| workload_type | 6 | null | 'Deployment' or 'StatefulSet' |
| service_type | 6 | 'ClusterIP' | Kubernetes Service type |
| termination_grace_period_seconds | 6 | 60 | Grace period |
| stateful_pvc_enabled | 7 | null | Enable StatefulSet PVC |
| stateful_pvc_size | 7 | '20Gi' | PVC size |
| stateful_pvc_mount_path | 7 | '/data' | PVC mount path |
| stateful_pvc_storage_class | 7 | 'standard-rwo' | StorageClass |
| stateful_pod_management_policy | 7 | null | Pod management policy |
| stateful_update_strategy | 7 | null | Update strategy |
| stateful_fs_group | 7 | 1000 | fsGroup GID |
| enable_resource_quota | 8 | false | ResourceQuota |
| enable_pod_disruption_budget | 9 | true | PodDisruptionBudget |
| pdb_min_available | 9 | '1' | Min pods during disruptions |
| startup_probe | 10 | { path="/api/v2/heartbeat" } | Startup probe |
| liveness_probe | 10 | { path="/api/v2/heartbeat" } | Liveness probe |
| uptime_check_config | 10 | { enabled=true } | Uptime check |
| initialization_jobs | 11 | [] | Init jobs |
| cron_jobs | 11 | [] | Scheduled jobs |
| enable_cicd_trigger | 12 | false | Cloud Build trigger |
| github_repository_url | 12 | "" | GitHub URL |
| enable_cloud_deploy | 12 | false | Cloud Deploy pipeline |
| enable_nfs | 13 | false | Cloud Filestore NFS |
| create_cloud_storage | 14 | true | GCS buckets |
| storage_buckets | 14 | [] | Additional buckets |
| gcs_volumes | 14 | [] | GCS FUSE volumes |
| backup_schedule | 17 | '0 2 * * *' | Backup cron |
| backup_retention_days | 17 | 7 | Backup retention |
| enable_backup_import | 17 | false | One-time restore |
| enable_custom_domain | 19 | false | Custom domain |
| application_domains | 19 | [] | Domain names |
| reserve_static_ip | 19 | false | Reserve static IP |
| enable_iap | 20 | false | IAP via Gateway |
| enable_cloud_armor | 21 | false | Cloud Armor |
| enable_vpc_sc | 22 | false | VPC Service Controls |
| organization_id | 22 | "" | Org ID for VPC-SC |
| enable_audit_logging | 22 | false | Cloud Audit Logs |

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
| --- | --- | --- | --- |
| enable_auth_token | false | Critical | Without an auth token, any caller who can reach the Chroma service endpoint can read, write, or delete any collection. Set to true for any internet-facing or shared cluster deployment. The generated token is stored in Secret Manager and must be passed as Authorization: Bearer <token>. |
| stateful_pvc_enabled | null | High | Without a PVC, persistence depends on the GCS FUSE volume (see enable_gcs_storage_volume). If neither a PVC nor the GCS FUSE volume is mounted, Chroma stores collection data in the ephemeral container filesystem, and a pod restart or rolling update erases all collections and their vectors. Set stateful_pvc_enabled = true for any persistent production deployment. |
| stateful_pvc_size | "20Gi" | High | Undersized PVCs fill up as vector collections grow. A full PVC causes Chroma to crash with disk-full errors, making all collections unavailable. HNSW indexes for 1M vectors at 1536 dimensions require approximately 6 Gi. Size generously, because PVC capacity cannot be reduced after provisioning. |
| workload_type | null | High | Defaults to Deployment (stateless). Setting stateful_pvc_enabled = true without an explicit workload_type automatically resolves to StatefulSet. Explicitly setting workload_type = "Deployment" alongside stateful_pvc_enabled = true fails at plan time. |
| stateful_pvc_storage_class | "standard-rwo" | Medium | Balanced PD (standard-rwo) provides adequate IOPS for most workloads. Large-scale ANN searches (HNSW with ef_search > 100) benefit from premium-rwo (SSD). Changing storage class after PVC creation requires manual data migration. |
| memory_limit | "1Gi" | High | Chroma loads full HNSW indexes into memory. The default 1Gi supports only very small collections. For production workloads, provision at least 4Gi; large collections (> 1M vectors) may require 16Gi or more. OOM kills terminate the pod, dropping all in-flight queries. |
| cpu_limit | "1000m" | Medium | HNSW index builds and similarity searches are CPU-bound. Under high query concurrency, CPU throttling degrades p99 latency significantly. Increase to between 2000m and 4000m for production. |
| min_instance_count | 1 | Medium | Scale-to-zero (0) on GKE causes the pod to be deleted. After scaling back up, Chroma must reload the HNSW index from the PVC (or GCS), which can take tens of seconds for large collections. Keep at 1 for latency-sensitive workloads. |
| max_instance_count | 1 | High | Multiple Chroma replicas sharing a single PVC are not supported. Chroma does not have a distributed lock on its storage. Using max_instance_count > 1 with a single PVC causes concurrent write corruption. For horizontal scaling, use a Chroma cluster deployment with separate PVCs per pod (one collection set per replica). |
| enable_gcs_storage_volume (Common) | true | High | If GCS FUSE is the storage backend (no PVC) and it is disabled, all data is stored in the ephemeral container layer and lost on restart. Do not disable unless PVC persistence is configured. |
| quota_memory_requests | "" | Critical | If enable_resource_quota = true and this value is set without binary suffixes (e.g. "4" instead of "4Gi"), Kubernetes treats it as bytes, blocking all pod scheduling in the namespace. Always use Gi or Mi. Note: in Chroma GKE this variable is accepted but not forwarded; verify in App GKE if enabled. |
| stateful_pvc_mount_path | "/data" | Critical | Chroma defaults to /data for its storage directory. If the mount path does not match the CHROMA_SERVER_PERSIST_DIRECTORY environment variable, Chroma will use the ephemeral in-container path, silently losing data on restart. |
| application_version | "latest" | Medium | Using "latest" makes deployments non-reproducible. Chroma's data format has changed between major versions; upgrading across incompatible versions can make existing collections unreadable. Pin to a specific version tag in production. |
| enable_nfs | false | Low | NFS is not recommended for primary Chroma storage on GKE. Prefer PVCs for better IOPS and exclusive access semantics. NFS is useful for shared read-only data. |
| backup_schedule | "0 2 * * *" | Medium | Regular backups of the PVC (via Kubernetes VolumeSnapshot or GCS export) are essential. The default daily schedule may not meet aggressive RPO targets. |
| enable_iap | false | High | Without IAP, the GKE LoadBalancer endpoint is accessible to any caller (depending on firewall rules). Enable IAP with iap_authorized_users/iap_authorized_groups for user-facing deployments. |
| iap_oauth_client_id / iap_oauth_client_secret | "" | High | Setting enable_iap = true without valid OAuth credentials causes the IAP configuration to fail silently or block all traffic. Obtain these from the GCP Console OAuth consent screen before enabling IAP. |
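
Some of these pitfalls can be caught at plan time with Terraform variable validation in the calling root module. The sketch below is one possible guard, not part of Chroma GKE itself: the root-level variables and the source path are assumptions, and the regular expression accepts only whole numbers followed by a binary suffix.

```hcl
variable "project_id" {
  type = string
}

variable "application_version" {
  type = string

  validation {
    # Reject the floating tag so that deployments stay reproducible.
    condition     = var.application_version != "latest"
    error_message = "Pin application_version to a specific Chroma image tag."
  }
}

variable "quota_memory_requests" {
  type    = string
  default = ""

  validation {
    # Allow an empty string (quota unset) or a value such as 512Mi or 4Gi.
    condition = (
      var.quota_memory_requests == "" ||
      can(regex("^[0-9]+(Ki|Mi|Gi|Ti)$", var.quota_memory_requests))
    )
    error_message = "Set quota_memory_requests to an empty string or a value with a binary suffix such as 4Gi."
  }
}

module "chroma" {
  source     = "./modules/Chroma_GKE" # assumed path
  project_id = var.project_id

  application_version = var.application_version

  # Only create the ResourceQuota when a quota value was actually provided.
  enable_resource_quota = var.quota_memory_requests != ""
  quota_memory_requests = var.quota_memory_requests
}
```

With this in place, a value such as "4" fails during terraform plan instead of blocking pod scheduling after apply.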