Chroma on Google Cloud Run

This document provides a comprehensive reference for the modules/Chroma_CloudRun Terraform module. It covers architecture, IAM, configuration variables, Chroma-specific behaviours, and operational patterns for deploying Chroma on Google Cloud Run (v2).


1. Module Overview

Chroma is an AI-native open-source vector database purpose-built for embeddings and similarity search. Chroma CloudRun is a wrapper module built on top of App CloudRun. It uses App CloudRun for all GCP infrastructure provisioning and injects Chroma-specific application configuration, an optional authentication token, and storage configuration via Chroma Common.

Key Capabilities:

  • Compute: Cloud Run v2 (Gen2), 1 vCPU / 1 Gi by default. min_instance_count = 1 to avoid index-loading cold starts. max_instance_count = 1 — Chroma is a single-writer store.
  • Data Persistence: Cloud Storage bucket (<prefix>-data) mounted at /data via GCS FUSE. No Cloud SQL, no Redis.
  • Security: Optional token authentication via Secret Manager. A plan-time validation blocks public ingress (ingress_settings = "all") unless enable_auth_token = true. Inherits Cloud Armor, IAP, Binary Authorization, and VPC-SC from App CloudRun.
  • CI/CD: Cloud Build image pipeline by default; Cloud Deploy progressive delivery optional.
  • Reliability: Health probes target /api/v2/heartbeat.

Project & Application Identity

| Variable | Group | Type | Default | Description |
| --- | --- | --- | --- | --- |
| project_id | 1 | string | (none) | GCP project ID. Required. |
| region | 1 | string | 'us-central1' | GCP region fallback |
| tenant_deployment_id | 2 | string | 'demo' | Short suffix appended to all resource names |
| support_users | 2 | list(string) | [] | Email recipients for monitoring alerts |
| resource_labels | 2 | map(string) | {} | Labels applied to all provisioned resources |
| application_name | 3 | string | 'chroma' | Base resource name. Do not change after initial deployment. |
| application_display_name | 3 | string | 'Chroma Vector Database' | Human-readable name in the GCP Console |
| description | 3 | string | Chroma description | Cloud Run service description |
| application_version | 3 | string | 'latest' | Chroma image tag |

Wrapper architecture: Chroma CloudRun calls Chroma Common to build an application_config object containing Chroma-specific environment variables, probe configuration, and the storage volume definition. module_storage_buckets carries the <prefix>-data GCS bucket. scripts_dir is resolved to abspath("${module.chroma_app.path}/scripts") at apply time.


2. IAM & Access Control

Chroma CloudRun delegates all IAM provisioning to App CloudRun. The Cloud Run SA, Cloud Build SA, and IAP service agent role sets are identical to those in App CloudRun.

Auth token: Unlike most modules, Chroma Common can auto-generate a Chroma authentication token (when enable_auth_token = true). The token is stored in Secret Manager as <prefix>-auth-token and injected as CHROMA_SERVER_AUTH_CREDENTIALS. Applications calling Chroma must include this token in the Authorization: Bearer <token> header.

For the complete role tables and IAP details, see the App CloudRun documentation.


3. Core Service Configuration

A. Compute (Cloud Run)

Chroma CloudRun exposes cpu_limit and memory_limit as dedicated top-level variables. Chroma loads embedding indexes into RAM — size memory_limit according to your collection size and index type.
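
As a rough sizing aid, the sketch below estimates the raw vector payload of a collection, assuming float32 embeddings (4 bytes per dimension). The function names and the overhead_factor parameter are illustrative and do not come from Chroma or the module. HNSW graph links, metadata, and the Chroma process itself need memory on top of the raw vectors, so treat the result as a lower bound.

```python
import math

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Bytes needed to hold the raw float32 vectors, excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def suggested_memory_limit_gi(num_vectors: int, dimensions: int,
                              overhead_factor: float = 1.0) -> int:
    """Round the estimate up to whole Gi for memory_limit.

    overhead_factor is an assumed allowance for HNSW links and metadata;
    raise it above 1.0 to leave headroom, and measure a real collection
    before relying on the number.
    """
    estimate = raw_vector_bytes(num_vectors, dimensions) * overhead_factor
    return max(1, math.ceil(estimate / GIB))


if __name__ == "__main__":
    n, dim = 1_000_000, 1536
    print(f"raw vectors: {raw_vector_bytes(n, dim) / GIB:.2f} GiB")
    print(f"memory_limit suggestion: {suggested_memory_limit_gi(n, dim)}Gi")
```

For one million 1536-dimension vectors the raw payload is 6,144,000,000 bytes, which is why the default '1Gi' only suits small collections.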

Single-instance constraint: max_instance_count = 1 is strongly recommended. Chroma is a single-writer store — multiple instances against the same GCS FUSE mount will corrupt collections. Scale vertically (increase CPU and memory) rather than horizontally.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| deploy_application | 4 | true | Set false for infrastructure-only deployment |
| cpu_limit | 4 | '1000m' | CPU per instance |
| memory_limit | 4 | '1Gi' | Memory per instance. Increase for large embedding collections. |
| min_instance_count | 4 | 1 | Keep at 1+ to avoid cold starts |
| max_instance_count | 4 | 1 | Keep at 1; Chroma is single-writer |
| container_port | 4 | 8000 | Chroma REST API port |
| execution_environment | 4 | 'gen2' | Gen2 required for GCS FUSE |
| cpu_always_allocated | 4 | true | Keep CPU allocated to avoid index load delays between requests |
| timeout_seconds | 4 | 300 | Max request duration (0–3600 s). Increase for large batch operations. |
| enable_cloudsql_volume | 4 | false | Not applicable; Chroma has no SQL database |
| enable_image_mirroring | 4 | true | Mirror the Chroma image into Artifact Registry |
| container_protocol | 4 | 'http1' | 'http1' or 'h2c' |
| traffic_split | 4 | [] | Canary/blue-green traffic allocation |
| max_revisions_to_retain | 4 | 7 | Maximum Cloud Run revisions to keep |
| service_annotations | 4 | {} | Cloud Run service annotations |
| service_labels | 4 | {} | Cloud Run service labels |

Differences from App CloudRun defaults:

| Variable | App CloudRun | Chroma CloudRun | Reason |
| --- | --- | --- | --- |
| container_port | 8080 | 8000 | Chroma's native REST API port |
| ingress_settings | 'all' | 'internal' | Vector databases should not be publicly exposed by default |
| enable_redis | true | false (hard-coded) | Chroma has no Redis dependency |
| database_type | configurable | NONE (fixed) | Chroma manages its own embedded storage |
| min_instance_count | 0 | 1 | Avoid cold start index-loading delays |

B. Storage (GCS FUSE)

Chroma requires persistent storage for its embedded SQLite database, HNSW index files, and collection metadata. Chroma Common automatically provisions a GCS bucket and mounts it at /data via GCS FUSE.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| create_cloud_storage | 11 | true | Set false to skip GCS bucket creation |
| storage_buckets | 11 | [] | Additional GCS buckets beyond the auto-provisioned data bucket |
| gcs_volumes | 11 | [] | Additional GCS FUSE volume mounts |
| enable_nfs | 11 | false | Mount Cloud Filestore NFS (requires gen2). Chroma uses GCS for storage; enable only for custom init jobs. |
| nfs_mount_path | 11 | '/mnt/nfs' | NFS container mount path |
| manage_storage_kms_iam | 11 | false | Create CMEK KMS key for storage |
| enable_artifact_registry_cmek | 11 | false | Enable CMEK for Artifact Registry |

The auto-provisioned bucket uses these settings: storage_class = "STANDARD", versioning_enabled = false, public_access_prevention = "enforced".

C. Networking

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| ingress_settings | 5 | 'internal' | Recommended. 'all' requires enable_auth_token = true (plan-time validation). |
| vpc_egress_setting | 5 | 'PRIVATE_RANGES_ONLY' | 'PRIVATE_RANGES_ONLY' or 'ALL_TRAFFIC' |

4. Authentication & Access Control

A. Chroma Auth Token

The primary Chroma-specific security control. When enable_auth_token = true:

  • A 32-character alphanumeric token is generated and stored in Secret Manager
  • Chroma starts with CHROMA_SERVER_AUTH_CREDENTIALS set to the token value
  • All API calls must include Authorization: Bearer <token> in the request header
  • Python clients pass the header when constructing the client: chromadb.HttpClient(host=..., headers={"Authorization": "Bearer <token>"})
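
A minimal sketch of a client that sends the token, assuming the token has already been read from Secret Manager into an environment variable. CHROMA_URL, CHROMA_TOKEN, and the example hostname are hypothetical placeholders. heartbeat_request only builds the request, so it can be inspected without network access; make_client additionally assumes the chromadb package is installed and that the service is reached over HTTPS on port 443.

```python
import os
import urllib.request
from urllib.parse import urlparse


def auth_headers(token: str) -> dict:
    """Header that clients send when enable_auth_token = true."""
    return {"Authorization": f"Bearer {token}"}


def heartbeat_request(service_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the health endpoint."""
    url = service_url.rstrip("/") + "/api/v2/heartbeat"
    return urllib.request.Request(url, headers=auth_headers(token), method="GET")


def make_client(service_url: str, token: str):
    """Create a chromadb HTTP client for an HTTPS Cloud Run URL."""
    import chromadb  # imported lazily; requires the chromadb package

    host = urlparse(service_url).hostname
    return chromadb.HttpClient(host=host, port=443, ssl=True,
                               headers=auth_headers(token))


if __name__ == "__main__":
    # CHROMA_URL and CHROMA_TOKEN are hypothetical variable names.
    url = os.environ.get("CHROMA_URL", "https://chroma.example.internal")
    token = os.environ.get("CHROMA_TOKEN", "replace-me")
    req = heartbeat_request(url, token)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req, timeout=10) would actually send it.
```

Keeping the header construction in one helper means the raw HTTP check and the chromadb client cannot drift apart if the header format changes.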

Plan-time guard: validation.tf includes a precondition that prevents deploying with ingress_settings = "all" and enable_auth_token = false simultaneously. This blocks accidental public exposure of an unauthenticated Chroma instance.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_auth_token | 3 | false | Generate and store authentication token in Secret Manager. Recommended for all non-internal deployments. |

B. Identity-Aware Proxy (IAP)

When enable_iap = true, Cloud Run's native IAP integration is enabled. Google identity authentication is required before requests reach Chroma. Useful for web-based client access scenarios.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_iap | 5 | false | Enable IAP on the Cloud Run service |
| iap_authorized_users | 5 | [] | Users/SAs granted access. Format: 'user:email' |
| iap_authorized_groups | 5 | [] | Google Groups granted access |

C. Cloud Armor

When enable_cloud_armor = true, a Global HTTPS Load Balancer with Cloud Armor WAF policy is provisioned in front of Cloud Run.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_cloud_armor | 10 | false | Provision Global HTTPS LB + Cloud Armor WAF |
| admin_ip_ranges | 10 | [] | CIDR ranges exempted from WAF rules |
| application_domains | 10 | [] | Custom domains for the HTTPS LB |
| enable_cdn | 10 | false | Enable Cloud CDN on the HTTPS LB |

5. Observability & Health

A. Health Probes

Chroma exposes a single health endpoint — /api/v2/heartbeat — which returns HTTP 200 when the service is fully initialized. Both the startup and liveness probes are hard-coded to this endpoint by Chroma Common.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| startup_probe | 14 | { path="/api/v2/heartbeat", initial_delay=15, period=10, threshold=10 } | Startup probe. Initial delay accounts for GCS FUSE mount and index loading. |
| liveness_probe | 14 | { path="/api/v2/heartbeat", initial_delay=30, period=30, threshold=3 } | Liveness probe. Container is restarted after 3 consecutive failures. |
| uptime_check_config | 14 | { enabled=true, path="/api/v2/heartbeat" } | Cloud Monitoring uptime check |
| alert_policies | 14 | [] | Cloud Monitoring metric alert policies |
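
To see how much time the default probes give Chroma, the sketch below adds up the probe settings, assuming threshold means consecutive failures and ignoring the per-check timeout. The Probe class and function name are illustrative only; the numbers are the documented defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay: int  # seconds before the first check
    period: int         # seconds between checks
    threshold: int      # consecutive failures tolerated


def failure_window_seconds(probe: Probe) -> int:
    """Approximate time until the probe gives up: delay + period * threshold."""
    return probe.initial_delay + probe.period * probe.threshold


STARTUP = Probe(initial_delay=15, period=10, threshold=10)
LIVENESS = Probe(initial_delay=30, period=30, threshold=3)

if __name__ == "__main__":
    print("startup budget:", failure_window_seconds(STARTUP), "s")
    print("liveness window:", LIVENESS.period * LIVENESS.threshold, "s")
```

With the defaults, Chroma has roughly 115 seconds to mount the bucket and load its indexes, and an unresponsive instance is restarted after about 90 seconds of failed liveness checks. If large collections take longer to load, the startup probe is the setting to revisit.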

B. Backup & Recovery

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| backup_schedule | 7 | '0 2 * * *' | Cron expression (UTC) for automated backups |
| backup_retention_days | 7 | 7 | Days to retain backup files |
| enable_backup_import | 7 | false | Trigger a one-time restore on apply |
| backup_source | 7 | 'gcs' | 'gcs' (full GCS URI) or 'gdrive' (file ID) |
| backup_uri | 7 | "" | GCS URI or Drive file ID. Mapped to backup_file in App CloudRun. |
| backup_format | 7 | 'tar' | Backup format: sql, tar, gz, tgz, tar.gz, zip |

6. CI/CD & Delivery

Identical to App CloudRun. When enable_cicd_trigger = true, a Cloud Build GitHub connection and push trigger are provisioned.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_cicd_trigger | 8 | false | Provision a Cloud Build GitHub trigger |
| github_repository_url | 8 | "" | Full HTTPS URL of the GitHub repository |
| github_token | 8 | "" | GitHub PAT. Sensitive. Required on first apply. |
| github_app_installation_id | 8 | "" | GitHub App installation ID |
| cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Advanced trigger config |
| enable_cloud_deploy | 8 | false | Provision a Cloud Deploy pipeline |
| cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Ordered promotion stages |
| enable_binary_authorization | 8 | false | Enforce image attestation |

7. Platform-Managed Behaviours

| Behaviour | Implementation | Detail |
| --- | --- | --- |
| No database | database_type = "NONE" fixed by Chroma Common | No Cloud SQL instance is created |
| No Redis | enable_redis = false hard-coded in main.tf | Chroma has no caching dependency |
| Telemetry disabled | ANONYMIZED_TELEMETRY=false always injected | Privacy by default |
| Port fixed | CHROMA_SERVER_HTTP_PORT=8000 always injected | Matches container_port = 8000 |
| Public ingress blocked | Plan-time validation in validation.tf | ingress_settings = "all" blocked unless enable_auth_token = true |
| GCS data bucket | <prefix>-data provisioned by Chroma Common | Mounted at /data via GCS FUSE |
| Health probe path | Hard-coded to /api/v2/heartbeat | Chroma provides no configurable health path |

8. Variable Reference

All user-configurable variables, sorted by UI group then order.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| module_description | 0 | Chroma platform text | Platform metadata |
| module_documentation | 0 | docs URL | Documentation URL |
| module_dependency | 0 | ['Services GCP'] | Required modules |
| module_services | 0 | GCP service list | GCP services consumed |
| credit_cost | 0 | 50 | Deployment credit cost |
| require_credit_purchases | 0 | false | Enforce credit balance check |
| enable_purge | 0 | true | Permit full resource deletion |
| public_access | 0 | false | Platform catalogue visibility |
| shared_users | 0 | [] | Users with access regardless of public_access |
| deployment_id | 0 | "" | Deployment ID suffix. Auto-generated when empty |
| resource_creator_identity | 0 | platform SA | Terraform service account |
| project_id | 1 | (none) | GCP project ID. Required. |
| region | 1 | 'us-central1' | GCP region fallback |
| tenant_deployment_id | 2 | 'demo' | Resource name suffix |
| support_users | 2 | [] | Email addresses for monitoring alerts |
| resource_labels | 2 | {} | Labels applied to all resources |
| application_name | 3 | 'chroma' | Base resource name. Do not change after initial deployment. |
| application_display_name | 3 | 'Chroma Vector Database' | Human-readable display name |
| description | 3 | Chroma description | Service description |
| application_version | 3 | 'latest' | Chroma container image tag |
| enable_auth_token | 3 | false | Generate auth token in Secret Manager. Recommended for public deployments. |
| deploy_application | 4 | true | Set false for infrastructure-only deployment |
| cpu_limit | 4 | '1000m' | CPU per instance |
| memory_limit | 4 | '1Gi' | Memory per instance |
| min_instance_count | 4 | 1 | Keep at 1+ to avoid cold starts |
| max_instance_count | 4 | 1 | Keep at 1; Chroma is single-writer |
| container_port | 4 | 8000 | Chroma REST API port |
| execution_environment | 4 | 'gen2' | Gen2 required for GCS FUSE |
| cpu_always_allocated | 4 | true | Always allocate CPU |
| timeout_seconds | 4 | 300 | Max request duration |
| enable_image_mirroring | 4 | true | Mirror image into Artifact Registry |
| container_protocol | 4 | 'http1' | 'http1' or 'h2c' |
| traffic_split | 4 | [] | Canary/blue-green traffic allocation |
| max_revisions_to_retain | 4 | 7 | Maximum Cloud Run revisions |
| service_annotations | 4 | {} | Cloud Run service annotations |
| service_labels | 4 | {} | Cloud Run service labels |
| ingress_settings | 5 | 'internal' | 'internal', 'all' (requires auth token), or 'internal-and-cloud-load-balancing' |
| vpc_egress_setting | 5 | 'PRIVATE_RANGES_ONLY' | VPC egress mode |
| enable_iap | 5 | false | Enable IAP |
| iap_authorized_users | 5 | [] | IAP-authorized users |
| iap_authorized_groups | 5 | [] | IAP-authorized groups |
| environment_variables | 6 | {} | Plain-text env vars |
| secret_environment_variables | 6 | {} | Secret Manager references |
| secret_propagation_delay | 6 | 30 | Seconds to wait after secret creation |
| secret_rotation_period | 6 | '2592000s' | Rotation reminder period |
| backup_schedule | 7 | '0 2 * * *' | Automated backup cron schedule |
| backup_retention_days | 7 | 7 | Days to retain backups |
| enable_backup_import | 7 | false | Trigger one-time backup restore |
| backup_source | 7 | 'gcs' | Backup source: 'gcs' or 'gdrive' |
| backup_uri | 7 | "" | GCS URI or Drive file ID |
| backup_format | 7 | 'tar' | Backup file format |
| enable_cicd_trigger | 8 | false | Cloud Build GitHub trigger |
| github_repository_url | 8 | "" | GitHub repository URL |
| github_token | 8 | "" | GitHub PAT. Sensitive. |
| github_app_installation_id | 8 | "" | GitHub App installation ID |
| cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Advanced trigger config |
| enable_cloud_deploy | 8 | false | Cloud Deploy pipeline |
| cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Promotion stages |
| enable_binary_authorization | 8 | false | Enforce image attestation |
| additional_cloudrun_sa_roles | 8 | [] | Extra IAM roles for Cloud Run SA |
| enable_cloud_armor | 10 | false | Global HTTPS LB + Cloud Armor WAF |
| admin_ip_ranges | 10 | [] | CIDR ranges exempted from WAF |
| application_domains | 10 | [] | Custom domains |
| enable_cdn | 10 | false | Cloud CDN on HTTPS LB |
| max_images_to_retain | 10 | 7 | Max Artifact Registry images |
| delete_untagged_images | 10 | true | Delete untagged images |
| image_retention_days | 10 | 30 | Image retention period |
| create_cloud_storage | 11 | true | Create GCS buckets |
| storage_buckets | 11 | [] | Additional GCS buckets |
| enable_nfs | 11 | false | Mount Cloud Filestore NFS |
| nfs_mount_path | 11 | '/mnt/nfs' | NFS container mount path |
| gcs_volumes | 11 | [] | Additional GCS FUSE volumes |
| manage_storage_kms_iam | 11 | false | CMEK for storage |
| enable_artifact_registry_cmek | 11 | false | CMEK for Artifact Registry |
| initialization_jobs | 13 | [] | One-shot Cloud Run Jobs. No default job. |
| cron_jobs | 13 | [] | Recurring scheduled jobs |
| startup_probe | 14 | { path="/api/v2/heartbeat", initial_delay=15, ... } | Startup probe |
| liveness_probe | 14 | { path="/api/v2/heartbeat", initial_delay=30, ... } | Liveness probe |
| uptime_check_config | 14 | { enabled=true, path="/api/v2/heartbeat" } | Uptime check |
| alert_policies | 14 | [] | Metric alert policies |
| enable_vpc_sc | 23 | false | VPC Service Controls perimeter |
| vpc_cidr_ranges | 23 | [] | VPC CIDR ranges for VPC-SC |
| vpc_sc_dry_run | 23 | true | Log violations without blocking |
| organization_id | 23 | "" | GCP Organization ID (required for VPC-SC) |
| enable_audit_logging | 23 | false | Cloud Audit Logs |

9. Outputs

| Output | Description |
| --- | --- |
| service_url | Cloud Run service HTTPS URL |
| service_name | Cloud Run service name |
| service_location | GCP region where the service is deployed |
| project_id | GCP project ID |
| deployment_id | Deployment ID suffix |
| storage_buckets | Provisioned GCS bucket list |
| container_image | Container image used for the deployment |

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
| --- | --- | --- | --- |
| enable_auth_token | false | Critical | Without an authentication token, any caller who can reach the Cloud Run service can read, write, or delete any Chroma collection. Set to true for any deployment reachable outside the VPC. The generated token is stored in Secret Manager and must be provided in the Authorization: Bearer <token> header by all clients. |
| ingress_settings | "internal" | High | Default is internal (VPC-only). Changing to "all" exposes the Chroma API to the public internet; the plan-time validation rejects this setting unless enable_auth_token = true. Never set to "all" without authentication enabled. |
| enable_nfs | false | High | NFS is optional because the default GCS FUSE volume already persists /data. If both NFS and the GCS FUSE volume (enable_gcs_storage_volume) are disabled, Chroma stores its collection data inside the container filesystem. On Cloud Run, this storage is ephemeral: a new revision deployment or instance restart erases all collections. Keep at least one of the two enabled (NFS requires execution_environment = "gen2"). |
| execution_environment | "gen2" | High | NFS mounts require Gen2, as does the GCS FUSE data volume. If enable_nfs = true is set while execution_environment = "gen1", the Cloud Run deployment fails. |
| memory_limit | "1Gi" | High | Chroma loads HNSW indexes entirely into memory. Every 1M 1536-dimension float32 vectors require approximately 6 Gi of RAM (1,000,000 × 1536 × 4 bytes ≈ 5.7 GiB of raw vectors, plus index overhead). The default 1Gi is only suitable for very small collections (< 100K vectors). Underprovisioning causes OOM kills under query load. |
| cpu_always_allocated | true | Medium | Chroma must serve health checks and run background index rebuilds even with no active requests. Setting to false causes CPU throttling between requests, slowing index operations and potentially causing health check timeouts. |
| min_instance_count | 1 | Medium | Scale-to-zero (0) causes cold starts during which in-memory indexes must be reloaded from GCS. For latency-sensitive applications, keep at 1. |
| max_instance_count | 1 | High | Multiple Chroma Cloud Run instances do not share state. With max_instance_count > 1 and GCS FUSE storage, concurrent writes from different instances can corrupt the collection. Chroma Cloud Run is single-instance by design; use Chroma GKE with StatefulSet for multi-replica production deployments. |
| container_port | 8000 | Critical | Chroma listens on port 8000 by default, and the module always injects CHROMA_SERVER_HTTP_PORT=8000. If container_port is changed without a matching CHROMA_SERVER_HTTP_PORT, the container starts but Cloud Run health checks fail and the revision is never promoted. |
| vpc_egress_setting | "PRIVATE_RANGES_ONLY" | Low | PRIVATE_RANGES_ONLY is correct when all dependencies are on the VPC. Only change to ALL_TRAFFIC if Chroma must reach public endpoints directly (e.g., embedding model APIs). |
| timeout_seconds | 300 | Medium | Large similarity searches over millions of vectors can take several seconds. Setting the timeout too low causes Cloud Run to return 504 to clients during expensive queries. Increase to 600 for large collection workloads. |
| enable_gcs_storage_volume | true (in Common) | High | GCS FUSE is the primary persistence mechanism for Cloud Run Chroma deployments. Disabling it means collections are lost on instance restart. Leave enabled unless NFS is used instead. |
| application_version | "latest" | Medium | Using latest means the deployed image can change on rebuild without an explicit version bump. Pin to a specific Chroma version tag for reproducible production deployments. |
| backup_schedule | '0 2 * * *' | Medium | Chroma data is stored in GCS (or NFS when enabled); regular snapshots of the GCS bucket are the primary backup mechanism. The auto-provisioned bucket has object versioning disabled, so ensure the module's backup job or GCS object versioning is configured for disaster recovery. |
| enable_iap | false | High | With ingress_settings = "all", the auth token is the only control in front of Chroma. Enable IAP as an additional authentication layer for user-facing deployments. |
| secret_propagation_delay | 30 | Medium | If the auth token secret is created but not yet propagated when Cloud Run reads it, the container may start with an empty auth token. Increase to 60 in large projects. |
| enable_cloudsql_volume | false | Low | Chroma does not use a relational database. This variable should remain false; enabling it injects a Cloud SQL Auth Proxy sidecar that serves no purpose and consumes container resources. |
| resource_labels | {} | Low | Without labels, cost attribution and resource filtering in the GCP Console are difficult. Add at minimum env and service labels for production. |
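
Several of these pitfalls can be caught before running a plan. The sketch below is a hypothetical pre-flight check over a dictionary of variable values; it is not part of the module, and it mirrors only four of the rules from the table. Missing keys fall back to the documented defaults.

```python
def preflight(cfg: dict) -> list[str]:
    """Return human-readable warnings for risky variable combinations."""
    warnings = []
    ingress = cfg.get("ingress_settings", "internal")
    if ingress == "all" and not cfg.get("enable_auth_token", False):
        warnings.append("ingress_settings = 'all' requires enable_auth_token = true")
    if cfg.get("max_instance_count", 1) > 1:
        warnings.append("max_instance_count > 1 risks corrupting collections")
    if cfg.get("execution_environment", "gen2") != "gen2":
        warnings.append("GCS FUSE and NFS mounts require execution_environment = 'gen2'")
    if cfg.get("application_version", "latest") == "latest":
        warnings.append("pin application_version for reproducible deployments")
    return warnings


if __name__ == "__main__":
    # Example values only; "1.0.0" is a placeholder tag, not a recommendation.
    candidate = {
        "ingress_settings": "all",
        "enable_auth_token": False,
        "max_instance_count": 2,
        "application_version": "1.0.0",
    }
    for message in preflight(candidate):
        print("WARNING:", message)
```

A check like this could run in CI ahead of the plan step; the module's own plan-time validation remains the authoritative guard for the ingress rule.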

10. Destroying Resources

When enable_purge = true, tofu destroy removes all module-managed resources. After Cloud Run service deletion, GCP may hold serverless IPv4 addresses on the VPC subnet for 20–30 minutes before release. If the destroy attempt fails with a subnet deletion error, wait and re-run:

tofu destroy
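
The wait-and-re-run step can be scripted. The following is a minimal sketch, assuming the tofu binary is on PATH and that a non-interactive destroy is acceptable; the function names, the attempt count, and the ten-minute wait are illustrative choices rather than module behaviour.

```python
import subprocess
import time


def _tofu_destroy() -> int:
    """Run `tofu destroy` non-interactively and return its exit code."""
    # -auto-approve skips the confirmation prompt; use with care.
    return subprocess.run(["tofu", "destroy", "-auto-approve"]).returncode


def destroy_with_retry(run=_tofu_destroy, attempts: int = 4,
                       wait_seconds: int = 600, sleep=time.sleep) -> bool:
    """Re-run the destroy until it succeeds or attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        if run() == 0:
            return True
        if attempt < attempts:
            # Give GCP time to release the serverless subnet addresses.
            sleep(wait_seconds)
    return False


if __name__ == "__main__":
    ok = destroy_with_retry()
    print("destroy succeeded" if ok else "destroy still failing; inspect the error")
```

The run and sleep parameters are injectable so the retry logic can be exercised without touching real infrastructure. The loop retries on any failure, so check the error output if the last attempt still fails.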