Chroma on Cloud Run — Lab Guide

Overview

Estimated time: 1–2 hours

Chroma is an AI-native open-source vector database for embeddings and similarity search. This lab deploys Chroma on Google Cloud Run backed by a Cloud Storage bucket for persistent collection storage. Cloud Run provides serverless hosting with scale-to-zero capability.

What the Module Automates

  • Cloud Run v2 (Gen2) service with GCS FUSE volume mount
  • Cloud Storage bucket for Chroma collection data (/data)
  • Artifact Registry repository and Cloud Build image pipeline
  • Secret Manager secret for authentication token (when enabled)
  • Serverless VPC Access / Direct VPC Egress
  • Cloud Run IAM and service account bindings
  • Cloud Monitoring uptime checks (/api/v2/heartbeat)
  • Automated backup Cloud Run Job

What You Do Manually

  • Note the Cloud Run service URL from the RAD UI deployment panel
  • Retrieve the auth token from Secret Manager (if enabled)
  • Connect to Chroma using the Python client or REST API
  • Create collections and upsert embeddings
  • Run similarity searches
  • Review logs in Cloud Logging

CLI and REST API Overview

Tool | Purpose
gcloud | Access secrets, inspect Cloud Run services, view logs
curl | Direct Chroma REST API calls

Install: Google Cloud SDK


Prerequisites

  1. A GCP project with billing enabled.
  2. The Services GCP module deployed in the same project (provides VPC and networking).
  3. The following APIs enabled (Services GCP handles this):
    • run.googleapis.com
    • secretmanager.googleapis.com
    • artifactregistry.googleapis.com
    • cloudbuild.googleapis.com
    • storage.googleapis.com
  4. gcloud authenticated: gcloud auth login (also run gcloud auth application-default login if you use Google Cloud client libraries)
  5. Python 3.9+ and pip for the Chroma client steps.

Phase 1 — Deploy Infrastructure [AUTOMATED]

Step 1.1 — Configure Variables

Variable | Required | Default | Description
project_id | Yes | (none) | GCP project ID
tenant_deployment_id | No | "demo" | Short deployment identifier
deployment_id | No | "" | Auto-generated suffix
region | No | "us-central1" | GCP region
application_name | No | "chroma" | Base name for Cloud Run service and resources
application_version | No | "latest" | Chroma Docker image tag
deploy_application | No | true | Set false to provision infrastructure without deploying
cpu_limit | No | "1000m" | CPU per Cloud Run instance
memory_limit | No | "1Gi" | Memory per Cloud Run instance
min_instance_count | No | 1 | Min instances (1 avoids cold starts)
max_instance_count | No | 1 | Max instances (keep at 1: single-writer)
enable_auth_token | No | false | Generate and store auth token in Secret Manager
ingress_settings | No | "internal" | "internal" (VPC only) or "all" (requires auth token)
backup_schedule | No | "0 2 * * *" | Cron schedule for automated backups
backup_retention_days | No | 7 | Days to retain backup files
support_users | No | [] | Email addresses for monitoring alerts
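
The cpu_limit and memory_limit defaults use Kubernetes-style quantity strings. As a rough illustration of what "1000m" and "1Gi" mean, the sketch below converts them to vCPUs and bytes. The function names are hypothetical and only the suffixes shown are handled; it is not part of the module.

```python
def cpu_to_vcpus(cpu_limit: str) -> float:
    """Convert a CPU quantity such as "1000m" (millicores) or "2" to vCPUs."""
    if cpu_limit.endswith("m"):
        return int(cpu_limit[:-1]) / 1000
    return float(cpu_limit)


# Binary suffixes only; this is a simplification of the full quantity grammar.
_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_to_bytes(memory_limit: str) -> int:
    """Convert a memory quantity such as "1Gi" or "512Mi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if memory_limit.endswith(suffix):
            return int(memory_limit[: -len(suffix)]) * factor
    return int(memory_limit)  # a bare number is treated as bytes


print(cpu_to_vcpus("1000m"), memory_to_bytes("1Gi"))
```

With the defaults, each instance gets one vCPU and 1,073,741,824 bytes of memory.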

Step 1.2 — Initiate Deployment

Deployment is initiated from the RAD UI. Fill in the variables form and click Deploy.

Approximate deployment durations:

Phase | Duration
Artifact Registry image build (Cloud Build) | 5–8 min
Cloud Run service deployment | 2–3 min
GCS bucket creation | 1 min
Total | 8–12 min

Step 1.3 — Record Outputs

Output | Description
service_url | HTTPS URL of the Chroma Cloud Run service
service_name | Cloud Run service name
deployment_id | Unique deployment identifier
storage_buckets | Chroma data bucket name

Set shell variables for use in later steps:

export PROJECT="your-gcp-project-id"
export REGION="us-central1"

# Discover the Cloud Run service
export SERVICE=$(gcloud run services list \
--project=${PROJECT} \
--region=${REGION} \
--format="value(metadata.name)" \
--filter="metadata.name~chroma" \
--limit=1)
export SERVICE_URL=$(gcloud run services describe ${SERVICE} \
--project=${PROJECT} \
--region=${REGION} \
--format="value(status.url)")

echo "Chroma URL: ${SERVICE_URL}"

Phase 2 — Verify Deployment [MANUAL]

Step 2.1 — Check the Heartbeat Endpoint

curl -s "${SERVICE_URL}/api/v2/heartbeat"

Expected result:

{"nanosecond heartbeat": 1234567890}

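If the response is empty or the service returns a 503, the Cloud Run instance may still be starting; wait 30 seconds and retry. With ingress_settings = "internal" (the default), run the command from a host inside the VPC, because Cloud Run rejects requests that arrive from the public internet.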
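
The wait-and-retry advice can be scripted. The following is a minimal sketch, assuming the heartbeat response has the JSON shape shown above; the function names, attempt count, and delay are illustrative choices rather than anything the module provides.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_heartbeat(service_url: str, timeout: float = 10.0) -> int:
    """Call /api/v2/heartbeat and return the nanosecond timestamp."""
    url = f"{service_url}/api/v2/heartbeat"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["nanosecond heartbeat"]


def wait_for_heartbeat(fetch, attempts: int = 5, delay: float = 30.0, sleep=time.sleep) -> int:
    """Call `fetch` until it succeeds, sleeping `delay` seconds between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (urllib.error.URLError, ValueError, KeyError) as err:
            # HTTP errors (such as 503), empty bodies, and unexpected JSON end up here.
            last_error = err
            if attempt < attempts - 1:
                sleep(delay)
    raise TimeoutError(f"no heartbeat after {attempts} attempts") from last_error


# Usage (requires network access to the service):
#   import os
#   wait_for_heartbeat(lambda: fetch_heartbeat(os.environ["SERVICE_URL"]))
```

Passing the fetch function and the sleep function as arguments keeps the retry loop easy to exercise without a live service.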

gcloud equivalent — inspect Cloud Run service:

gcloud run services describe ${SERVICE} \
--region=${REGION} \
--project=${PROJECT} \
--format="value(status.conditions)"

REST API equivalent:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://run.googleapis.com/v2/projects/${PROJECT}/locations/${REGION}/services/${SERVICE}"

Step 2.2 — Check Chroma API Version

curl -s "${SERVICE_URL}/api/v2/version"

Expected result: A JSON object with the Chroma server version string.

Step 2.3 — Retrieve Auth Token (if enabled)

If enable_auth_token = true, retrieve the token before making further API calls:

# Find the auth token secret (a single filter expression; a repeated --filter flag overrides the earlier one)
export AUTH_SECRET=$(gcloud secrets list \
--project=${PROJECT} \
--filter="name~chroma AND name~auth-token" \
--format="value(name.basename())" \
--limit=1)

# Retrieve the token value
export CHROMA_TOKEN=$(gcloud secrets versions access latest \
--secret="${AUTH_SECRET}" \
--project=${PROJECT})

echo "Auth token retrieved: ${#CHROMA_TOKEN} characters"

REST API equivalent:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://secretmanager.googleapis.com/v1/projects/${PROJECT}/secrets/${AUTH_SECRET}/versions/latest:access" \
| python3 -c "import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)['payload']['data']).decode())"
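
The one-line python3 -c decoder above is dense, so here is the same step as a small function. It is a sketch that assumes the versions:access response carries the secret as base64 under payload.data, as the pipeline above implies; the function name is hypothetical.

```python
import base64
import json


def decode_secret_payload(response_body: str) -> str:
    """Extract and decode the secret value from a versions:access JSON response."""
    encoded = json.loads(response_body)["payload"]["data"]
    return base64.b64decode(encoded).decode("utf-8")


# Example with a made-up response body (not a real secret):
sample = json.dumps({"payload": {"data": base64.b64encode(b"example-token").decode()}})
print(decode_secret_payload(sample))
```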

Phase 3 — Create Collections [MANUAL]

Step 3.1 — Install the Chroma Python Client

pip install chromadb

Step 3.2 — Connect to Chroma

import chromadb

# Without auth token
client = chromadb.HttpClient(
host="<SERVICE_URL without https://>",
port=443,
ssl=True
)

# With auth token
client = chromadb.HttpClient(
host="<SERVICE_URL without https://>",
port=443,
ssl=True,
headers={"Authorization": "Bearer <CHROMA_TOKEN>"}
)
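
Rather than editing the host placeholder by hand, the connection arguments can be derived from the SERVICE_URL and CHROMA_TOKEN shell variables. This is a sketch under that assumption; chroma_client_kwargs is a hypothetical helper, and the hostname in the example is made up.

```python
from typing import Optional
from urllib.parse import urlparse


def chroma_client_kwargs(service_url: str, token: Optional[str] = None) -> dict:
    """Build keyword arguments for chromadb.HttpClient from a Cloud Run URL."""
    parsed = urlparse(service_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https:// URL, got {service_url!r}")
    kwargs = {"host": parsed.hostname, "port": parsed.port or 443, "ssl": True}
    if token:
        # Only send the header when an auth token is configured.
        kwargs["headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


print(chroma_client_kwargs("https://chroma-example-uc.a.run.app"))

# Usage with the real client:
#   import os, chromadb
#   client = chromadb.HttpClient(**chroma_client_kwargs(
#       os.environ["SERVICE_URL"], os.environ.get("CHROMA_TOKEN")))
```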

Test the connection:

client.heartbeat()

Expected result: A nanosecond timestamp integer.
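
To sanity-check the returned integer, it can be converted to a readable time. The sketch below assumes the heartbeat counts nanoseconds since the Unix epoch; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_to_datetime(nanoseconds: int) -> datetime:
    """Convert a nanosecond epoch timestamp to an aware UTC datetime."""
    seconds, remainder = divmod(nanoseconds, 1_000_000_000)
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder // 1000
    )


print(heartbeat_to_datetime(1_700_000_000_000_000_000))
```

A value close to the current time suggests the server answered the request itself rather than a cached or proxied response.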

Step 3.3 — Create a Collection

collection = client.create_collection(
name="my_documents",
metadata={"description": "Document embeddings for RAG"}
)
print(f"Collection created: {collection.name}")

Expected result: The collection is created and its name is printed.

gcloud — verify the collection persists in GCS:

export DATA_BUCKET=$(gcloud storage buckets list \
--project=${PROJECT} \
--filter="name~chroma" \
--format="value(name)" \
--limit=1)
gcloud storage ls gs://${DATA_BUCKET}/

Expected result: Chroma's SQLite database file and directory structure appear in the bucket.

Step 3.4 — Upsert Documents

# Upsert documents with embeddings
collection.upsert(
ids=["doc1", "doc2", "doc3"],
documents=[
"Chroma is an open-source vector database for AI applications.",
"Google Cloud Run provides serverless container hosting.",
"RAG pipelines combine vector search with language models."
],
metadatas=[
{"source": "chroma_docs", "category": "database"},
{"source": "gcp_docs", "category": "cloud"},
{"source": "ai_guide", "category": "ml"}
]
)
print("Documents upserted successfully")

Expected result: Documents are stored in the collection. Chroma auto-embeds using its default embedding function.

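REST API equivalent. The v2 routes are scoped by tenant and database and address a collection by its ID rather than its name. The server does not compute embeddings, so the request must supply an "embeddings" array whose vectors match the collection's dimension (replace the placeholders below):

# Look up the collection ID by name
export COLLECTION_ID=$(curl -s "${SERVICE_URL}/api/v2/tenants/default_tenant/databases/default_database/collections/my_documents" \
-H "Authorization: Bearer ${CHROMA_TOKEN}" \
| python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")

curl -s -X POST "${SERVICE_URL}/api/v2/tenants/default_tenant/databases/default_database/collections/${COLLECTION_ID}/upsert" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${CHROMA_TOKEN}" \
-d '{
"ids": ["doc1", "doc2"],
"embeddings": [<EMBEDDING_1>, <EMBEDDING_2>],
"documents": ["Chroma vector database", "Cloud Run serverless"],
"metadatas": [{"source": "test"}, {"source": "test"}]
}'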
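
Both upsert forms above take parallel lists, where position i of each list describes the same record. A mismatch in length is an easy mistake to make, so the sketch below validates the lists before building a request body. build_upsert_payload is a hypothetical helper, and rejecting duplicate IDs is a choice made here for illustration.

```python
from typing import Optional, Sequence


def build_upsert_payload(
    ids: Sequence[str],
    documents: Sequence[str],
    metadatas: Optional[Sequence[dict]] = None,
    embeddings: Optional[Sequence[Sequence[float]]] = None,
) -> dict:
    """Check that the parallel lists line up and return a JSON-ready dict."""
    count = len(ids)
    if len(set(ids)) != count:
        raise ValueError("ids must be unique within one upsert")
    fields = {"documents": documents, "metadatas": metadatas, "embeddings": embeddings}
    for name, values in fields.items():
        if values is not None and len(values) != count:
            raise ValueError(f"{name} has {len(values)} items, expected {count}")
    payload = {"ids": list(ids)}
    # Optional fields are included only when provided.
    payload.update({k: list(v) for k, v in fields.items() if v is not None})
    return payload


print(build_upsert_payload(["doc1"], ["Chroma vector database"], [{"source": "test"}]))
```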

Phase 4 — Run Similarity Searches [MANUAL]

Step 4.1 — Query the Collection

results = collection.query(
query_texts=["What is a vector database?"],
n_results=2,
include=["documents", "distances", "metadatas"]
)

for i, (doc, dist) in enumerate(zip(results["documents"][0], results["distances"][0])):
    print(f"Result {i+1}: {doc[:60]}... (distance: {dist:.4f})")

Expected result: The two most semantically similar documents are returned with their distances. The Chroma document about the vector database should rank highest.
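
The query result is a dictionary of nested lists, with one inner list per query text, which is why the loop above indexes [0]. The sketch below flattens one query's results into rows; the helper name is hypothetical and the distances in the example are invented values, not real output.

```python
def flatten_query_results(results: dict, query_index: int = 0) -> list:
    """Turn one query's nested result lists into a list of row dicts."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    metadatas = results.get("metadatas")
    # metadatas is absent or None when it was not requested via include.
    metas = metadatas[query_index] if metadatas else [None] * len(ids)
    rows = [
        {"id": i, "document": d, "distance": x, "metadata": m}
        for i, d, x, m in zip(ids, documents, distances, metas)
    ]
    return sorted(rows, key=lambda row: row["distance"])  # smallest distance first


example = {
    "ids": [["doc1", "doc3"]],
    "documents": [["Chroma is an open-source vector database.", "RAG pipelines ..."]],
    "distances": [[0.42, 0.97]],  # illustrative numbers only
    "metadatas": [[{"category": "database"}, {"category": "ml"}]],
}
for row in flatten_query_results(example):
    print(row["id"], row["distance"])
```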

Step 4.2 — Filter by Metadata

results = collection.query(
query_texts=["cloud infrastructure"],
n_results=2,
where={"category": "cloud"},
include=["documents", "metadatas", "distances"]
)
print(results)

Expected result: Only documents with category = "cloud" in their metadata are returned.
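
The where argument also accepts explicit operators, and several conditions can be combined with $and. The following sketch builds such a filter from keyword arguments; build_where is a hypothetical helper that covers equality conditions only.

```python
def build_where(**equals) -> dict:
    """Build a Chroma metadata filter from field=value equality conditions."""
    if not equals:
        raise ValueError("at least one condition is required")
    clauses = [{field: {"$eq": value}} for field, value in equals.items()]
    # A single condition is passed as-is; several are wrapped in $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


print(build_where(category="cloud"))
print(build_where(category="cloud", source="gcp_docs"))

# Usage: collection.query(query_texts=["cloud infrastructure"], n_results=2,
#                         where=build_where(category="cloud", source="gcp_docs"))
```

The single-condition form {"category": {"$eq": "cloud"}} is the explicit spelling of the shorthand {"category": "cloud"} used in the step above.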

Step 4.3 — List Collections

collections = client.list_collections()
for c in collections:
    print(f" Collection: {c.name}")

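REST API equivalent (v2 routes are scoped by tenant and database; default_tenant and default_database are the names Chroma uses when none is specified):

curl -s "${SERVICE_URL}/api/v2/tenants/default_tenant/databases/default_database/collections" \
-H "Authorization: Bearer ${CHROMA_TOKEN}" | python3 -m json.tool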

Expected result: my_documents appears in the collection list.


Phase 5 — Explore Cloud Logging [MANUAL]

Step 5.1 — View Chroma Application Logs

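Navigate to Logging > Logs Explorer in the Cloud Console and enter the following query. Logs Explorer does not expand shell variables, so replace <SERVICE> and <REGION> with the values of ${SERVICE} and ${REGION}:

resource.type="cloud_run_revision"
resource.labels.service_name="<SERVICE>"
resource.labels.location="<REGION>"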

gcloud equivalent:

gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="'${SERVICE}'"' \
--project=${PROJECT} \
--limit=50 \
--format="table(timestamp, textPayload)"

Expected result: Chroma startup logs appear, including the port binding line and request logs for your API calls.
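
The filter strings used in this step can be assembled programmatically, which avoids the nested shell quoting. This is a minimal sketch with a hypothetical function name; the service name in the example is made up, and the optional severity clause is an addition not used elsewhere in this guide.

```python
from typing import Optional


def cloud_run_log_filter(
    service: str, region: Optional[str] = None, min_severity: Optional[str] = None
) -> str:
    """Build a Cloud Logging filter for one Cloud Run service."""
    parts = [
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service}"',
    ]
    if region:
        parts.append(f'resource.labels.location="{region}"')
    if min_severity:
        parts.append(f"severity>={min_severity}")  # for example ERROR
    return " AND ".join(parts)


print(cloud_run_log_filter("chroma-demo", region="us-central1"))
```

The resulting string can be pasted into Logs Explorer or passed as the filter argument to gcloud logging read.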

Step 5.2 — Review Uptime Check

Navigate to Monitoring > Uptime checks in the Cloud Console.

Expected result: The uptime check targeting /api/v2/heartbeat shows Passing from multiple global locations.


Phase 6 — Cloud Run Features [MANUAL]

Step 6.1 — Inspect the Cloud Run Service

gcloud run services describe ${SERVICE} \
--region=${REGION} \
--project=${PROJECT}

Expected result: Service status shows Ready. Container spec shows the GCS FUSE volume mount at /data, CPU/memory limits, and environment variables.

Step 6.2 — View Revisions

gcloud run revisions list \
--service=${SERVICE} \
--region=${REGION} \
--project=${PROJECT}

Expected result: The active revision is listed with 100% traffic allocation.

Step 6.3 — Check Instance Count

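# Query the Cloud Monitoring API for the last 10 minutes (the -d option is GNU date syntax)
curl -s -G "https://monitoring.googleapis.com/v3/projects/${PROJECT}/timeSeries" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
--data-urlencode 'filter=metric.type="run.googleapis.com/container/instance_count" AND resource.labels.service_name="'${SERVICE}'"' \
--data-urlencode "interval.startTime=$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
--data-urlencode "interval.endTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"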

Expected result: With min_instance_count = 1, at least one instance is always running. No cold starts.


Phase 7 — Undeploy [AUTOMATED]

When you are finished, return to the RAD UI, navigate to your deployment, and click Undeploy to remove all resources.

Approximate undeploy duration: 5–10 minutes.

Warning: This permanently deletes the Chroma data bucket and all stored collections. Export collection data before undeploying if needed.
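
One way to export a collection before undeploying is to read its records through the Python client and save them as JSON. The sketch below assumes the collection is small enough to fetch in a single get call and leaves out embeddings, which Chroma can recompute when the documents are upserted again. collection_snapshot is a hypothetical helper.

```python
import json


def collection_snapshot(collection) -> dict:
    """Read all ids, documents, and metadatas from a collection into a dict."""
    data = collection.get(include=["documents", "metadatas"])
    records = [
        {"id": i, "document": d, "metadata": m}
        for i, d, m in zip(data["ids"], data["documents"], data["metadatas"])
    ]
    return {"collection": collection.name, "records": records}


# Usage with the client from Phase 3:
#   snapshot = collection_snapshot(client.get_collection("my_documents"))
#   with open("my_documents.json", "w", encoding="utf-8") as fh:
#       json.dump(snapshot, fh, indent=2)
```

For larger collections, the same idea applies with the limit and offset arguments of get to page through the records.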


Summary

Action | Phase | Automated
Cloud Run service provisioning | 1 | Yes
GCS data bucket creation | 1 | Yes
Secret Manager auth token | 1 | Yes (if enabled)
Container image build (Cloud Build) | 1 | Yes
IAM and service account bindings | 1 | Yes
Note service URL from RAD UI | 2 | No
Verify Chroma heartbeat | 2 | No
Retrieve auth token | 2 | No
Install Python client | 3 | No
Create collections | 3 | No
Upsert documents | 3 | No
Run similarity searches | 4 | No
Review Cloud Logging | 5 | No
Inspect Cloud Run service | 6 | No
Undeploy infrastructure | 7 | Yes