Qdrant on Google Cloud Run

This document provides a comprehensive reference for the modules/Qdrant_CloudRun Terraform module. It covers architecture, IAM, configuration variables, Qdrant-specific behaviours, and operational patterns for deploying Qdrant on Google Cloud Run (v2).


1. Module Overview

Qdrant is a high-performance vector database and similarity search engine built in Rust. Qdrant CloudRun is a wrapper module built on top of App CloudRun. It uses App CloudRun for all GCP infrastructure provisioning and injects Qdrant-specific application configuration, an optional API key, and storage configuration via Qdrant Common.

Key Capabilities:

  • Compute: Cloud Run v2 (Gen2), 1 vCPU / 1 Gi by default. min_instance_count = 1 to avoid HNSW index-loading cold starts. max_instance_count = 1 — Qdrant is a single-writer store.
  • Data Persistence: Cloud Storage bucket (<prefix>-storage) mounted at /qdrant/storage via GCS FUSE. No Cloud SQL, no Redis.
  • Security: Optional API key via Secret Manager. A plan-time validation blocks public ingress (ingress_settings = "all") unless enable_api_key = true. Inherits Cloud Armor, IAP, Binary Authorization, and VPC-SC from App CloudRun.
  • CI/CD: Cloud Build image pipeline by default; Cloud Deploy progressive delivery optional.
  • Reliability: Startup probe targets /readyz; liveness probe targets /livez. Separate endpoints prevent spurious restarts during collection loading.
  • gRPC: Qdrant serves gRPC on port 6334. Cloud Run exposes a single container port per service, so gRPC is disabled by default. Set container_protocol = "h2c" if gRPC clients need HTTP/2 on the main port.

Project & Application Identity

| Variable | Group | Type | Default | Description |
|---|---|---|---|---|
| project_id | 1 | string | (none) | GCP project ID. Required. |
| region | 1 | string | 'us-central1' | GCP region fallback |
| tenant_deployment_id | 2 | string | 'demo' | Short suffix appended to all resource names |
| support_users | 2 | list(string) | [] | Email recipients for monitoring alerts |
| resource_labels | 2 | map(string) | {} | Labels applied to all provisioned resources |
| application_name | 3 | string | 'qdrant' | Base resource name. Do not change after initial deployment. |
| application_display_name | 3 | string | 'Qdrant Vector Database' | Human-readable name in the GCP Console |
| description | 3 | string | Qdrant description | Cloud Run service description |
| application_version | 3 | string | 'latest' | Qdrant image tag (e.g., 'v1.9.0') |

Wrapper architecture: Qdrant CloudRun calls Qdrant Common to build an application_config object containing Qdrant-specific environment variables, probe configuration, and the storage volume definition. module_storage_buckets carries the <prefix>-storage GCS bucket. scripts_dir is resolved to abspath("${module.qdrant_app.path}/scripts") at apply time.


2. IAM & Access Control

Qdrant CloudRun delegates all IAM provisioning to App CloudRun. The Cloud Run SA, Cloud Build SA, and IAP service agent role sets are identical to those in App CloudRun.

API key: When enable_api_key = true, Qdrant Common generates a 32-character API key and stores it in Secret Manager as <prefix>-api-key. It is injected as QDRANT__SERVICE__API_KEY. All REST and gRPC calls must include api-key: <key> in the request header.

For the complete role tables and IAP details, see the App CloudRun documentation.


3. Core Service Configuration

A. Compute (Cloud Run)

Qdrant CloudRun exposes cpu_limit and memory_limit as dedicated top-level variables. Qdrant loads HNSW vector indexes into RAM — size memory_limit according to your collection size and vector dimensionality.
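As a rough sizing aid, the arithmetic behind that advice can be sketched as follows. The sketch assumes vectors are stored as 4-byte float32 components and counts only the raw vectors; HNSW graph links, payloads, and process overhead come on top, so treat the result as a lower bound. The function names are illustrative and not part of the module.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_component: int = 4) -> int:
    """Bytes needed for the raw vectors alone (float32 components assumed by default)."""
    return num_vectors * dimensions * bytes_per_component


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB, the unit Cloud Run uses for values such as '1Gi'."""
    return num_bytes / 2**30


if __name__ == "__main__":
    size = raw_vector_bytes(num_vectors=1_000_000, dimensions=1536)
    # Lower bound only: index structures and payloads are not included.
    print(f"raw vectors: {to_gib(size):.2f} GiB")
```

For one million 1536-dimensional vectors this yields about 5.7 GiB, which is already well above the default memory_limit of '1Gi'.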

Single-instance constraint: max_instance_count = 1 is strongly recommended. Qdrant is a single-writer store — multiple instances against the same GCS FUSE mount will corrupt collection data. Scale vertically (increase CPU and memory) rather than horizontally.

| Variable | Group | Default | Description |
|---|---|---|---|
| deploy_application | 4 | true | Set false for infrastructure-only deployment |
| cpu_limit | 4 | '1000m' | CPU per instance |
| memory_limit | 4 | '1Gi' | Memory per instance. Increase for large collections; HNSW indexes are memory-resident. |
| min_instance_count | 4 | 1 | Keep at 1+ to avoid cold starts during HNSW index loading |
| max_instance_count | 4 | 1 | Keep at 1; Qdrant is single-writer |
| container_port | 4 | 6333 | Qdrant REST API port |
| execution_environment | 4 | 'gen2' | Gen2 required for GCS FUSE |
| timeout_seconds | 4 | 300 | Max request duration (0–3600 s). Increase for large batch upserts or snapshot operations. |
| enable_cloudsql_volume | 4 | false | Not applicable: Qdrant has no SQL database |
| enable_image_mirroring | 4 | true | Mirror the Qdrant image into Artifact Registry |
| container_protocol | 4 | 'http1' | Use 'h2c' to enable HTTP/2 for gRPC clients |
| traffic_split | 4 | [] | Canary/blue-green traffic allocation |
| max_revisions_to_retain | 4 | 7 | Maximum Cloud Run revisions to keep |
| service_annotations | 4 | {} | Cloud Run service annotations |
| service_labels | 4 | {} | Cloud Run service labels |

Differences from App CloudRun defaults:

| Variable | App CloudRun | Qdrant CloudRun | Reason |
|---|---|---|---|
| container_port | 8080 | 6333 | Qdrant's native REST API port |
| ingress_settings | 'all' | 'internal' | Vector databases should not be publicly exposed by default |
| enable_redis | true | false (hard-coded) | Qdrant has no Redis dependency |
| database_type | configurable | NONE (fixed) | Qdrant manages its own embedded storage |
| min_instance_count | 0 | 1 | Avoid cold start delays during HNSW index loading |

B. Storage (GCS FUSE)

Qdrant requires persistent storage for its WAL, collection data, HNSW index files, and metadata. Qdrant Common automatically provisions a GCS bucket and mounts it at /qdrant/storage via GCS FUSE. QDRANT__STORAGE__STORAGE_PATH is set to match.
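The injected variable names follow Qdrant's environment-override convention, in which a QDRANT__ prefix and double underscores map onto nested configuration keys. The sketch below only illustrates that naming scheme; env_to_config is a hypothetical helper, not code from the module or from Qdrant, and values are left as strings.

```python
def env_to_config(env: dict[str, str], prefix: str = "QDRANT__") -> dict:
    """Translate QDRANT__A__B=value style variables into a nested {'a': {'b': value}} dict."""
    config: dict = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        keys = name[len(prefix):].lower().split("__")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


if __name__ == "__main__":
    injected = {
        "QDRANT__STORAGE__STORAGE_PATH": "/qdrant/storage",
        "QDRANT__SERVICE__HTTP_PORT": "6333",
    }
    print(env_to_config(injected))
```

Read this way, QDRANT__STORAGE__STORAGE_PATH corresponds to the storage.storage_path setting, which is why it has to match the GCS FUSE mount point.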

| Variable | Group | Default | Description |
|---|---|---|---|
| create_cloud_storage | 11 | true | Set false to skip GCS bucket creation |
| storage_buckets | 11 | [] | Additional GCS buckets beyond the storage bucket |
| gcs_volumes | 11 | [] | Additional GCS FUSE volume mounts |
| enable_nfs | 11 | false | Mount Cloud Filestore NFS (requires gen2). Qdrant uses GCS for storage. |
| nfs_mount_path | 11 | '/mnt/nfs' | NFS container mount path |
| manage_storage_kms_iam | 11 | false | Create CMEK KMS key for storage |
| enable_artifact_registry_cmek | 11 | false | Enable CMEK for Artifact Registry |

The auto-provisioned bucket uses: storage_class = "STANDARD", versioning_enabled = false, public_access_prevention = "enforced".

C. Networking

| Variable | Group | Default | Description |
|---|---|---|---|
| ingress_settings | 5 | 'internal' | Recommended. 'all' requires enable_api_key = true (plan-time validation). |
| vpc_egress_setting | 5 | 'PRIVATE_RANGES_ONLY' | 'PRIVATE_RANGES_ONLY' or 'ALL_TRAFFIC' |

4. Authentication & Access Control

A. Qdrant API Key

The primary Qdrant-specific security control. When enable_api_key = true:

  • A 32-character alphanumeric API key is generated and stored in Secret Manager
  • Qdrant starts with QDRANT__SERVICE__API_KEY set to the key value
  • All REST calls must include api-key: <key> in the request header
  • All gRPC calls must include api-key: <key> in the metadata
  • Python client usage: qdrant_client.QdrantClient(host=..., api_key="<key>")
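For REST callers that do not use the Python client, the header requirement above can be sketched with the standard library alone. The base URL is a placeholder, the key would normally be read from Secret Manager rather than hard-coded, and build_request and list_collections are illustrative helper names; GET /collections is used only as a simple authenticated call.

```python
import json
import urllib.request


def build_request(base_url: str, api_key: str, path: str = "/collections") -> urllib.request.Request:
    """Build a GET request that carries the Qdrant API key in the api-key header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def list_collections(base_url: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (needs network access to the service)."""
    with urllib.request.urlopen(build_request(base_url, api_key), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder values: substitute the service_url output and the stored key.
    req = build_request("https://qdrant.example.internal", "REPLACE_WITH_KEY")
    print(req.full_url, req.header_items())
```

HTTP header names are case-insensitive, so it does not matter that urllib normalises the name to "Api-key" before sending.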

Plan-time guard: validation.tf includes a precondition that prevents deploying with ingress_settings = "all" and enable_api_key = false simultaneously. This blocks accidental public exposure of an unauthenticated Qdrant instance.
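The rule enforced by that guard is a single predicate over two variables. The following sketch restates it outside Terraform, for example for a pre-flight check in a pipeline; it is an assumption-level illustration of the rule as described here, not the contents of validation.tf, and the error text is invented.

```python
def check_public_ingress(ingress_settings: str, enable_api_key: bool) -> list[str]:
    """Return a list of validation errors; an empty list means the combination is allowed."""
    errors = []
    if ingress_settings == "all" and not enable_api_key:
        errors.append('ingress_settings = "all" requires enable_api_key = true')
    return errors


if __name__ == "__main__":
    for ingress in ("internal", "all"):
        for key_enabled in (False, True):
            verdict = check_public_ingress(ingress, key_enabled) or ["ok"]
            print(f"ingress={ingress!r:10} enable_api_key={key_enabled!s:5} -> {verdict[0]}")
```

Only one of the four combinations is rejected: public ingress with no API key.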

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_api_key | 3 | false | Generate and store API key in Secret Manager. Recommended for all non-internal deployments. |

B. Identity-Aware Proxy (IAP)

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_iap | 5 | false | Enable IAP on the Cloud Run service |
| iap_authorized_users | 5 | [] | Users/SAs granted access |
| iap_authorized_groups | 5 | [] | Google Groups granted access |

C. Cloud Armor

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_cloud_armor | 10 | false | Provision Global HTTPS LB + Cloud Armor WAF |
| admin_ip_ranges | 10 | [] | CIDR ranges exempted from WAF rules |
| application_domains | 10 | [] | Custom domains for the HTTPS LB |
| enable_cdn | 10 | false | Enable Cloud CDN on the HTTPS LB |

5. Observability & Health

A. Health Probes

Qdrant exposes two dedicated health endpoints with distinct purposes:

| Probe | Endpoint | Rationale |
|---|---|---|
| startup_probe | /readyz | Qdrant reports ready once all collections are fully loaded. Prevents traffic before index loading completes. |
| liveness_probe | /livez | Dedicated liveness endpoint unaffected by collection load state. Critical: using /readyz for liveness causes spurious restart loops during large collection loads. |

| Variable | Group | Default | Description |
|---|---|---|---|
| startup_probe | 14 | { path="/readyz", initial_delay=15, period=10, threshold=10 } | Startup probe |
| liveness_probe | 14 | { path="/livez", initial_delay=30, period=30, threshold=3 } | Liveness probe |
| uptime_check_config | 14 | { enabled=true, path="/readyz" } | Cloud Monitoring uptime check |
| alert_policies | 14 | [] | Cloud Monitoring metric alert policies |
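The probe defaults imply an approximate time budget for collection loading. The sketch below assumes that threshold means the number of consecutive failed probes tolerated and that per-probe timeouts are negligible; both are assumptions about the probe object rather than documented behaviour, and the function names are illustrative.

```python
import math


def startup_budget_seconds(initial_delay: int, period: int, threshold: int) -> int:
    """Approximate time Qdrant has to become ready before the startup probe gives up."""
    return initial_delay + period * threshold


def threshold_for_load_time(load_seconds: int, initial_delay: int, period: int) -> int:
    """Smallest threshold whose budget covers an expected collection load time."""
    return max(1, math.ceil((load_seconds - initial_delay) / period))


if __name__ == "__main__":
    # Defaults from the startup_probe row: initial_delay=15, period=10, threshold=10.
    print(startup_budget_seconds(15, 10, 10))    # 115
    # Threshold needed if collections take about five minutes to load.
    print(threshold_for_load_time(300, 15, 10))  # 29
```

Under these assumptions the default startup probe allows roughly 115 seconds of loading, so collections that take longer need a larger threshold or period.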

Differences from App CloudRun probe defaults:

| Field | App CloudRun | Qdrant CloudRun | Reason |
|---|---|---|---|
| Startup path | /healthz | /readyz | Qdrant's dedicated readiness endpoint |
| Liveness path | /healthz | /livez | Qdrant's dedicated liveness endpoint (separate from readiness) |

B. Backup & Recovery

| Variable | Group | Default | Description |
|---|---|---|---|
| backup_schedule | 7 | '0 2 * * *' | Cron expression (UTC) for automated backups |
| backup_retention_days | 7 | 7 | Days to retain backup files |
| enable_backup_import | 7 | false | Trigger a one-time restore on apply |
| backup_source | 7 | 'gcs' | 'gcs' (full GCS URI) or 'gdrive' (file ID) |
| backup_uri | 7 | "" | GCS URI or Drive file ID. Mapped to backup_file in App CloudRun. |
| backup_format | 7 | 'tar' | Backup format: sql, tar, gz, tgz, tar.gz, zip |

6. CI/CD & Delivery

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_cicd_trigger | 8 | false | Provision a Cloud Build GitHub trigger |
| github_repository_url | 8 | "" | Full HTTPS URL of the GitHub repository |
| github_token | 8 | "" | GitHub PAT. Sensitive. Required on first apply. |
| github_app_installation_id | 8 | "" | GitHub App installation ID |
| cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Advanced trigger config |
| enable_cloud_deploy | 8 | false | Provision a Cloud Deploy pipeline |
| cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Ordered promotion stages |
| enable_binary_authorization | 8 | false | Enforce image attestation |

7. Platform-Managed Behaviours

| Behaviour | Implementation | Detail |
|---|---|---|
| No database | database_type = "NONE" fixed by Qdrant Common | No Cloud SQL instance is created |
| No Redis | enable_redis = false hard-coded in main.tf | Qdrant has no caching dependency |
| Storage path env var | QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage always injected | Aligned with GCS FUSE mount point |
| HTTP port env var | QDRANT__SERVICE__HTTP_PORT=6333 always injected | Explicit port matching container_port |
| gRPC disabled | QDRANT__SERVICE__GRPC_PORT not set | Cloud Run does not expose port 6334. Use container_protocol = "h2c" for gRPC over the main port. |
| Public ingress blocked | Plan-time validation in validation.tf | ingress_settings = "all" blocked unless enable_api_key = true |
| GCS storage bucket | <prefix>-storage provisioned by Qdrant Common | Mounted at /qdrant/storage via GCS FUSE |
| Separate liveness/readiness | Startup: /readyz, Liveness: /livez | Prevents restart loops during large collection loads |

8. Variable Reference

All user-configurable variables, sorted by UI group then order.

| Variable | Group | Default | Description |
|---|---|---|---|
| module_description | 0 | Qdrant platform text | Platform metadata |
| module_documentation | 0 | docs URL | Documentation URL |
| module_dependency | 0 | ['Services GCP'] | Required modules |
| module_services | 0 | GCP service list | GCP services consumed |
| credit_cost | 0 | 50 | Deployment credit cost |
| require_credit_purchases | 0 | false | Credit balance check |
| enable_purge | 0 | true | Permit full resource deletion |
| public_access | 0 | false | Platform catalogue visibility |
| shared_users | 0 | [] | Users with access regardless of public_access |
| deployment_id | 0 | "" | Deployment ID suffix |
| resource_creator_identity | 0 | platform SA | Terraform service account |
| project_id | 1 | (none) | GCP project ID. Required. |
| region | 1 | 'us-central1' | Region fallback |
| tenant_deployment_id | 2 | 'demo' | Resource name suffix |
| support_users | 2 | [] | Monitoring alert recipients |
| resource_labels | 2 | {} | Resource labels |
| application_name | 3 | 'qdrant' | Base resource name |
| application_display_name | 3 | 'Qdrant Vector Database' | Display name |
| description | 3 | Qdrant description | Service description |
| application_version | 3 | 'latest' | Qdrant image tag |
| enable_api_key | 3 | false | Generate API key in Secret Manager |
| deploy_application | 4 | true | Deploy the Cloud Run service |
| cpu_limit | 4 | '1000m' | CPU per instance |
| memory_limit | 4 | '1Gi' | Memory per instance |
| min_instance_count | 4 | 1 | Min instances |
| max_instance_count | 4 | 1 | Max instances (keep at 1) |
| container_port | 4 | 6333 | REST API port |
| execution_environment | 4 | 'gen2' | Gen2 required for GCS FUSE |
| timeout_seconds | 4 | 300 | Request timeout |
| enable_image_mirroring | 4 | true | Mirror to Artifact Registry |
| container_protocol | 4 | 'http1' | 'http1' or 'h2c' for gRPC |
| traffic_split | 4 | [] | Traffic allocation |
| max_revisions_to_retain | 4 | 7 | Max revisions |
| ingress_settings | 5 | 'internal' | 'internal', 'all' (requires API key), or 'internal-and-cloud-load-balancing' |
| vpc_egress_setting | 5 | 'PRIVATE_RANGES_ONLY' | VPC egress mode |
| enable_iap | 5 | false | Enable IAP |
| iap_authorized_users | 5 | [] | IAP users |
| iap_authorized_groups | 5 | [] | IAP groups |
| environment_variables | 6 | {} | Plain-text env vars |
| secret_environment_variables | 6 | {} | Secret Manager references |
| secret_propagation_delay | 6 | 30 | Post-creation wait |
| secret_rotation_period | 6 | '2592000s' | Rotation period |
| backup_schedule | 7 | '0 2 * * *' | Backup cron |
| backup_retention_days | 7 | 7 | Backup retention |
| enable_backup_import | 7 | false | One-time restore |
| backup_source | 7 | 'gcs' | Backup source |
| backup_uri | 7 | "" | Backup location |
| backup_format | 7 | 'tar' | Backup format |
| enable_cicd_trigger | 8 | false | Cloud Build trigger |
| github_repository_url | 8 | "" | GitHub URL |
| github_token | 8 | "" | GitHub PAT. Sensitive. |
| cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Trigger config |
| enable_cloud_deploy | 8 | false | Cloud Deploy pipeline |
| cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Stages |
| enable_binary_authorization | 8 | false | Binary Authorization |
| enable_cloud_armor | 10 | false | Cloud Armor WAF |
| admin_ip_ranges | 10 | [] | WAF exemptions |
| application_domains | 10 | [] | Custom domains |
| enable_cdn | 10 | false | Cloud CDN |
| max_images_to_retain | 10 | 7 | Artifact Registry image retention count |
| delete_untagged_images | 10 | true | Delete untagged images |
| image_retention_days | 10 | 30 | Image retention days |
| create_cloud_storage | 11 | true | GCS buckets |
| storage_buckets | 11 | [] | Additional buckets |
| enable_nfs | 11 | false | NFS mount |
| gcs_volumes | 11 | [] | Additional GCS FUSE volumes |
| manage_storage_kms_iam | 11 | false | CMEK for storage |
| enable_artifact_registry_cmek | 11 | false | CMEK for Artifact Registry |
| initialization_jobs | 13 | [] | One-shot Cloud Run Jobs |
| cron_jobs | 13 | [] | Recurring scheduled jobs |
| startup_probe | 14 | { path="/readyz" } | Startup probe |
| liveness_probe | 14 | { path="/livez" } | Liveness probe |
| uptime_check_config | 14 | { path="/readyz" } | Uptime check |
| alert_policies | 14 | [] | Alert policies |
| enable_vpc_sc | 23 | false | VPC Service Controls |
| vpc_cidr_ranges | 23 | [] | VPC-SC CIDR ranges |
| vpc_sc_dry_run | 23 | true | Log without blocking |
| organization_id | 23 | "" | GCP Org ID for VPC-SC |
| enable_audit_logging | 23 | false | Cloud Audit Logs |

9. Outputs

| Output | Description |
|---|---|
| service_url | Cloud Run service HTTPS URL |
| service_name | Cloud Run service name |
| service_location | GCP region |
| project_id | GCP project ID |
| deployment_id | Deployment ID suffix |
| storage_buckets | Provisioned GCS bucket list |
| container_image | Container image used |

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
|---|---|---|---|
| enable_api_key (Common) | false | Critical | Without an API key, any caller who can reach the Qdrant endpoint can read, modify, or delete all collections and their vectors. Enable for any deployment reachable outside the VPC. The generated key is stored in Secret Manager and must be passed as api-key: <key> in all gRPC/HTTP requests. |
| ingress_settings | "internal" | Critical | Default is internal (VPC-only). Changing to "all" exposes the Qdrant API to the public internet. Never set to "all" without enable_api_key = true; the plan-time validation rejects that combination. |
| enable_nfs | false | High | With NFS disabled, persistence depends on the GCS FUSE mount at /qdrant/storage. If the GCS storage volume is also disabled, Qdrant writes to the ephemeral container filesystem, and any Cloud Run revision deployment or instance restart permanently erases all collections and their vectors. Enabling NFS requires execution_environment = "gen2". |
| execution_environment | "gen2" | High | GCS FUSE and NFS mounts require Gen2. If enable_nfs = true is set with execution_environment = "gen1", the Cloud Run deployment fails at plan time. |
| memory_limit | "1Gi" | High | Qdrant loads vector indexes (HNSW graphs) entirely into memory. Each 1M vectors at 1536 dimensions requires approximately 6 GB for the raw float32 vectors alone (1M × 1536 × 4 bytes). The default 1Gi supports only very small collections. OOM kills terminate all in-flight queries and cause a cold restart from storage. |
| cpu_limit | "1000m" | Medium | HNSW index builds are CPU-intensive; concurrent similarity searches compete for CPU. Under 1000m, p99 query latency degrades noticeably. Scale to between 2000m and 4000m for production. |
| collection vector dimensions | (set at collection creation time via client) | Critical | The vector dimension parameter in a Qdrant collection is immutable after creation. If the dimension does not match the embedding model used to generate vectors (e.g., 768 vs. 1536), all upsert operations fail with a dimension mismatch error and the collection must be deleted and recreated with the correct dimension. Always verify the embedding model dimension before creating a collection. |
| min_instance_count | 1 | Medium | Scale-to-zero (0) causes Qdrant to restart and reload indexes from storage (GCS or NFS) on the next request. For collections with millions of vectors, this reload can take tens of seconds, causing request timeouts. Keep at 1 for latency-sensitive workloads. |
| max_instance_count | 1 | High | Multiple Qdrant Cloud Run instances cannot safely share a single storage mount (GCS FUSE or NFS). Qdrant does not support distributed operation in this topology. Keep at 1 or use Qdrant GKE with StatefulSet for production scale. |
| container_port | 6333 | Critical | Qdrant listens on HTTP port 6333 and gRPC port 6334. Changing the HTTP port requires a matching QDRANT__SERVICE__HTTP_PORT environment variable; mismatches cause health check failures and no-traffic revisions. |
| vpc_egress_setting | "PRIVATE_RANGES_ONLY" | Low | Correct for VPC-only deployments. Change to ALL_TRAFFIC only if Qdrant must fetch snapshots or reach public endpoints directly. |
| timeout_seconds | 300 | Medium | Large ANN searches or snapshot upload/download operations can take longer than the default. Increase to 600 for collections with tens of millions of vectors. |
| application_version | "latest" | Medium | Using "latest" is non-reproducible. Qdrant's storage format can change between major versions. Upgrading across incompatible storage formats requires exporting and re-importing all collections. Pin to a specific version tag in production. |
| enable_gcs_storage_volume (Common) | true | High | GCS FUSE is the persistence mechanism when NFS is disabled. Disabling it while enable_nfs = false means all data is lost on instance restart. Do not disable unless NFS is used. |
| enable_iap | false | High | Without IAP, the endpoint (when public) is accessible to any caller. Enable IAP for user-facing deployments and ensure enable_api_key = true as well for defense in depth. |
| liveness_probe (Common) | /livez endpoint | High | Qdrant exposes /livez (200 whenever the process is running) and /readyz (503 while loading large collections). Using /readyz as the liveness target causes spurious instance restarts whenever Qdrant loads a large collection. Always use /livez for the liveness probe and /readyz for the startup probe. |
| enable_image_mirroring | true | Low | Disabling mirroring skips copying the Qdrant image into Artifact Registry. Deployments then pull directly from Docker Hub and are subject to pull rate limits, causing intermittent deployment failures in CI/CD. |
| secret_propagation_delay | 30 | Medium | In large projects, Secret Manager replication may exceed 30 seconds. Increase to 60 to prevent the deployment from reading an empty API key secret. |
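Several rows in this table can be checked mechanically before an apply. The sketch below is one possible pre-flight review over a plain dict of variable values; the review function, the message wording, and the idea of running such a check are illustrative assumptions rather than module features, and only a handful of rows are covered.

```python
RISK_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def review(config: dict) -> list[tuple[str, str, str]]:
    """Return (risk, variable, message) findings, most severe first."""
    findings = []
    if config.get("max_instance_count", 1) > 1:
        findings.append(("High", "max_instance_count",
                         "more than one instance would share the same storage mount"))
    if config.get("min_instance_count", 1) == 0:
        findings.append(("Medium", "min_instance_count",
                         "scale-to-zero forces an index reload on the next request"))
    if config.get("application_version", "latest") == "latest":
        findings.append(("Medium", "application_version",
                         "pin a specific Qdrant image tag for reproducible deployments"))
    if not config.get("enable_nfs", False) and not config.get("enable_gcs_storage_volume", True):
        findings.append(("High", "enable_gcs_storage_volume",
                         "no persistent volume is left; data is lost on restart"))
    # sorted() is stable, so findings of equal risk keep their insertion order.
    return sorted(findings, key=lambda finding: RISK_ORDER[finding[0]])


if __name__ == "__main__":
    example = {"max_instance_count": 3, "min_instance_count": 0, "application_version": "v1.9.0"}
    for risk, variable, message in review(example):
        print(f"[{risk}] {variable}: {message}")
```

Missing keys fall back to the defaults listed in the table, so an empty dict reports only the unpinned application_version.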

10. Destroying Resources

When enable_purge = true, tofu destroy removes all module-managed resources. After Cloud Run service deletion, GCP may hold serverless IPv4 addresses for 20–30 minutes. Re-run tofu destroy after that window if the first attempt fails.