Qdrant on Google Kubernetes Engine

This document provides a comprehensive reference for the modules/Qdrant_GKE Terraform module. It covers architecture, IAM, configuration variables, Qdrant-specific behaviours, and operational patterns for deploying Qdrant on GKE Autopilot.


1. Module Overview

Qdrant GKE is a wrapper module built on top of App GKE. It deploys Qdrant — the high-performance vector database and similarity search engine — on GKE Autopilot with production-grade StatefulSet persistence, GCS FUSE storage, optional API key authentication, Workload Identity, and horizontal auto-scaling.

Key Capabilities:

  • Compute: GKE Autopilot, 1 vCPU / 1 Gi by default. StatefulSet or Deployment workload type.
  • Data Persistence: StatefulSet PVC (recommended for production) or GCS FUSE-mounted Cloud Storage bucket at /qdrant/storage. No Cloud SQL, no Redis.
  • Security: Optional API key via Secret Manager injected as QDRANT__SERVICE__API_KEY. Inherits Cloud Armor, IAP, and VPC-SC from App GKE.
  • CI/CD: Cloud Build image pipeline by default; Cloud Deploy progressive delivery optional.
  • Reliability: Startup probe targets /readyz; liveness probe targets /livez. PodDisruptionBudget enabled by default.
  • gRPC: Disabled by default. Enable via environment_variables = { QDRANT__SERVICE__GRPC_PORT = "6334" } and configure a second Service port manually.
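
The gRPC setting in the last bullet can be sketched as a module call. This is a minimal example under stated assumptions: the module label `qdrant`, the relative `source` path, and the project ID are placeholders and do not come from this reference.

```hcl
# Hypothetical root-module call; label, source path, and project ID are placeholders.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  # Ask Qdrant to listen for gRPC on 6334; the module leaves this unset by default.
  environment_variables = {
    QDRANT__SERVICE__GRPC_PORT = "6334"
  }
}
```

This only configures the listener inside the container. Exposing port 6334 through a second Service port is still a manual step.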

Project & Application Identity

Variable | Group | Type | Default | Description
project_id | 1 | string | (none) | GCP project ID. Required.
region | 1 | string | 'us-central1' | GCP region fallback
tenant_deployment_id | 2 | string | 'demo' | Short suffix appended to resource names
support_users | 2 | list(string) | [] | Email recipients for monitoring alerts
resource_labels | 2 | map(string) | {} | Labels applied to all resources
application_name | 3 | string | 'qdrant' | Base resource name. Do not change after initial deployment.
application_display_name | 3 | string | 'Qdrant Vector Database' | Human-readable name
description | 3 | string | Qdrant description | Workload description
application_version | 3 | string | 'latest' | Qdrant image tag

2. IAM & Access Control

Qdrant GKE delegates all IAM provisioning to App GKE. Workload Identity is used — the Kubernetes service account is bound to a GCP service account with the minimum required roles (GCS read/write for the storage bucket, Secret Manager accessor for the API key).

API key: When enable_api_key = true, Qdrant Common generates a 32-character API key and stores it in Secret Manager as <prefix>-api-key. It is injected as QDRANT__SERVICE__API_KEY. All REST and gRPC calls must include api-key: <key> in the request header/metadata.
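
As a rough sketch, turning on the API key is a single flag on the module call. The module label, `source` path, and project ID below are placeholders rather than values defined by this reference.

```hcl
# Hypothetical module call; only enable_api_key is taken from this reference.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  # Generates the key, stores it in Secret Manager as <prefix>-api-key,
  # and injects it into the pod as QDRANT__SERVICE__API_KEY.
  enable_api_key = true
}
```

After this is applied, clients need to read the key from Secret Manager and send it as the `api-key` header (REST) or metadata entry (gRPC).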


3. Core Service Configuration

A. Compute (GKE)

Variable | Group | Default | Description
deploy_application | 4 | true | Set false for infrastructure-only deployment
cpu_limit | 4 | '1000m' | CPU limit per pod
memory_limit | 4 | '1Gi' | Memory limit per pod. Qdrant loads HNSW indexes into memory, so size accordingly.
container_resources | 4 | { cpu_limit="1000m", memory_limit="1Gi" } | Structured resource spec. Provides cpu_request, mem_request, ephemeral_storage_* fields.
min_instance_count | 4 | 1 | Minimum pod replicas
max_instance_count | 4 | 1 | Maximum pod replicas. Keep at 1 for single-writer safety.
timeout_seconds | 4 | 300 | Request timeout (0–3600 s)
enable_image_mirroring | 4 | true | Mirror Qdrant image to Artifact Registry
enable_vertical_pod_autoscaling | 4 | false | Enable VPA

B. Kubernetes Workload

Variable | Group | Default | Description
gke_cluster_name | 6 | "" | Target GKE cluster. Auto-discovered when empty.
gke_cluster_selection_mode | 6 | 'primary' | 'explicit', 'round-robin', or 'primary'
namespace_name | 6 | "" | Kubernetes namespace. Auto-generated when empty.
workload_type | 6 | null | 'Deployment' or 'StatefulSet'. Auto-resolves to StatefulSet when stateful_pvc_enabled = true.
service_type | 6 | 'ClusterIP' | 'ClusterIP' (recommended), 'LoadBalancer', or 'NodePort'
session_affinity | 6 | 'None' | 'None' or 'ClientIP'
termination_grace_period_seconds | 6 | 60 | Grace period for Qdrant to flush WAL writes (0–3600 s)
enable_network_segmentation | 6 | false | Enable Kubernetes NetworkPolicies
configure_service_mesh | 6 | false | Enable Istio service mesh injection
deployment_timeout | 6 | 1800 | Seconds Terraform waits for rollout completion

C. StatefulSet Persistence

For production deployments, stateful_pvc_enabled = true is strongly recommended over GCS FUSE. Qdrant's WAL and HNSW index files are I/O-intensive — PVC-backed storage provides significantly lower latency than GCS FUSE for these access patterns.

Variable | Group | Default | Description
stateful_pvc_enabled | 7 | null | Enable PVC. Recommended for production. Auto-selects StatefulSet.
stateful_pvc_size | 7 | '20Gi' | Per-pod PVC size. Size to hold all collection data, HNSW indexes, and WAL.
stateful_pvc_mount_path | 7 | '/qdrant/storage' | Container path for the PVC. Matches QDRANT__STORAGE__STORAGE_PATH.
stateful_pvc_storage_class | 7 | 'standard-rwo' | 'standard-rwo' (Balanced PD) or 'premium-rwo' (higher IOPS, lower latency)
stateful_headless_service | 7 | null | Create a headless service for stable network identities
stateful_pod_management_policy | 7 | null | 'OrderedReady' ensures safe sequential restarts
stateful_update_strategy | 7 | null | 'RollingUpdate' for zero-downtime updates
stateful_fs_group | 7 | 1000 | GID for PVC write access. Set to 0 to leave unset.

PVC prevents GCS double-mount: When stateful_pvc_enabled = true, the wrapper passes enable_gcs_storage_volume = false to Qdrant Common, preventing the <prefix>-storage bucket from being mounted at /qdrant/storage alongside the PVC.
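
The PVC settings described in this subsection might be combined as follows. This is a sketch, not a recommended sizing: the module label, `source` path, project ID, and the 50Gi figure are illustrative assumptions.

```hcl
# Hypothetical module call showing PVC-backed persistence.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  stateful_pvc_enabled       = true
  stateful_pvc_size          = "50Gi"        # illustrative; size for data, HNSW indexes, and WAL
  stateful_pvc_storage_class = "premium-rwo" # higher IOPS than the default 'standard-rwo'

  # workload_type is left at null so the module resolves it to StatefulSet.
  # stateful_pvc_mount_path is left at '/qdrant/storage' to match
  # QDRANT__STORAGE__STORAGE_PATH.
}
```

Leaving `workload_type` and the mount path at their defaults avoids the two mismatches this reference warns about: a Deployment combined with a PVC, and a mount path that differs from the path Qdrant reads.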

D. Storage (GCS FUSE)

When stateful_pvc_enabled is not set, Qdrant data is persisted via GCS FUSE at /qdrant/storage:

Variable | Group | Default | Description
create_cloud_storage | 14 | true | Provision GCS buckets
storage_buckets | 14 | [] | Additional GCS buckets
gcs_volumes | 14 | [] | Additional GCS FUSE volumes
manage_storage_kms_iam | 14 | false | CMEK for storage
enable_artifact_registry_cmek | 14 | false | CMEK for Artifact Registry

4. Authentication & Access Control

A. Qdrant API Key

Variable | Group | Default | Description
enable_api_key | 3 | false | Generate API key in Secret Manager. Injected as QDRANT__SERVICE__API_KEY. Recommended for all deployments with external access.

B. Identity-Aware Proxy (IAP)

IAP is enabled through the Kubernetes Gateway API and requires either enable_custom_domain or enable_cdn to be set.

Variable | Group | Default | Description
enable_iap | 20 | false | Enable IAP via Kubernetes Gateway
iap_authorized_users | 20 | [] | IAP-authorized users
iap_authorized_groups | 20 | [] | IAP-authorized groups
iap_oauth_client_id | 20 | "" | OAuth client ID. Sensitive.
iap_oauth_client_secret | 20 | "" | OAuth client secret. Sensitive.
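
A possible IAP configuration is sketched below. It assumes the custom-domain route is used to satisfy the prerequisite. The domain, the member string format in `iap_authorized_users`, the root-level variable names, the module label, and the `source` path are all assumptions for illustration, so check the App GKE documentation for the exact formats expected.

```hcl
# Root-level inputs so the OAuth credentials stay out of source control.
variable "iap_client_id" {
  type      = string
  sensitive = true
}

variable "iap_client_secret" {
  type      = string
  sensitive = true
}

# Hypothetical module call enabling IAP behind a custom domain.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  enable_custom_domain = true
  application_domains  = ["qdrant.example.com"] # placeholder domain

  enable_iap              = true
  iap_authorized_users    = ["user:alice@example.com"] # member format is an assumption
  iap_oauth_client_id     = var.iap_client_id
  iap_oauth_client_secret = var.iap_client_secret
}
```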

C. Cloud Armor

Variable | Group | Default | Description
enable_cloud_armor | 21 | false | Attach Cloud Armor to GKE Ingress
admin_ip_ranges | 21 | [] | CIDR ranges exempted from WAF rules
cloud_armor_policy_name | 21 | 'default-waf-policy' | Cloud Armor policy name
enable_cdn | 21 | false | Enable Cloud CDN via GCPBackendPolicy

5. Observability & Health

A. Health Probes

Variable | Group | Default | Description
startup_probe_config | 10 | { path="/readyz", ... } | Startup probe via App GKE config interface
health_check_config | 10 | { path="/livez", ... } | Liveness probe via App GKE config interface
startup_probe | 10 | { path="/readyz", initial_delay=15, period=10, threshold=10 } | Startup probe (legacy format)
liveness_probe | 10 | { path="/livez", initial_delay=30, period=30, threshold=3 } | Liveness probe (legacy format)
uptime_check_config | 10 | { enabled=true, path="/readyz" } | Cloud Monitoring uptime check
alert_policies | 10 | [] | Metric alert policies

Critical probe guidance: Always use /readyz for the startup probe and /livez for the liveness probe. Qdrant temporarily marks itself as not-ready while loading collections from disk. Using /readyz as the liveness target causes Kubernetes to kill and restart the pod during every collection load — creating a crash loop on instances with large collections.
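
The probe guidance above could be written out explicitly using the legacy-format fields. The field names below are copied from the abbreviated defaults in the table and may not match the full object schema, so treat them as assumptions. The module label, `source` path, and project ID are placeholders.

```hcl
# Hypothetical module call pinning the probe targets explicitly.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  # Startup gate: waits for collections to finish loading.
  startup_probe = {
    path          = "/readyz"
    initial_delay = 15
    period        = 10
    threshold     = 10
  }

  # Liveness: checks only that the process is alive, not that it is ready.
  liveness_probe = {
    path          = "/livez"
    initial_delay = 30
    period        = 30
    threshold     = 3
  }
}
```

If `threshold` maps to the Kubernetes failure threshold, these defaults give a pod roughly 15 + 10 × 10 = 115 seconds to become ready before the startup probe fails. Instances with very large collections may need a higher threshold or period.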

B. Pod Disruption Budget & Resource Quotas

Variable | Group | Default | Description
enable_pod_disruption_budget | 9 | true | Create a PodDisruptionBudget
pdb_min_available | 9 | '1' | Minimum pods available during disruptions
enable_resource_quota | 8 | false | Create a Kubernetes ResourceQuota
quota_memory_requests | 8 | "" | Memory requests quota (binary suffix required, e.g., '4Gi')
quota_memory_limits | 8 | "" | Memory limits quota (binary suffix required)
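
A minimal sketch of these settings, with binary suffixes on both quota values, is shown below. The quota figures, module label, `source` path, and project ID are illustrative assumptions.

```hcl
# Hypothetical module call; quota values are examples only.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  enable_pod_disruption_budget = true
  pdb_min_available            = "1"

  enable_resource_quota = true
  quota_memory_requests = "4Gi" # binary suffix required; a bare "4" would mean 4 bytes
  quota_memory_limits   = "8Gi"
}
```

With a single replica and `pdb_min_available = "1"`, Kubernetes will refuse voluntary evictions of the Qdrant pod, which can stall node drains. That protects availability but is worth knowing before cluster maintenance.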

6. Platform-Managed Behaviours

Behaviour | Implementation | Detail
No SQL database | database_type = "NONE" fixed by Qdrant Common | No Cloud SQL resources created
No Redis | Not used | Qdrant has no caching dependency
Storage path env var | QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage always injected | Aligned with GCS FUSE / PVC mount point
HTTP port env var | QDRANT__SERVICE__HTTP_PORT=6333 always injected | Explicit port
gRPC disabled by default | QDRANT__SERVICE__GRPC_PORT not set | Port 6334 not exposed in default Service. Enable manually via environment_variables.
Separate liveness/readiness | Startup: /readyz, Liveness: /livez | Prevents restart loops during large collection loads
StatefulSet auto-select | stateful_pvc_enabled = true | Automatically resolves workload_type to "StatefulSet"
PVC prevents GCS double-mount | enable_gcs_storage_volume = false passed to Qdrant Common | Prevents simultaneous PVC and GCS FUSE at /qdrant/storage

7. Variable Reference

Variable | Group | Default | Description
project_id | 1 | (none) | GCP project ID. Required.
region | 1 | 'us-central1' | Region fallback
tenant_deployment_id | 2 | 'demo' | Resource name suffix
support_users | 2 | [] | Monitoring alert recipients
resource_labels | 2 | {} | Resource labels
application_name | 3 | 'qdrant' | Base resource name
application_display_name | 3 | 'Qdrant Vector Database' | Display name
description | 3 | Qdrant description | Workload description
application_version | 3 | 'latest' | Image tag
enable_api_key | 3 | false | Generate API key
deploy_application | 4 | true | Deploy workload
cpu_limit | 4 | '1000m' | CPU limit
memory_limit | 4 | '1Gi' | Memory limit
min_instance_count | 4 | 1 | Min replicas
max_instance_count | 4 | 1 | Max replicas
timeout_seconds | 4 | 300 | Request timeout
enable_image_mirroring | 4 | true | Mirror to Artifact Registry
environment_variables | 5 | {} | Plain-text env vars
secret_environment_variables | 5 | {} | Secret Manager references
secret_propagation_delay | 5 | 30 | Post-creation wait
secret_rotation_period | 5 | '2592000s' | Rotation period
gke_cluster_name | 6 | "" | GKE cluster name
namespace_name | 6 | "" | Kubernetes namespace
workload_type | 6 | null | 'Deployment' or 'StatefulSet'
service_type | 6 | 'ClusterIP' | Kubernetes Service type
termination_grace_period_seconds | 6 | 60 | Grace period
stateful_pvc_enabled | 7 | null | Enable StatefulSet PVC
stateful_pvc_size | 7 | '20Gi' | PVC size
stateful_pvc_mount_path | 7 | '/qdrant/storage' | PVC mount path
stateful_pvc_storage_class | 7 | 'standard-rwo' | StorageClass
stateful_pod_management_policy | 7 | null | Pod management
stateful_update_strategy | 7 | null | Update strategy
stateful_fs_group | 7 | 1000 | fsGroup GID
enable_resource_quota | 8 | false | ResourceQuota
enable_pod_disruption_budget | 9 | true | PodDisruptionBudget
pdb_min_available | 9 | '1' | Min available pods
startup_probe | 10 | { path="/readyz" } | Startup probe
liveness_probe | 10 | { path="/livez" } | Liveness probe
uptime_check_config | 10 | { enabled=true } | Uptime check
initialization_jobs | 11 | [] | Init jobs
cron_jobs | 11 | [] | Scheduled jobs
enable_cicd_trigger | 12 | false | Cloud Build trigger
enable_cloud_deploy | 12 | false | Cloud Deploy pipeline
enable_binary_authorization | 12 | false | Binary Authorization
enable_nfs | 13 | false | NFS mount
create_cloud_storage | 14 | true | GCS buckets
storage_buckets | 14 | [] | Additional buckets
gcs_volumes | 14 | [] | GCS FUSE volumes
backup_schedule | 17 | '0 2 * * *' | Backup cron
backup_retention_days | 17 | 7 | Backup retention
enable_backup_import | 17 | false | One-time restore
enable_custom_domain | 19 | false | Custom domain
application_domains | 19 | [] | Domain names
reserve_static_ip | 19 | false | Reserve static IP
enable_iap | 20 | false | IAP via Gateway
enable_cloud_armor | 21 | false | Cloud Armor
enable_vpc_sc | 22 | false | VPC Service Controls
organization_id | 22 | "" | GCP Org ID for VPC-SC
enable_audit_logging | 22 | false | Cloud Audit Logs

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

Variable | Sensible Default | Risk | Consequence of Incorrect Value
enable_api_key (Common) | false | Critical | Without an API key, any caller who can reach the Qdrant service can read, modify, or delete all collections. Set to true for any deployment reachable outside the pod namespace. The key is stored in Secret Manager.
stateful_pvc_enabled | null | High | Without a PVC, data falls back to the GCS FUSE mount at /qdrant/storage, which is much slower for WAL and HNSW I/O. If neither a PVC nor the GCS FUSE volume is mounted, Qdrant writes to the ephemeral pod filesystem, and any pod restart, rolling update, or node eviction erases all collections permanently. Set stateful_pvc_enabled = true for any production deployment.
stateful_pvc_size | "20Gi" | High | An undersized PVC fills up as collections grow (HNSW index + payload storage). A full disk causes Qdrant to crash and makes collections unrecoverable without manual PVC expansion. Provision generously, because PVC capacity cannot be decreased after creation. For 1M vectors at 1536 dims, allow at least 10 Gi per collection.
workload_type | null | High | null resolves to Deployment. With stateful_pvc_enabled = true, the module auto-selects StatefulSet. Explicitly combining workload_type = "Deployment" with stateful_pvc_enabled = true fails at plan time.
stateful_pvc_storage_class | "standard-rwo" | Medium | Balanced PD is adequate for most query loads. High-throughput real-time index builds benefit from premium-rwo. Storage class cannot be changed after PVC creation without data migration.
stateful_pvc_mount_path | "/qdrant/storage" | Critical | Must match Qdrant's storage.storage_path configuration. If the PVC is mounted at a different path from the one Qdrant reads, all data is stored in the ephemeral container layer and lost on restart. Do not change without a matching QDRANT__STORAGE__STORAGE_PATH environment variable override.
collection vector dimensions | (immutable; set at collection creation via client) | Critical | Qdrant collection vector dimensions are immutable after creation. A mismatch with the embedding model dimension causes all upsert operations to fail. There is no migration path: the collection must be deleted and recreated, losing all vectors.
memory_limit | "1Gi" | High | Qdrant loads HNSW graphs into memory. The default 1Gi supports only small test collections. Production workloads need 4Gi–16Gi or more. OOM kills drop all in-flight queries and trigger a full index reload from disk.
cpu_limit | "1000m" | Medium | HNSW index builds and high-concurrency searches are CPU-bound. Throttling below 1000m degrades p99 latency and slows index construction. Scale to 2000m–4000m for production.
min_instance_count | 1 | Medium | Setting this to 0 (scale-to-zero) deletes the pod. After restart, Qdrant reloads indexes from the PVC, which takes tens of seconds for large collections. For latency-sensitive applications, keep at 1.
max_instance_count | 1 | High | Multiple Qdrant replicas sharing a single PVC (RWO) are not supported. RWO PVCs can only be mounted by one pod at a time; scaling beyond 1 prevents additional pods from scheduling. For horizontal scaling, deploy separate Qdrant instances with collection sharding at the application level.
liveness_probe (Common) | /livez | High | Qdrant's /readyz returns 503 while loading large collections. Using /readyz as the liveness target causes spurious restarts during collection load, and each restart triggers another reload, producing a restart loop. Always use /livez for liveness checks.
quota_memory_requests | "" | Critical | If enable_resource_quota = true with a bare integer (e.g. "4" instead of "4Gi"), Kubernetes treats it as bytes, blocking all pod scheduling. Always use binary suffixes. Note: Qdrant GKE accepts but does not forward this variable; check App GKE.
application_version | "latest" | Medium | Qdrant's storage format changes between major versions. An unexpected image change via latest during a rebuild can make existing collections unreadable. Pin to a specific semver tag for production.
backup_schedule | "0 2 * * *" | Medium | Qdrant collections should be snapshotted regularly via the Qdrant API or a GCS FUSE volume backup. Ensure the backup job is active and tested before relying on it for production recovery.
enable_iap | false | High | Without IAP, the GKE Ingress endpoint is accessible to any caller. Enable IAP and enable_api_key = true together for defense-in-depth on production deployments.
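
Several of the pitfalls above can be addressed in one module call. The sketch below also reproduces the sizing arithmetic behind the "1M vectors at 1536 dims" guidance. It assumes float32 vectors (4 bytes per dimension). The version tag, resource figures, module label, `source` path, and project ID are illustrative and should be replaced with values validated for the actual workload.

```hcl
locals {
  # Back-of-envelope sizing for 1M vectors at 1536 dimensions.
  vector_count    = 1000000
  vector_dims     = 1536
  bytes_per_float = 4 # assumes float32 vectors

  # Raw vector data in GiB: 1e6 * 1536 * 4 / 1024^3, about 5.7 GiB,
  # before HNSW graph, payload, and WAL overhead.
  raw_vector_gib = local.vector_count * local.vector_dims * local.bytes_per_float / pow(1024, 3)
}

# Hypothetical production-leaning module call.
module "qdrant" {
  source     = "./modules/Qdrant_GKE" # assumed path relative to the repository root
  project_id = "my-gcp-project"       # placeholder

  application_version = "v1.9.0" # illustrative pinned tag; avoid 'latest'
  enable_api_key      = true

  stateful_pvc_enabled = true
  stateful_pvc_size    = "20Gi" # leaves headroom above the raw estimate

  memory_limit = "8Gi"   # within the 4Gi–16Gi production range
  cpu_limit    = "2000m"

  min_instance_count = 1
  max_instance_count = 1 # single writer on an RWO volume
}

output "raw_vector_gib" {
  value = local.raw_vector_gib
}
```

The raw estimate of roughly 5.7 GiB is consistent with the table's advice to allow at least 10 Gi per collection once index and WAL overhead are included.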