LibreChat on Google Kubernetes Engine (GKE Autopilot)

This document provides a comprehensive reference for the modules/LibreChat_GKE Terraform module. It covers architecture, configuration variables, LibreChat-specific behaviours, and operational patterns for deploying LibreChat on GKE Autopilot.


1. Module Overview

LibreChat GKE deploys LibreChat — the open-source AI chat interface — on GKE Autopilot with Kubernetes-native scaling, Workload Identity IAM, and the full Foundation Module (App GKE) infrastructure stack.

Key differences from LibreChat CloudRun:

  • Runs as a Kubernetes Deployment (or StatefulSet when PVC is enabled) instead of Cloud Run.
  • Uses the GCS Fuse CSI driver for storage mounts instead of Cloud Run volume mounts.
  • Horizontal Pod Autoscaler (HPA) replaces Cloud Run's built-in scaling.
  • Workload Identity is used instead of Cloud Run SA direct bindings.
  • Session affinity is set to ClientIP by default for WebSocket continuity.
  • credit_cost defaults to 150 (vs 50 for Cloud Run) due to the GKE cluster requirements.

GCP Services deployed:

  • GKE Autopilot cluster (via Services GCP)
  • Kubernetes Deployments / StatefulSets
  • Kubernetes Services (LoadBalancer / ClusterIP)
  • Artifact Registry
  • Cloud Storage (GCS Fuse CSI Driver)
  • Filestore / NFS server (optional)
  • Secret Manager + Workload Identity
  • Cloud IAM
  • Cloud Monitoring + Uptime Checks
  • Cloud Armor WAF (optional)
  • Cloud Deploy (optional)
  • Binary Authorization (optional)
  • VPC Service Controls (optional)

MongoDB note: LibreChat uses MongoDB. No Cloud SQL instance is provisioned (database_type = "NONE"). LibreChat Common auto-provisions Firestore with MongoDB compatibility when no mongodb_uri is supplied.


2. Prerequisites

  1. Services GCP deployed in the same GCP project (provides GKE Autopilot cluster, VPC, NFS server).
  2. MongoDB — MongoDB Atlas, self-hosted, or Firestore auto-provisioned by the module.
  3. Redis (recommended) — Cloud Memorystore for Redis or existing Redis instance accessible from the GKE cluster's VPC.

3. Core Service Configuration

A. Compute (GKE)

| Variable | Group | Default | Description |
|---|---|---|---|
| deploy_application | 4 | true | Set false for infrastructure-only deployment. |
| container_image_source | 4 | 'prebuilt' | 'prebuilt' (GHCR) or 'custom' (Cloud Build). |
| container_image | 4 | 'ghcr.io/danny-avila/librechat' | Container image URI. |
| container_resources | 4 | { cpu_limit = "2000m", memory_limit = "2Gi" } | CPU and memory per pod. |
| container_port | 4 | 3080 | LibreChat's native HTTP port. |
| min_instance_count | 4 | 1 | Minimum pod replicas. |
| max_instance_count | 4 | 5 | Maximum pod replicas (HPA ceiling). |
| execution_environment | 4 | 'gen2' | Execution environment setting (passed through to Foundation Module). |
| timeout_seconds | 4 | 600 | Request timeout in seconds. |
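
As a rough sketch, the compute settings above might be set from a root module as follows. The module label librechat_gke and the relative source path are assumptions for illustration (the path is taken from the module name given in this document), and any other inputs the module requires are omitted.

```hcl
# Hypothetical root-module call; the label and source path are assumptions.
module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  deploy_application     = true
  container_image_source = "prebuilt" # pull from GHCR instead of building with Cloud Build
  container_image        = "ghcr.io/danny-avila/librechat"

  container_resources = {
    cpu_limit    = "2000m"
    memory_limit = "2Gi"
  }

  container_port     = 3080 # LibreChat's native HTTP port
  min_instance_count = 1    # HPA floor
  max_instance_count = 5    # HPA ceiling
  timeout_seconds    = 600
}
```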

B. MongoDB Database

Same as LibreChat CloudRun — see LibreChat CloudRun §3.B for the full connection modes reference.

| Variable | Group | Default | Description |
|---|---|---|---|
| mongodb_uri | 3 | | MongoDB connection URI. Sensitive. Required or use Firestore auto-discovery. |
| firestore_mongodb_host | 1 | | Firestore endpoint host (manual override). |
| firestore_mongodb_database | 12 | 'LibreChat' | Firestore database ID / MongoDB database name. |
| firestore_mongodb_username | 12 | "" | SCRAM username. |
| firestore_mongodb_password | 12 | "" | SCRAM password. Auto-generated when not set. |
| database_type | 12 | 'NONE' | Fixed. Must remain 'NONE'. |
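
A minimal sketch of supplying an external MongoDB URI, assuming the same hypothetical module label and source path as elsewhere in this document. The root-level variable name is illustrative; marking it sensitive keeps the URI out of plan output.

```hcl
# Root-level input for the connection string (the variable name is an assumption).
variable "mongodb_uri" {
  description = "MongoDB connection URI, for example from MongoDB Atlas."
  type        = string
  sensitive   = true
}

module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  database_type = "NONE"          # fixed; no Cloud SQL instance is provisioned
  mongodb_uri   = var.mongodb_uri # omit to fall back to Firestore auto-provisioning
}
```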

C. LibreChat Application Settings

| Variable | Group | Default | Description |
|---|---|---|---|
| app_title | 3 | 'LibreChat' | Title shown in the LibreChat UI. |
| allow_registration | 3 | true | Allow self-registration. Set false after creating admin account. |
| allow_social_login | 3 | false | Enable OAuth social login providers. |

D. Storage (GCS & NFS)

| Variable | Group | Default | Description |
|---|---|---|---|
| create_cloud_storage | 11 | true | Set false to skip additional bucket creation. |
| storage_buckets | 11 | [{ name_suffix = "data" }] | Additional GCS buckets beyond the auto-provisioned uploads bucket. |
| enable_nfs | 11 | false | Provisions a Filestore NFS instance and mounts it into the pod. |
| nfs_mount_path | 11 | '/mnt/nfs' | Pod mount path for the NFS volume. |
| gcs_volumes | 11 | [] | GCS buckets to mount via GCS Fuse CSI driver. |
| manage_storage_kms_iam | 11 | false | Creates CMEK KMS key and enables CMEK on all storage buckets. |
| enable_artifact_registry_cmek | 11 | false | Creates Artifact Registry KMS key for at-rest image encryption. |
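
For illustration, shared storage could be switched on with the variables in the table above. This sketch uses only fields whose shape is shown in the defaults; the element schema of gcs_volumes is not documented here, so it is left out. The module label and source path are assumptions.

```hcl
module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  # Keep the default extra bucket alongside the auto-provisioned uploads bucket.
  create_cloud_storage = true
  storage_buckets      = [{ name_suffix = "data" }]

  # Mount an NFS volume so files are visible to every replica.
  enable_nfs     = true
  nfs_mount_path = "/mnt/nfs"
}
```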

E. Networking

| Variable | Group | Default | Description |
|---|---|---|---|
| ingress_settings | 5 | 'all' | 'all': public internet; 'internal': cluster VPC only. |
| vpc_egress_setting | 5 | 'PRIVATE_RANGES_ONLY' | VPC egress routing. |
| region | 1 | 'us-central1' | GCP region. Auto-discovered from cluster info. |

4. GKE-Specific Features

A. StatefulSet and Persistent Volumes

Unlike LibreChat CloudRun, the GKE variant supports PersistentVolumeClaims (PVCs):

| Variable | Group | Default | Description |
|---|---|---|---|
| stateful_pvc_enabled | | false | Enables a PersistentVolumeClaim for the pod. Automatically uses StatefulSet. |
| workload_type | | 'Deployment' | Set to 'StatefulSet' for stateful workloads. Auto-selected when stateful_pvc_enabled = true. |
| quota_memory_requests | | | ResourceQuota memory requests. Must use binary suffixes ('4Gi', '8192Mi'). |
| quota_memory_limits | | | ResourceQuota memory limits. Must use binary suffixes. |

Important: quota_memory_requests and quota_memory_limits must use binary unit suffixes (Gi, Mi). Kubernetes interprets a bare integer as a byte count, so a value such as 4 sets a 4-byte quota that no pod can fit within, which blocks all pod scheduling in the namespace.
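
A short sketch of a stateful configuration with correctly suffixed quotas. The quota values reuse the examples from the table above; the module label and source path are assumptions.

```hcl
module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  stateful_pvc_enabled = true # StatefulSet is selected automatically

  # Always quote the value and include a binary suffix.
  quota_memory_requests = "4Gi"
  quota_memory_limits   = "8192Mi"

  # Wrong: quota_memory_requests = "4" would be read as 4 bytes.
}
```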

B. Horizontal Pod Autoscaler

HPA is configured via min_instance_count and max_instance_count. The Foundation Module (App GKE) manages the HPA resource. LibreChat scales well horizontally when Redis is enabled for session management.


5. Advanced Security

Identical to LibreChat CloudRun for Cloud Armor, IAP, Binary Authorization, and VPC Service Controls. See LibreChat CloudRun §4.

Workload Identity: The GKE variant uses Workload Identity instead of direct service account bindings. The Kubernetes service account is annotated with the GCP SA email (iam.gke.io/gcp-service-account), and the Kubernetes SA is granted roles/iam.workloadIdentityUser on the GCP SA so that pods can act as that GCP SA.
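
The module manages this wiring itself; the sketch below only illustrates what a Workload Identity binding generally looks like in Terraform, with the Kubernetes SA as the member that receives roles/iam.workloadIdentityUser on the GCP SA. It is not the module's source, and every name, namespace, and the project variable are placeholders.

```hcl
# Illustrative only: all names below are hypothetical placeholders.
variable "project_id" {
  type = string
}

resource "google_service_account" "librechat" {
  project    = var.project_id
  account_id = "librechat-app"
}

# Kubernetes SA annotated with the GCP SA email.
resource "kubernetes_service_account" "librechat" {
  metadata {
    name      = "librechat"
    namespace = "librechat"
    annotations = {
      "iam.gke.io/gcp-service-account" = google_service_account.librechat.email
    }
  }
}

# The Kubernetes SA (namespace/name in brackets) may act as the GCP SA.
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.librechat.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[librechat/librechat]"
}
```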


6. Redis Integration

Same as LibreChat CloudRun. Redis is strongly recommended for GKE deployments because pod restarts and rescheduling are more frequent than Cloud Run revisions, making session persistence more critical.

| Variable | Group | Default | Description |
|---|---|---|---|
| enable_redis | 21 | false | Enables Redis for session management. Strongly recommended for GKE. |
| redis_host | 21 | "" | Redis hostname or IP (Cloud Memorystore recommended). |
| redis_port | 21 | 6379 | Redis TCP port. |
| redis_auth | 21 | "" | Redis AUTH password. Sensitive. |
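
A possible multi-replica setup with Redis enabled might look like the following. The host address is a placeholder, the root-level variable name is illustrative, and the module label and source path are assumptions.

```hcl
# AUTH password passed in from outside the configuration (name is an assumption).
variable "redis_auth" {
  type      = string
  sensitive = true
  default   = ""
}

module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  enable_redis = true
  redis_host   = "10.0.0.3" # placeholder; use an address reachable from the cluster VPC
  redis_port   = 6379
  redis_auth   = var.redis_auth

  # Sessions live in Redis, so more than one replica is safe.
  min_instance_count = 2
  max_instance_count = 5
}
```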

7. CI/CD & Delivery

Same variables as LibreChat CloudRun. See LibreChat CloudRun §6.


8. Observability

| Variable | Group | Default | Description |
|---|---|---|---|
| startup_probe | 14 | { enabled=true, path="/", initial_delay_seconds=30, failure_threshold=10 } | Pod startup probe. |
| liveness_probe | 14 | { enabled=true, path="/", initial_delay_seconds=60, failure_threshold=3 } | Pod liveness probe. |
| uptime_check_config | 14 | { enabled=true, path="/" } | Cloud Monitoring uptime check. |
| alert_policies | 14 | [] | Cloud Monitoring metric alert policies. |
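
As a hedged example, the probe objects could be overridden as below to tolerate a slower first boot. It assumes each object accepts exactly the attributes shown in the defaults above; the module may define further attributes that are not documented here. The module label and source path are assumptions.

```hcl
module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  startup_probe = {
    enabled               = true
    path                  = "/"
    initial_delay_seconds = 30
    failure_threshold     = 20 # raised from the default of 10
  }

  liveness_probe = {
    enabled               = true
    path                  = "/"
    initial_delay_seconds = 60
    failure_threshold     = 3
  }

  uptime_check_config = {
    enabled = true
    path    = "/"
  }
}
```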

9. Platform-Managed Behaviours

| Behaviour | Detail |
|---|---|
| MongoDB only | database_type = "NONE"; no Cloud SQL is provisioned. |
| Firestore auto-provisioning | ENTERPRISE Firestore DB created when no mongodb_uri or firestore_mongodb_host is set. Never deleted on destroy. |
| SCRAM user init job | Auto-injected initialization job creates/updates MongoDB SCRAM user in Firestore. |
| JWT/credential secrets | CREDS_KEY, CREDS_IV, JWT_SECRET, JWT_REFRESH_SECRET auto-generated by LibreChat Common. |
| Session affinity | ClientIP session affinity set by default in the Kubernetes Service for WebSocket continuity. |
| GCS Fuse CSI | File uploads use GCS Fuse CSI driver mounted at /uploads. |
| Workload Identity | GKE SA annotated and bound via Workload Identity instead of direct SA binding. |

10. Outputs

| Output | Description |
|---|---|
| kubernetes_ready | True when the GKE cluster is available and all Kubernetes resources are deployed. |
| deployment_id | Deployment ID suffix used in resource names. |
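
These outputs can be re-exported from a root module, for example as sketched here. The module label librechat_gke and the root output names are assumptions.

```hcl
# Surface the module outputs at the root level.
output "librechat_kubernetes_ready" {
  description = "True once the cluster and Kubernetes resources are deployed."
  value       = module.librechat_gke.kubernetes_ready
}

output "librechat_deployment_id" {
  description = "Deployment ID suffix used in resource names."
  value       = module.librechat_gke.deployment_id
}
```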

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
|---|---|---|---|
| CREDS_KEY (auto-generated) | Random 32-byte hex key in Secret Manager | Critical | Encrypts all saved AI provider credentials for every user. Changing it after first use destroys all stored credentials; every user must re-enter their API keys. Treat as immutable after the first user saves credentials. |
| CREDS_IV (auto-generated) | Random 16-byte hex IV in Secret Manager | Critical | AES initialization vector (IV) paired with CREDS_KEY. Same consequences as rotating CREDS_KEY: all stored credentials become undecryptable. |
| JWT_SECRET (auto-generated) | Random secret in Secret Manager | High | Signs all access and refresh tokens. Rotation logs out all users immediately. Plan rotation during a maintenance window. |
| mongodb_uri | Auto-discovered Firestore MongoDB endpoint | Critical | LibreChat requires MongoDB or Firestore MongoDB compatibility. If auto-discovery fails and no manual URI is provided, the pod crashes on startup and serves no traffic. |
| firestore_mongodb_host | Auto-discovered | High | Manual host override. A stale or incorrect value breaks all data operations and renders the service non-functional. |
| enable_cloudsql_volume | false | Critical | Must remain false. LibreChat does not use Cloud SQL. Enabling injects an unnecessary Cloud SQL Auth Proxy sidecar and can conflict with MongoDB-only connection routing. |
| enable_custom_sql_scripts | false | Critical | Must remain false. LibreChat does not use Cloud SQL. Enabling this causes the init job to attempt SQL script execution against a non-existent Cloud SQL instance. |
| allow_registration | true | High | Combined with a LoadBalancer-exposed service, open registration allows anyone on the network to create an account. Disable after the admin account is created or restrict with IAP. |
| USE_REDIS / enable_redis | false | High | Without Redis, multiple pod replicas each have isolated in-memory session state. Users experience session drops when requests land on different pods. Set enable_redis = true and provide redis_host for all multi-replica deployments. |
| redis_host | "" | High | Required when enable_redis = true. If not set and Redis is enabled, LibreChat fails to connect to Redis on startup and session caching is broken. |
| MEILI_MASTER_KEY (auto-generated) | Random secret in Secret Manager | High | If changed after the search index is built, all indices are invalidated. A full re-index of all messages is required after any key rotation. |
| stateful_pvc_enabled | false | High | Without a PVC or NFS for file uploads, attachments shared in chat are stored on the container's ephemeral filesystem and are lost when the pod is evicted. Enable PVC or NFS for production. |
| quota_memory_requests / quota_memory_limits | Binary unit defaults | Critical | Must use binary suffixes (Gi, Mi). Bare integers are treated as bytes by Kubernetes, blocking all pod scheduling in the namespace. |
| secret_environment_variables (AI provider keys) | {} | Critical | Provider API keys must reference Secret Manager secrets. Injecting them as plain environment_variables exposes them in pod specs visible in kubectl describe pod. |
| min_instance_count | 1 | High | Scale-to-zero drops all in-flight SSE streaming connections. Keep at least 1 replica for a reliable chat experience. |
| timeout_seconds | 600 | High | Long AI responses via SSE streaming can exceed several minutes. Insufficient timeout truncates responses mid-stream. |
| enable_nfs | false | Medium | NFS is needed for shared file storage across multiple pod replicas. Without it, uploaded files are pod-local and invisible to other replicas. |
| workload_type | null (auto-select) | Medium | Setting stateful_pvc_enabled = true auto-selects StatefulSet. Manually setting workload_type = "Deployment" alongside stateful_pvc_enabled = true fails at plan time. |
| backup_schedule | "" (disabled) | High | Without NFS backup schedules, conversation and user data backed only by Firestore/MongoDB have no GCS-level snapshots. Enable for production. |
| iap_oauth_client_id / iap_oauth_client_secret | "" | Critical | Required when enable_iap = true. If not provided, the IAP gateway fails to initialise and the service becomes unreachable. |
| application_version | "latest" | Medium | Unplanned LibreChat upgrades can change MongoDB schema. Pin to a specific release in production. |
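
Pulling several of these recommendations together, a production-leaning configuration might look roughly like this. It is a sketch under assumptions: the module label, source path, Redis address, and root-level variable name are placeholders, and the release to pin is deliberately left to the caller rather than guessed here.

```hcl
# Release tag to pin, supplied by the caller (variable name is an assumption).
variable "librechat_version" {
  description = "Specific LibreChat release to deploy instead of \"latest\"."
  type        = string
}

module "librechat_gke" {
  source = "./modules/LibreChat_GKE"

  application_version = var.librechat_version # avoid unplanned upgrades
  allow_registration  = false                 # set after the admin account exists

  enable_redis = true
  redis_host   = "10.0.0.3" # placeholder address

  min_instance_count = 1   # never scale to zero; SSE streams would drop
  timeout_seconds    = 600 # leave room for long streamed responses

  # Must stay false: LibreChat does not use Cloud SQL.
  enable_cloudsql_volume    = false
  enable_custom_sql_scripts = false
}
```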

Destroying Resources

GKE Autopilot node pools and Kubernetes resources may take 5–10 minutes to fully terminate. The GKE cluster itself is managed by Services GCP and must be destroyed separately.