Metabase on GKE Autopilot

This document provides a comprehensive reference for the modules/Metabase_GKE Terraform module. It covers architecture, IAM, configuration variables, Metabase-specific behaviours, and operational patterns for deploying Metabase on Google Kubernetes Engine (GKE) Autopilot.


1. Module Overview

Metabase GKE is a wrapper module built on top of App GKE. It uses App GKE for all GCP and Kubernetes infrastructure provisioning and injects Metabase-specific application configuration via Metabase Common.

Key Capabilities:

  • Compute: GKE Autopilot Deployment (Metabase is stateless), 2 vCPU / 4 Gi by default with Horizontal Pod Autoscaling.
  • Data Persistence: Cloud SQL PostgreSQL 15 as the Metabase application database. A db-init Kubernetes Job runs automatically on first deployment.
  • Security: Inherits Cloud Armor WAF, IAP (OAuth 2.0), Binary Authorization, and VPC Service Controls from App GKE.
  • Session Affinity: Defaults to 'ClientIP' (sticky sessions) — recommended for Metabase to avoid session interruptions when HPA scales pods.
  • Reliability: Health probes target /api/health. The startup probe waits 60 seconds and the liveness probe 120 seconds before the first check, allowing for JVM startup. PodDisruptionBudget is enabled by default.

JVM startup note: Metabase typically takes 60–120 seconds to start. Set min_instance_count = 1 to keep at least one pod warm and avoid scheduling delays on GKE Autopilot.
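
As a rough illustration of the capabilities listed above, a module call might look like the following sketch. The module label "metabase" and the relative source path are assumptions, and any other inputs the module requires are omitted; the variable names and values are the ones documented in this reference.

```hcl
# Minimal sketch of a module call. The label "metabase" and the relative
# source path are assumptions for illustration only.
module "metabase" {
  source = "./modules/Metabase_GKE"

  # Keep one pod warm so users do not wait on a JVM cold start.
  min_instance_count = 1
  max_instance_count = 5

  # Sticky sessions, as recommended for Metabase behind HPA.
  session_affinity = "ClientIP"

  # Documented defaults, repeated here for visibility.
  container_resources = {
    cpu_limit    = "2000m"
    memory_limit = "4Gi"
  }
}
```

Repeating the defaults explicitly is optional; it simply makes the sizing visible to reviewers of the calling configuration.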


2. IAM & Access Control

Metabase GKE delegates all IAM provisioning to App GKE. Metabase pods access Cloud SQL via the Cloud SQL Auth Proxy sidecar and Workload Identity.

Default db-init job: Metabase Common provides a db-init Kubernetes Job using postgres:15-alpine that runs before the Metabase workload starts, creating the PostgreSQL database and user.


3. Core Service Configuration

A. Compute (GKE Autopilot)

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| deploy_application | 4 | true | Set false for infrastructure-only deployment. |
| container_image_source | 4 | 'custom' | 'custom' builds via Cloud Build; 'prebuilt' deploys an existing image. |
| container_image | 4 | "" | Override image URI. Leave empty for Cloud Build to manage. |
| container_resources | 4 | { cpu_limit="2000m", memory_limit="4Gi" } | CPU/memory limits. JVM requires at least 2 Gi. |
| min_instance_count | 4 | 1 | Minimum pod replicas. Set to at least 1 to avoid JVM cold starts. |
| max_instance_count | 4 | 5 | Maximum pod replicas. |
| container_port | 4 | 3000 | Metabase's Jetty HTTP port. |
| container_protocol | 4 | 'http1' | HTTP protocol version. |
| timeout_seconds | 4 | 300 | Max load balancer backend timeout. Increase for complex queries. |
| enable_cloudsql_volume | 4 | true | Injects the Cloud SQL Auth Proxy sidecar. |
| enable_image_mirroring | 4 | true | Mirrors the Metabase image into Artifact Registry. |
| enable_vertical_pod_autoscaling | 4 | false | Enables VPA for JVM right-sizing. Recommended for GKE Autopilot. |
| service_annotations | 4 | {} | Custom annotations on the Kubernetes Service resource. |
| service_labels | 4 | {} | Labels applied to the Kubernetes Service resource. |
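
The image-related variables above could be combined as in the sketch below, which deploys an existing image rather than building one. The image URI, the module label, and the source path are hypothetical placeholders; the variable names and the 'prebuilt' value come from the table.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  # Deploy an existing image instead of building one with Cloud Build.
  container_image_source = "prebuilt"

  # Hypothetical Artifact Registry URI; replace with a real image reference.
  container_image = "us-docker.pkg.dev/example-project/example-repo/metabase:v0.51.3"

  container_port  = 3000
  timeout_seconds = 600 # raised from the default 300 for long-running queries
}
```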

B. GKE-Specific Backend Configuration

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| workload_type | 6 | null | 'Deployment' or 'StatefulSet'. Metabase is stateless; use 'Deployment'. |
| service_type | 6 | 'LoadBalancer' | Kubernetes Service type. |
| session_affinity | 6 | 'ClientIP' | Sticky sessions recommended for Metabase to prevent session loss on pod scaling. |
| gke_cluster_name | 6 | "" | Target GKE cluster. Leave empty to auto-discover. |
| gke_cluster_selection_mode | 6 | 'primary' | Strategy for choosing the target cluster. |
| namespace_name | 6 | "" | Kubernetes namespace. Leave empty to auto-generate. |
| termination_grace_period_seconds | 6 | 60 | Seconds Kubernetes waits after SIGTERM. Increase to allow in-flight queries to complete. |
| enable_network_segmentation | 6 | false | Creates Kubernetes NetworkPolicy resources. |
| enable_multi_cluster_service | 6 | false | Creates a ServiceExport for Multi-Cluster Services. |
| configure_service_mesh | 6 | false | Enables Istio service mesh injection. |
| deployment_timeout | 6 | 1800 | Maximum seconds Terraform waits for the Deployment rollout. Increase for large JVM images. |
| prereq_gke_subnet_cidr | 6 | '10.201.0.0/24' | CIDR range for the GKE subnet. Not referenced in this module. |

C. StatefulSet Configuration

Metabase is stateless and does not require a StatefulSet. These variables are available for advanced use cases only.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| stateful_pvc_enabled | 7 | null | Enables PVC templates. Not recommended for Metabase. |
| stateful_pvc_size | 7 | '10Gi' | PVC storage size. |
| stateful_pvc_mount_path | 7 | '/data' | PVC mount path inside the container. |
| stateful_pvc_storage_class | 7 | 'standard-rwo' | Kubernetes StorageClass. |
| stateful_headless_service | 7 | null | Creates a headless Service for the StatefulSet. |
| stateful_pod_management_policy | 7 | null | Pod creation order. |
| stateful_update_strategy | 7 | null | Update strategy. |
| stateful_fs_group | 7 | 0 | Pod-level fsGroup. Set to 0 to leave unset; Metabase does not require a specific GID. |

D. Database (Cloud SQL — PostgreSQL 15)

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| database_type | 16 | 'POSTGRES_15' | Cloud SQL engine. PostgreSQL required for Metabase. |
| application_database_name | 16 | 'metabase' | PostgreSQL database name. |
| application_database_user | 16 | 'metabase' | PostgreSQL application user. |
| database_password_length | 16 | 32 | Auto-generated password length. Range: 16–64. |
| enable_postgres_extensions | 16 | false | Not required for Metabase. |
| postgres_extensions | 16 | [] | Not applicable for Metabase. |
| enable_auto_password_rotation | 16 | false | Automated zero-downtime password rotation. |
| rotation_propagation_delay_sec | 16 | 90 | Seconds to wait after rotation before restarting pods. |
| db_name | 16 | 'metabase' | Passed to Metabase Common. |
| db_user | 16 | 'metabase' | Passed to Metabase Common. |

E. Storage

Metabase does not require dedicated GCS storage. All application state is stored in PostgreSQL.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| create_cloud_storage | 14 | true | Set false to skip GCS bucket creation. |
| storage_buckets | 14 | [] | GCS bucket configurations. Empty by default; Metabase does not require object storage. |
| gcs_volumes | 14 | [] | GCS Fuse volume mounts. |
| manage_storage_kms_iam | 14 | false | Creates CMEK KMS keyring. |
| enable_artifact_registry_cmek | 14 | false | Enables CMEK for Artifact Registry images. |
| max_images_to_retain | 14 | 7 | Maximum recent container images to keep. |
| delete_untagged_images | 14 | true | Automatically deletes untagged images. |
| image_retention_days | 14 | 30 | Days after which images are eligible for deletion. |
| enable_nfs | 13 | false | Provisions NFS storage. Not typically required for Metabase. |
| nfs_mount_path | 13 | '/mnt/nfs' | NFS mount path. |
| nfs_volume_name | 13 | 'nfs-data-volume' | Volume name for the NFS mount. |
| nfs_instance_name | 13 | "" | Name of an existing NFS GCE VM. |
| nfs_instance_base_name | 13 | 'app-nfs' | Base name for inline NFS VM. |

4. Advanced Security

A. Identity-Aware Proxy (IAP)

IAP is particularly valuable for Metabase GKE deployments — it restricts access to the BI tool to authenticated Google users before traffic reaches the Kubernetes Service.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_iap | 20 | false | Enables IAP for the GKE Ingress. |
| iap_authorized_users | 20 | [] | Users granted IAP access. |
| iap_authorized_groups | 20 | [] | Google Groups granted IAP access. |
| iap_oauth_client_id | 20 | "" | OAuth 2.0 Client ID. Required when enable_iap = true. Sensitive. |
| iap_oauth_client_secret | 20 | "" | OAuth 2.0 Client Secret. Required when enable_iap = true. Sensitive. |
| iap_support_email | 20 | "" | Support email shown on the OAuth consent screen. |
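
One way to enable IAP without committing the OAuth credentials is to pass them through sensitive root variables, as in this sketch. The root variable declarations, the example addresses, the module label, and the source path are assumptions. The exact member format expected by iap_authorized_groups (plain address or "group:" prefix) is not stated in this reference, so the value shown is a guess to be checked against the module.

```hcl
# Root-level inputs for the OAuth credentials (hypothetical declarations).
variable "iap_oauth_client_id" {
  type      = string
  sensitive = true
}

variable "iap_oauth_client_secret" {
  type      = string
  sensitive = true
}

module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  enable_iap              = true
  iap_oauth_client_id     = var.iap_oauth_client_id
  iap_oauth_client_secret = var.iap_oauth_client_secret
  iap_support_email       = "bi-support@example.com"

  # Member format is an assumption; verify it against the module's validation.
  iap_authorized_groups = ["bi-users@example.com"]
}
```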

B. Cloud Armor

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_cloud_armor | 21 | false | Attaches a Cloud Armor security policy to the GKE Ingress backend. |
| admin_ip_ranges | 21 | [] | Admin CIDR ranges. |
| cloud_armor_policy_name | 21 | 'default-waf-policy' | Cloud Armor security policy name. |
| enable_cdn | 21 | false | Enables Cloud CDN on the GKE Ingress backend. |

C. VPC Service Controls

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_vpc_sc | 22 | false | Enables VPC-SC perimeter enforcement. |
| vpc_cidr_ranges | 22 | [] | VPC subnet CIDR ranges. |
| vpc_sc_dry_run | 22 | true | Logs violations without blocking. |
| organization_id | 22 | "" | GCP Organization ID for VPC-SC. |
| enable_audit_logging | 22 | false | Enables detailed Cloud Audit Logs. |

5. Traffic & Ingress

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_custom_domain | 19 | false | Provisions a Kubernetes Ingress for custom domain routing. |
| application_domains | 19 | [] | Custom domain names for the Ingress. |
| reserve_static_ip | 19 | true | Provisions a global static external IP. Recommended for production. |
| static_ip_name | 19 | "" | Name for the static IP. Leave empty to auto-generate. |
| network_tags | 19 | ['nfsserver'] | Network tags applied to GKE nodes. |
| network_name | 19 | "" | VPC network name. Leave empty to auto-discover. |
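
A custom-domain setup using the variables above might be sketched as follows. The domain name, module label, and source path are placeholders; DNS records pointing at the reserved IP are assumed to be managed elsewhere.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  # Route a custom domain through a Kubernetes Ingress.
  enable_custom_domain = true
  application_domains  = ["metabase.example.com"] # hypothetical domain

  # Keep a stable external IP across redeployments; the name is auto-generated
  # because static_ip_name is left at its empty default.
  reserve_static_ip = true
}
```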

6. CI/CD & Delivery

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_cicd_trigger | 12 | false | Provisions a Cloud Build GitHub trigger. |
| github_repository_url | 12 | "" | Full HTTPS URL of the GitHub repository. |
| github_token | 12 | "" | GitHub PAT. Sensitive. |
| github_app_installation_id | 12 | "" | GitHub App installation ID. |
| cicd_trigger_config | 12 | { branch_pattern = "^main$" } | Advanced Cloud Build trigger config. |
| enable_cloud_deploy | 12 | false | Provisions a Cloud Deploy pipeline. |
| cloud_deploy_stages | 12 | [dev, staging, prod(approval)] | Ordered promotion stages. |
| enable_binary_authorization | 12 | false | Enforces image attestation. |
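
The trigger-related variables could be wired together roughly as below. The repository URL, the root variable for the token, the module label, and the source path are hypothetical. The cicd_trigger_config object may accept more attributes than the branch_pattern shown in the documented default.

```hcl
# Root-level input for the GitHub personal access token (hypothetical).
variable "github_token" {
  type      = string
  sensitive = true
}

module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  enable_cicd_trigger   = true
  github_repository_url = "https://github.com/example-org/metabase-deploy" # placeholder
  github_token          = var.github_token

  # Build only on pushes to main, matching the documented default.
  cicd_trigger_config = {
    branch_pattern = "^main$"
  }
}
```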

7. Reliability & Scheduling

A. Health Probes

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| startup_probe_config | 10 | { path="/api/health", initial_delay_seconds=60, failure_threshold=18 } | GKE startup probe. Allows up to 240 seconds total startup tolerance for JVM. |
| health_check_config | 10 | { path="/api/health", initial_delay_seconds=120, failure_threshold=3 } | GKE liveness probe. |
| uptime_check_config | 10 | { enabled=false, path="/api/health" } | Cloud Monitoring uptime check. |
| alert_policies | 10 | [] | Cloud Monitoring metric alert policies. |
| startup_probe | 10 | { path="/api/health", initial_delay_seconds=60, failure_threshold=18 } | Probe config passed to Metabase Common. |
| liveness_probe | 10 | { path="/api/health", initial_delay_seconds=120, failure_threshold=3 } | Probe config passed to Metabase Common. |
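
The 240-second startup tolerance follows from the documented defaults if the probe period is 10 seconds. That period is the Kubernetes default but is not listed in the table, so it is an assumption here. The arithmetic can be sketched in a locals block:

```hcl
locals {
  # Assumed: a 10-second probe period (the Kubernetes default); the period
  # is not among the documented defaults.
  probe_period_seconds = 10

  # Documented startup probe defaults.
  startup_initial_delay_seconds = 60
  startup_failure_threshold     = 18

  # 60 + 18 * 10 = 240 seconds before the container is restarted.
  startup_tolerance_seconds = local.startup_initial_delay_seconds + local.startup_failure_threshold * local.probe_period_seconds
}

output "startup_tolerance_seconds" {
  value = local.startup_tolerance_seconds
}
```

Under the same assumption, raising failure_threshold to 30 would give 60 + 30 * 10 = 360 seconds of total tolerance.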

B. Reliability Policies

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_pod_disruption_budget | 9 | true | Creates a Kubernetes PodDisruptionBudget. |
| pdb_min_available | 9 | '1' | Minimum pods available during voluntary disruptions. |
| enable_topology_spread | 9 | false | Distributes pods across GKE node zones. |
| topology_spread_strict | 9 | false | Rejects pods if topology spread cannot be satisfied. |

C. Resource Quotas

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_resource_quota | 8 | false | Creates a Kubernetes ResourceQuota. |
| quota_cpu_requests | 8 | "" | Total CPU requests allowed. |
| quota_cpu_limits | 8 | "" | Total CPU limits allowed. |
| quota_memory_requests | 8 | "" | Total memory requests. Must use binary unit suffixes (e.g., '4Gi'). |
| quota_memory_limits | 8 | "" | Total memory limits. Must use binary unit suffixes (e.g., '8Gi'). |
| quota_max_pods | 8 | "" | Maximum pods in the namespace. Not referenced. |
| quota_max_services | 8 | "" | Maximum Services in the namespace. Not referenced. |
| quota_max_pvcs | 8 | "" | Maximum PVCs in the namespace. Not referenced. |
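
A namespace quota using the binary suffixes required above could look like this sketch. The CPU figures are illustrative rather than recommendations from this reference, and the module label and source path are assumptions.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  enable_resource_quota = true

  # Illustrative CPU totals for the namespace.
  quota_cpu_requests = "4"
  quota_cpu_limits   = "8"

  # Memory totals must carry a binary suffix; a bare "4" would mean 4 bytes.
  quota_memory_requests = "4Gi"
  quota_memory_limits   = "8Gi"
}
```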

D. Jobs & Scheduled Tasks

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| initialization_jobs | 11 | [] | Kubernetes Jobs run before the application starts. Leave empty for the default db-init job. |
| cron_jobs | 11 | [] | Scheduled cluster tasks using Kubernetes CronJobs. |
| additional_services | 11 | [] | Sidecar or helper GKE services. |

E. Backup

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| backup_schedule | 17 | '0 2 * * *' | Backup cron schedule. |
| backup_retention_days | 17 | 7 | Days to retain backup files. |
| enable_backup_import | 17 | false | Triggers a one-time database import. |
| backup_source | 17 | 'gcs' | 'gcs' or 'gdrive'. |
| backup_uri | 17 | "" | Full GCS URI or Google Drive file ID. |
| backup_format | 17 | 'sql' | Backup file format. |
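
A one-time restore from a GCS dump, combined with the nightly schedule, might be expressed as in the following sketch. The bucket and object path are hypothetical, as are the module label and source path. The 14-day retention is an illustrative override of the default 7.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  # Nightly backup at 02:00, kept for two weeks (default retention is 7 days).
  backup_schedule       = "0 2 * * *"
  backup_retention_days = 14

  # One-time import of an existing SQL dump from GCS.
  enable_backup_import = true
  backup_source        = "gcs"
  backup_uri           = "gs://example-backups/metabase/dump.sql" # hypothetical URI
  backup_format        = "sql"
}
```

Because the import is described as one-time, setting enable_backup_import back to false after a successful apply is a reasonable precaution.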

8. Integrations

A. Redis

Metabase does not natively use Redis. The enable_redis variable injects the REDIS_HOST and REDIS_PORT environment variables into the container, but this does not change how Metabase operates.

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_redis | 21 | false | Injects Redis environment variables. Not required by Metabase. |
| redis_host | 21 | "" | Redis hostname or IP. |
| redis_port | 21 | '6379' | Redis TCP port. |

B. Custom SQL Scripts

| Variable | Group | Default | Description |
| --- | --- | --- | --- |
| enable_custom_sql_scripts | 18 | false | Runs SQL scripts from GCS against the Metabase database. |
| custom_sql_scripts_bucket | 18 | "" | GCS bucket containing SQL scripts. |
| custom_sql_scripts_path | 18 | "" | Path prefix within the bucket. |
| custom_sql_scripts_use_root | 18 | false | Run scripts as the root DB user. |

9. Platform-Managed Behaviours

| Behaviour | Detail |
| --- | --- |
| PostgreSQL 15 required | Metabase uses PostgreSQL as its application database. All state is stored in PostgreSQL. |
| MB_JETTY_PORT = "3000" injected | Set automatically by Metabase Common. Must match container_port. |
| JAVA_TIMEZONE = "UTC" injected | Set automatically by Metabase Common. Ensures consistent timestamp handling. |
| Default db-init Kubernetes Job | Metabase Common provides a db-init job that runs before the workload. Override by setting initialization_jobs. |
| Metabase is stateless | Use workload_type = 'Deployment'. StatefulSet is available but not recommended. |
| Session affinity = ClientIP | Default sticky sessions prevent users from being re-routed mid-session when HPA scales pods. |
| No application secrets generated | Metabase manages its own internal keys. No SECRET_KEY equivalent is created. |
| No default GCS storage | All Metabase state is in PostgreSQL. The storage_buckets default is empty. |
| Unix socket by default | enable_cloudsql_volume = true. Connects to Cloud SQL via Auth Proxy Unix socket. |

10. Outputs

| Output | Description |
| --- | --- |
| service_name | Name of the Kubernetes Service. |
| external_ip | External load balancer IP address. |
| namespace | Kubernetes namespace for the deployment. |
| project_id | GCP project ID. |
| deployment_id | Deployment ID suffix used in resource names. |
| database_instance_name | Name of the Cloud SQL PostgreSQL instance. |
| database_name | Name of the application database. |
| database_user | Name of the application database user. |
| database_password_secret | Secret Manager secret name for the database password. |
| container_image | Container image used for the deployment. |
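
The outputs above can be re-exported from a root configuration, for example to feed DNS or secret-access tooling. In this sketch the module label, the source path, and the root output names are assumptions; the module output names are those listed in the table.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path
}

# External IP to point DNS records at.
output "metabase_external_ip" {
  value = module.metabase.external_ip
}

# Name of the Secret Manager secret holding the database password
# (the secret name only, not the password itself).
output "metabase_db_password_secret" {
  value = module.metabase.database_password_secret
}
```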

Configuration Pitfalls & Sensible Defaults

Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).

| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
| --- | --- | --- | --- |
| container_resources.memory_limit | "4Gi" | Critical | Metabase runs on the JVM. Below 2 Gi the process crashes with OutOfMemoryError on startup. Minimum safe value is "2Gi"; "4Gi" is recommended for production. |
| container_resources.cpu_limit | "2000m" | High | JVM JIT compilation during startup requires significant CPU. Below 500m, startup can exceed the probe failure_threshold, causing perpetual container restarts. |
| container_resources.mem_request | null (defaults to limit) | High | On GKE Autopilot, mem_request drives node provisioning. Setting it far below memory_limit causes GKE to schedule the pod on a node with insufficient physical memory, resulting in OOM eviction. |
| MB_JAVA_OPTS (via environment_variables) | Not set | High | Always set -Xmx to a value below memory_limit (e.g., -Xmx3500m when the limit is "4Gi"). A heap ceiling exceeding container memory causes OOM kills. |
| MB_JETTY_PORT | "3000" (hardcoded in Common) | High | Overriding without also changing container_port breaks all routing and health checks. |
| application_database_name | "metabase" | High | Immutable after first apply. Changing it orphans the entire Metabase schema. |
| application_database_user | "metabase" | High | Immutable after first apply. Renaming requires manual Cloud SQL intervention. |
| application_version | "v0.51.3" | High | Metabase migrations are one-way. Downgrading to a previous version after a migration has run corrupts the application schema. Always stage upgrades. |
| startup_probe_config.initial_delay_seconds | 60 | High | Metabase JVM startup plus the DB migration check takes 60–90 s. Reducing below 30 causes premature pod kills before the app is ready. |
| startup_probe_config.failure_threshold | 30 (= 300 s) | High | Reducing it causes premature container kills before Metabase completes JVM startup. Do not reduce below 20. The module default listed in section 7.A is 18, which is below this floor, so set the value explicitly. |
| min_instance_count | 1 | High | Scale-to-zero causes 60–90 s cold starts (JVM plus DB migration check). This triggers request timeouts for users and alert delivery failures. |
| enable_cloudsql_volume | true | Critical | Required for the Cloud SQL Auth Proxy sidecar. Disabling it with a PostgreSQL backend causes all DB connections to be refused. |
| quota_memory_requests / quota_memory_limits | "4Gi" / "8Gi" | High | GKE-specific: must use binary suffixes (Gi, Mi). A bare integer (e.g., "4") is treated as bytes and blocks all pod scheduling. |
| stateful_pvc_enabled | false | Medium | Metabase does not require persistent volumes; all state is in PostgreSQL. Enabling it may introduce stuck rollouts for a service that does not need it. |
| pdb_min_available | "1" | Medium | Setting it to "0" allows all pods to be evicted during node upgrades, causing a full Metabase outage. |
| enable_iap | false | High | Without IAP, the Metabase login page is accessible from the load-balancer IP. Enable IAP or configure Kubernetes network policies for internal-only deployments. |
| backup_schedule | "0 2 * * *" | Medium | Disabling automated backups means all saved dashboards, questions, and user definitions can be permanently lost. |
| JAVA_TIMEZONE | "UTC" (hardcoded in Common) | Medium | Overriding it causes Metabase date filtering and report scheduling to use a different timezone than the database, producing incorrect results. |
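
The memory-related rows above can be read together: the heap ceiling has to sit below the container limit. A sketch of that pairing follows. It assumes environment_variables is a map of strings, which this reference does not confirm; the module label and source path are also assumptions.

```hcl
module "metabase" {
  source = "./modules/Metabase_GKE" # assumed relative path

  container_resources = {
    cpu_limit    = "2000m"
    memory_limit = "4Gi" # 4096 MiB available to the container
  }

  # Assumed shape: map of environment variable name to string value.
  environment_variables = {
    # Heap capped at 3500 MiB, leaving headroom under the 4Gi limit for
    # non-heap JVM memory (metaspace, thread stacks, buffers).
    MB_JAVA_OPTS = "-Xmx3500m"
  }
}
```

If memory_limit is changed, the -Xmx value should be adjusted with it so the gap between the two is preserved.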