Ollama Common Shared Configuration Module

The Ollama Common module defines the Ollama LLM inference server configuration for the RAD Modules ecosystem. It creates no GCP resources directly: it declares one GCS bucket (the models bucket) in its storage_buckets output, which the platform module then creates, and it produces a config output consumed by the platform-specific wrapper modules (Ollama CloudRun and Ollama GKE).

1. Overview

Purpose: To centralize all Ollama-specific configuration — including container image, port, GCS model persistence, automatically injected environment variables, and the optional model-pull initialization job — in a single module shared by both Cloud Run and GKE deployments.

Architecture:

Layer 3: Application Wrappers
├── Ollama_CloudRun ──┐
└── Ollama_GKE ───────┴── instantiate Ollama_Common

Ollama_Common (this module)
    Defines:  1 GCS bucket (models), created by the platform module
    Produces: config, storage_buckets, secret_ids (empty),
              secret_values (empty), path

Layer 2: Platform Modules
├── App_CloudRun (serverless deployment)
└── App_GKE (Kubernetes deployment)

Layer 1: App_Common (networking, storage, secrets, IAM)

Key characteristics:

  • Unlike most *_Common modules, Ollama Common creates no secrets — Ollama requires no database credentials, API keys, or passwords. Both secret_ids and secret_values output empty maps.
  • The ollama-models GCS bucket is always appended to the gcs_volumes list, so the bucket is mounted at /mnt/gcs in the container (with models stored under /mnt/gcs/ollama/models) regardless of what additional volumes the caller provides.
  • When default_model is set and initialization_jobs is empty, the module auto-generates a model-pull initialization job using scripts/model-pull.sh.
  • Supports a container_resources override object; when null, falls back to the top-level cpu_limit / memory_limit variables.

2. GCP Resources Created

Ollama Common itself creates no GCP resources directly. It produces a storage_buckets output that the calling wrapper module passes to App CloudRun or App GKE, which then creates the bucket.

| Bucket suffix | Content | Mount path |
|---|---|---|
| models | Ollama model weight files | /mnt/gcs/ollama/models (via GCS Fuse) |

The full bucket name is <wrapper_prefix>-models. The bucket uses STANDARD storage class, force_destroy = true, no versioning, and public_access_prevention = "inherited".


3. Outputs

config

The application configuration object passed to the platform module via application_config.

| Field | Value / Description |
|---|---|
| app_name | From application_name (default: "ollama") |
| display_name | From application_display_name (default: "Ollama LLM Server") |
| description | From description |
| container_image | "ollama/ollama" |
| application_version | From application_version (default: "latest") |
| image_source | "prebuilt" |
| enable_image_mirroring | From enable_image_mirroring (default: true) |
| container_build_config | { enabled=false, dockerfile_path="Dockerfile", context_path="." } |
| container_port | 11434 |
| database_type | "NONE": no Cloud SQL instance is provisioned |
| db_name | "" |
| db_user | "" |
| enable_cloudsql_volume | false |
| cloudsql_volume_mount_path | "/cloudsql" |
| gcs_volumes | Merged list: caller's gcs_volumes + ollama-models bucket at /mnt/gcs |
| container_resources | From var.container_resources if non-null; else { cpu_limit, memory_limit } from top-level variables |
| min_instance_count | From min_instance_count |
| max_instance_count | From max_instance_count |
| environment_variables | Ollama defaults merged with the caller's environment_variables (see §4) |
| enable_postgres_extensions | false |
| postgres_extensions | [] |
| initialization_jobs | Auto-generated model-pull job, caller-supplied list, or empty list (see §5) |
| startup_probe | From startup_probe variable |
| liveness_probe | From liveness_probe variable |
| additional_services | []: Ollama has no companion services |

storage_buckets

One entry, always included:

| Field | Value |
|---|---|
| name_suffix | "models" |
| name | <wrapper_prefix>-models |
| location | From region |
| storage_class | "STANDARD" |
| force_destroy | true |
| versioning_enabled | false |
| lifecycle_rules | [] |
| public_access_prevention | "inherited" |
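
As a rough illustration of the table above, the output could be declared along the following lines. This is a sketch, not the module's actual source: whether storage_buckets is a list or a map is an assumption here, and only the field names and values are taken from the table.

```hcl
# Hypothetical sketch of the storage_buckets output.
# Assumption: the output is a list with a single object entry.
output "storage_buckets" {
  description = "Bucket definitions that the platform module creates."
  value = [
    {
      name_suffix              = "models"
      name                     = "${var.wrapper_prefix}-models"
      location                 = var.region
      storage_class            = "STANDARD"
      force_destroy            = true
      versioning_enabled       = false
      lifecycle_rules          = []
      public_access_prevention = "inherited"
    }
  ]
}
```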

secret_ids

Always returns an empty map ({}). Ollama requires no application-managed secrets.

secret_values

Always returns an empty sensitive map ({}).

path

Absolute path to the module directory. Used by wrapper modules to locate scripts/.
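
The three simple outputs above could look roughly like this. It is a minimal sketch under assumptions: the use of abspath(path.module) to obtain an absolute path is a guess at how the module satisfies the "absolute path" description, not a confirmed detail.

```hcl
# Hypothetical sketch; the real module may declare these differently.
output "secret_ids" {
  description = "Ollama needs no application-managed secrets."
  value       = {}
}

output "secret_values" {
  description = "Empty, but marked sensitive to match other *_Common modules."
  value       = {}
  sensitive   = true
}

output "path" {
  description = "Module directory, used by wrappers to locate scripts/."
  value       = abspath(path.module) # assumption: abspath() makes the path absolute
}
```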


4. Automatically Injected Environment Variables

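The following environment variables are set in the container by default. Caller-supplied environment_variables are merged after them, so a caller entry with the same key silently replaces the default. Avoid overriding OLLAMA_MODELS and OLLAMA_HOST, because the GCS mount path and the container port depend on them: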

| Variable | Value | Purpose |
|---|---|---|
| OLLAMA_MODELS | "/mnt/gcs/ollama/models" | Points Ollama at the GCS Fuse subdirectory where model weights are stored. This directory persists across container restarts. |
| OLLAMA_HOST | "0.0.0.0:11434" | Binds Ollama to all interfaces so Cloud Run's ingress or the Kubernetes service proxy can forward traffic to the container. |
| OLLAMA_KEEP_ALIVE | "24h" | Keeps the loaded model resident in memory for 24 hours between requests, eliminating per-request model-load latency. |

Caller-supplied environment_variables are merged after these defaults, so callers can override OLLAMA_KEEP_ALIVE (e.g. to "-1" for permanent residency) or add tuning variables such as OLLAMA_NUM_PARALLEL.
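
The merge order described above can be sketched with Terraform's merge() function, in which later arguments take precedence on key collisions. The local names below are hypothetical and are not taken from the module source.

```hcl
# Hypothetical sketch of the defaults-then-caller merge.
locals {
  ollama_default_env = {
    OLLAMA_MODELS     = "/mnt/gcs/ollama/models"
    OLLAMA_HOST       = "0.0.0.0:11434"
    OLLAMA_KEEP_ALIVE = "24h"
  }

  # merge() lets later maps win, so a caller-supplied OLLAMA_KEEP_ALIVE
  # replaces "24h", and extra keys such as OLLAMA_NUM_PARALLEL are added.
  environment_variables = merge(
    local.ollama_default_env,
    var.environment_variables,
  )
}
```

With environment_variables = { OLLAMA_KEEP_ALIVE = "-1" }, the resulting map keeps the default OLLAMA_MODELS and OLLAMA_HOST values and carries OLLAMA_KEEP_ALIVE = "-1".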


5. Model-Pull Initialization Job

Ollama Common implements a three-path initialization job strategy:

Path 1 — Custom jobs provided (initialization_jobs is non-empty): The caller's jobs are used verbatim. The auto-generated model-pull job is not created.

Path 2 — Auto-generated job (initialization_jobs = [] and default_model is non-empty): A single model-pull job is created with the following configuration:

| Field | Value |
|---|---|
| name | "model-pull" |
| description | "Pull <default_model> into the GCS models bucket" |
| image | null (uses the main Ollama service image) |
| command | [] |
| args | [] |
| env_vars | { OLLAMA_MODELS="/mnt/gcs/ollama/models", OLLAMA_HOST="0.0.0.0:11434", OLLAMA_MODEL=<default_model> } |
| cpu_limit | From cpu_limit variable |
| memory_limit | From memory_limit variable |
| timeout_seconds | From model_pull_timeout_seconds variable |
| max_retries | 2 |
| task_count | 1 |
| execution_mode | "TASK" |
| mount_nfs | false |
| mount_gcs_volumes | ["ollama-models"] |
| depends_on_jobs | [] |
| execute_on_apply | true |
| script_path | <module_path>/scripts/model-pull.sh |

Path 3 — Skip job (initialization_jobs = [] and default_model = ""): An empty job list is produced. No initialization job is created.
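
The selection between the three paths can be expressed as a nested conditional. The sketch below is illustrative only: the local names are hypothetical, the job object is abbreviated to a few fields from the table, and it assumes caller-supplied jobs have a shape compatible with the generated job (Terraform requires both branches of a conditional to have consistent types).

```hcl
# Hypothetical sketch of the three-path job selection.
locals {
  # Abbreviated; the full field list is in the table above.
  model_pull_job = {
    name              = "model-pull"
    description       = "Pull ${var.default_model} into the GCS models bucket"
    timeout_seconds   = var.model_pull_timeout_seconds
    mount_gcs_volumes = ["ollama-models"]
    script_path       = "${path.module}/scripts/model-pull.sh"
    env_vars = {
      OLLAMA_MODELS = "/mnt/gcs/ollama/models"
      OLLAMA_HOST   = "0.0.0.0:11434"
      OLLAMA_MODEL  = var.default_model
    }
  }

  initialization_jobs = (
    length(var.initialization_jobs) > 0
    ? var.initialization_jobs                                   # Path 1: caller's jobs verbatim
    : (var.default_model != "" ? [local.model_pull_job] : [])   # Path 2, else Path 3
  )
}
```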


6. Scripts

All supporting scripts are in scripts/. The wrapper modules set scripts_dir to this directory.

Dockerfile

A minimal extension of the upstream Ollama image:

FROM ollama/ollama:latest
EXPOSE 11434
ENTRYPOINT ["/bin/ollama"]
CMD ["serve"]

This Dockerfile is used when container_image_source = "custom" in the wrapper module. For the default "prebuilt" source, the upstream ollama/ollama image is used directly and the Dockerfile is not built.

model-pull.sh

Pulls a named model into the GCS-backed models directory. Logic:

  1. Checks that OLLAMA_MODEL is set; exits cleanly if not.
  2. Creates $OLLAMA_MODELS directory if absent.
  3. Starts ollama serve in the background.
  4. Polls http://localhost:11434/ up to 30 times (3-second interval) until the server is ready.
  5. Runs ollama pull $OLLAMA_MODEL.
  6. Kills the background server and waits for clean shutdown.

The script uses set -euo pipefail and exits with a non-zero code if the server fails to start within the retry window.


7. Input Variables

Project & Identity

| Variable | Type | Default | Description |
|---|---|---|---|
| project_id | string | required | GCP project ID. |
| wrapper_prefix | string | required | Prefix for GCS bucket names. Must match the resource_prefix used by the calling App CloudRun or App GKE module. |
| deployment_id | string | "" | Unique deployment identifier. |
| common_labels | map(string) | {} | Labels applied to resources created by this module. |
| region | string | "us-central1" | Region for the GCS models bucket. |

Application Details

| Variable | Type | Default | Description |
|---|---|---|---|
| application_name | string | "ollama" | Application name used in resource naming. |
| application_display_name | string | "Ollama LLM Server" | Human-readable application name. |
| description | string | "Ollama — standalone open-source LLM inference server..." | Application description. |
| application_version | string | "latest" | Ollama Docker image tag. |

Model Configuration

| Variable | Type | Default | Description |
|---|---|---|---|
| default_model | string | "" | Ollama model to pull during first deployment (e.g. "llama3.2:3b", "mistral", "phi3:mini"). Leave empty to skip the model-pull initialization job. Models are stored in GCS and persist across container restarts. |
| model_pull_timeout_seconds | number | 3600 | Timeout in seconds for the model-pull initialization job. Valid range: 300–7200. |
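
A range constraint such as the 300–7200 bound on model_pull_timeout_seconds is typically enforced with a validation block. The following is a sketch of how that could be declared; the error message wording is invented, and the module may enforce the range differently.

```hcl
# Hypothetical sketch of the variable declaration with range validation.
variable "model_pull_timeout_seconds" {
  type        = number
  default     = 3600
  description = "Timeout in seconds for the model-pull initialization job."

  validation {
    condition     = var.model_pull_timeout_seconds >= 300 && var.model_pull_timeout_seconds <= 7200
    error_message = "model_pull_timeout_seconds must be between 300 and 7200."
  }
}
```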

Resources

| Variable | Type | Default | Description |
|---|---|---|---|
| cpu_limit | string | "8000m" | CPU limit for the Ollama container. Used for both the main container and the model-pull job. 7B models need at least 6 vCPU for tolerable latency; 3B models work at 4 vCPU. Note: this is the Common module internal default; the CloudRun and GKE wrapper modules have their own defaults ("4000m" and "8" respectively). |
| memory_limit | string | "16Gi" | Memory limit for the Ollama container. 3B models need ~4 Gi; 7B models need ~8–16 Gi. |
| container_resources | any | null | Full container resources override. When non-null, takes precedence over cpu_limit and memory_limit. |
| min_instance_count | number | 1 | Minimum instances. Set to 1 to keep a warm instance for low-latency inference. |
| max_instance_count | number | 3 | Maximum instances. |
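
The container_resources fallback can be sketched as a single conditional. The local name is hypothetical, and the sketch assumes the override object carries the same attributes as the fallback, since Terraform requires both branches of a conditional to have consistent types.

```hcl
# Hypothetical sketch of the override-or-fallback logic.
locals {
  container_resources = (
    var.container_resources != null
    ? var.container_resources # caller's full override wins
    : {
      cpu_limit    = var.cpu_limit    # "8000m" by default in this module
      memory_limit = var.memory_limit # "16Gi" by default in this module
    }
  )
}
```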

Storage

| Variable | Type | Default | Description |
|---|---|---|---|
| gcs_volumes | list(any) | [] | Additional GCS volume mounts. The ollama-models bucket at /mnt/gcs is always appended automatically. |

Environment & Probes

| Variable | Type | Default | Description |
|---|---|---|---|
| environment_variables | map(string) | {} | Additional environment variables merged into the container spec after the default Ollama variables. |
| initialization_jobs | list(any) | [] | Custom initialization jobs. When non-empty, overrides the default model-pull job entirely. |
| startup_probe | object | { enabled=true, type="HTTP", path="/", initial_delay_seconds=30, timeout_seconds=5, period_seconds=15, failure_threshold=20 } | Startup probe. The 20-attempt threshold allows up to ~5 minutes for model loading from GCS. |
| liveness_probe | object | { enabled=true, type="HTTP", path="/", initial_delay_seconds=60, timeout_seconds=5, period_seconds=30, failure_threshold=3 } | Liveness probe. 60 s initial delay avoids false restarts during model-load phase. |
| enable_image_mirroring | bool | true | Mirror the Ollama image to Artifact Registry before deployment. |
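
For larger models, a caller may want a longer startup window than the default. The fragment below is a sketch of such an override, reusing the field names from the startup_probe default; the specific threshold is only an example, and the timing estimate assumes the probe gives up after failure_threshold consecutive failures spaced period_seconds apart.

```hcl
# Hypothetical caller-side override, placed inside the module block.
# Approximate startup budget:
#   initial_delay_seconds + period_seconds * failure_threshold
#   = 30 + 15 * 40 = 630 seconds
startup_probe = {
  enabled               = true
  type                  = "HTTP"
  path                  = "/"
  initial_delay_seconds = 30
  timeout_seconds       = 5
  period_seconds        = 15
  failure_threshold     = 40 # example value; the default is 20
}
```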

8. GCS Volume Layout

The <wrapper_prefix>-models GCS bucket is mounted at /mnt/gcs in the container:

<wrapper_prefix>-models/            ← GCS bucket root
└── ollama/
    └── models/                     ← /mnt/gcs/ollama/models (OLLAMA_MODELS)
        ├── llama3.2:3b/            ← example model directory
        ├── mistral/
        └── ...

The gcs_volumes entry appended by this module:

{
  name        = "ollama-models"
  bucket_name = "<wrapper_prefix>-models"
  mount_path  = "/mnt/gcs"
  read_only   = false
}
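
Appending this entry to the caller's list can be sketched with concat(). The local names are hypothetical, and the sketch assumes caller-supplied volumes use the same object shape as the entry above.

```hcl
# Hypothetical sketch of how the models volume is always appended.
locals {
  ollama_models_volume = {
    name        = "ollama-models"
    bucket_name = "${var.wrapper_prefix}-models"
    mount_path  = "/mnt/gcs"
    read_only   = false
  }

  # Caller volumes first, then the models bucket, so it is present
  # even when var.gcs_volumes is empty.
  gcs_volumes = concat(var.gcs_volumes, [local.ollama_models_volume])
}
```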

9. Platform-Specific Differences

| Aspect | Ollama CloudRun | Ollama GKE |
|---|---|---|
| region | Hard-coded to "us-central1" in main.tf | Auto-discovered from VPC subnets via app_networking; falls back to var.region |
| Model-pull job type | Cloud Run Job | Kubernetes Job |
| mount_gcs_volumes | ["ollama-models"] mounted in the Cloud Run Job | ["ollama-models"] mounted in the Kubernetes Job |
| GCS Fuse driver | GCS Fuse (Cloud Run gen2) | GCS Fuse CSI driver (GKE) |
| Additional services | none | additional_services = [] (can be extended by caller) |
| Service networking | Direct VPC Egress (Cloud Run) | ClusterIP service (Kubernetes) |
| Scaling | Serverless instances | Kubernetes pods with HPA |

10. Implementation Pattern

# Example: how Ollama_CloudRun instantiates Ollama_Common

module "ollama_app" {
  source = "../Ollama_Common"

  project_id    = var.project_id
  deployment_id = local.random_id
  common_labels = local.common_labels

  wrapper_prefix = local.wrapper_prefix
  region         = "us-central1"

  application_name         = var.application_name
  application_display_name = var.application_display_name
  description              = var.description
  application_version      = var.application_version

  default_model              = var.default_model
  model_pull_timeout_seconds = var.model_pull_timeout_seconds

  cpu_limit          = var.cpu_limit
  memory_limit       = var.memory_limit
  min_instance_count = var.min_instance_count
  max_instance_count = var.max_instance_count

  gcs_volumes            = var.gcs_volumes
  environment_variables  = var.environment_variables
  initialization_jobs    = var.initialization_jobs
  startup_probe          = var.startup_probe
  liveness_probe         = var.liveness_probe
  enable_image_mirroring = var.enable_image_mirroring
}

# config is passed to App_CloudRun
module "app_cloudrun" {
  source = "../App_CloudRun"

  application_config     = { ollama = module.ollama_app.config }
  module_env_vars        = {}
  module_secret_env_vars = {}
  module_storage_buckets = module.ollama_app.storage_buckets
  scripts_dir            = abspath("${module.ollama_app.path}/scripts")
  # ... other inputs
}
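
From the other direction, a root configuration would call the wrapper rather than Ollama_Common itself. The fragment below is a sketch under assumptions: the source path and project ID are placeholders, and the inputs shown are the wrapper variables that the pattern above passes through; the wrapper may require additional inputs not listed here.

```hcl
# Hypothetical root-module usage of the Cloud Run wrapper.
module "ollama" {
  source = "./modules/Ollama_CloudRun" # placeholder path

  project_id = "my-gcp-project" # placeholder project ID

  # Non-empty default_model with no custom initialization_jobs
  # triggers the auto-generated model-pull job.
  default_model              = "llama3.2:3b"
  model_pull_timeout_seconds = 1800

  cpu_limit    = "4000m"
  memory_limit = "8Gi"

  # Merged after the Ollama defaults, so these take precedence.
  environment_variables = {
    OLLAMA_KEEP_ALIVE   = "-1"
    OLLAMA_NUM_PARALLEL = "2"
  }
}
```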