Crawl4AI on Google Cloud Run
This document provides a comprehensive reference for the modules/Crawl4AI_CloudRun Terraform module. It covers architecture, IAM, configuration variables, Crawl4AI-specific behaviours, and operational patterns for deploying Crawl4AI on Google Cloud Run (v2).
1. Module Overview
Crawl4AI is an open-source LLM-friendly web crawler and scraper with 40,000+ GitHub stars. It enables AI teams to rapidly ingest web content for RAG pipelines, knowledge bases, and monitoring without building custom extraction infrastructure. Crawl4AI CloudRun is a wrapper module built on top of App CloudRun. It uses App CloudRun for all GCP infrastructure provisioning and injects Crawl4AI-specific application configuration via Crawl4AI Common.
Key Capabilities:
- Compute: Cloud Run v2 (Gen2 required), Python container, 4 vCPU / 8 Gi default. Supervisord manages two processes inside the container: embedded Redis (task queue, port 6379) and the Gunicorn ASGI server (port 11235). Chromium/Playwright handles browser-based crawling.
- Data Persistence: Stateless. No external database is provisioned (`database_type = "NONE"`). Redis runs inside the container and does not persist between restarts.
- Security: Inherits Cloud Armor WAF, IAP, Binary Authorization, and VPC Service Controls from `App CloudRun`. No application secrets are auto-generated by `Crawl4AI Common`.
- AI Integration: Supports LLM-based extraction via OpenAI, Anthropic, DeepSeek, Groq, Gemini, and custom providers. API keys are injected via `secret_environment_variables`.
- CI/CD: Uses the prebuilt `unclecode/crawl4ai` image by default; a Cloud Build custom image pipeline is optional.
Project & Application Identity
| Variable | Group | Type | Default | Description |
|---|---|---|---|---|
project_id | 1 | string | — | GCP project ID. Required. |
tenant_deployment_id | 2 | string | 'demo' | Short suffix appended to all resource names. |
support_users | 2 | list(string) | [] | Email recipients for monitoring alerts. |
resource_labels | 2 | map(string) | {} | Labels applied to all provisioned resources. |
application_name | 3 | string | 'crawl4ai' | Base resource name. |
application_display_name | 3 | string | 'Crawl4AI Web Crawler' | Human-readable name shown in the GCP Console. |
description | 3 | string | (Crawl4AI description) | Cloud Run service description. |
application_version | 3 | string | 'latest' | Docker image tag. Use a pinned tag (e.g., '0.6.0') for production. |
Wrapper architecture: Crawl4AI CloudRun calls Crawl4AI Common to build a config object, then passes it as application_modules.crawl4ai to App CloudRun. No additional services are deployed alongside the main container.
Stateless design: Crawl4AI has no external database dependency. The embedded Redis instance stores task results in-memory with a configurable TTL (redis_task_ttl_seconds). Task results are lost on container restart — this is the expected behaviour for an ephemeral crawl API.
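As a rough illustration of the wrapper pattern, a root configuration might call the module as shown here. This is a minimal sketch: the relative `source` path, the module label `crawl4ai`, and the project ID are placeholders chosen for the example. Only the input names and the `0.6.0` tag example come from the table above.

```hcl
# Hypothetical root-module call; adjust `source` to wherever the module lives.
module "crawl4ai" {
  source = "./modules/Crawl4AI_CloudRun" # assumed relative path

  project_id           = "my-gcp-project" # placeholder project ID
  tenant_deployment_id = "demo"
  application_name     = "crawl4ai"
  application_version  = "0.6.0" # pinned tag instead of "latest"
}
```

Every other input falls back to its documented default, so this call would deploy a single stateless service with no database.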
2. IAM & Access Control
Crawl4AI_CloudRun delegates all IAM provisioning to App_CloudRun. The Cloud Run SA, Cloud Build SA, and IAP service agent role sets are identical to those in App_CloudRun §2.
No auto-generated secrets: Crawl4AI Common creates no Secret Manager secrets. The SECRET_KEY for JWT authentication (if enabled) must be provided via secret_environment_variables:
```hcl
secret_environment_variables = {
  SECRET_KEY = "crawl4ai-jwt-secret"
}
```
JWT authentication: Security is disabled by default. Enabling JWT authentication takes two steps: provide SECRET_KEY as a secure random value via secret_environment_variables, and set security.jwt_enabled=true in a custom config.yml. The /token endpoint then requires the security.api_token (set in config.yml) to issue short-lived JWTs.
3. Core Service Configuration
A. Compute (Cloud Run)
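Crawl4AI requires the Gen2 execution environment: supervisord needs a full Linux process tree, and Chromium is launched with --disable-dev-shm-usage, which moves its shared memory from /dev/shm to /tmp. Cloud Run's writable filesystem is held in memory, so files written to /tmp count against the instance's memory limit. The effective memory constraint is therefore total container memory.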
Container image: Crawl4AI_Common sets image_source = "prebuilt" and container_image = "unclecode/crawl4ai". Image mirroring is enabled by default to avoid Docker Hub rate limits.
| Variable | Group | Default | Description |
|---|---|---|---|
deploy_application | 4 | true | Set false for IAM provisioning only without deploying the container. |
cpu_limit | 4 | '4000m' | CPU per instance. Size to match expected browser concurrency (~0.5–1 vCPU per active context). |
memory_limit | 4 | '8Gi' | Memory per instance. Minimum 4 Gi; 8 Gi recommended for multiple concurrent crawls. |
min_instance_count | 4 | 1 | Minimum instances. Set to 1 for a warm Chromium pool. Set to 0 to reduce cost, accepting 30–60 s cold starts. |
max_instance_count | 4 | 3 | Maximum concurrently running instances. Range: 1–1000. |
execution_environment | 4 | 'gen2' | Required. Gen2 for supervisord process tree and Chromium /tmp memory. |
timeout_seconds | 4 | 3600 | Maximum request duration. Set to 3600 (Cloud Run maximum) to allow long batch crawl jobs. |
container_protocol | 4 | 'http1' | HTTP protocol version. 'http1' or 'h2c'. |
enable_image_mirroring | 4 | true | Mirrors the Crawl4AI image to Artifact Registry to avoid Docker Hub rate limits. |
traffic_split | 4 | [] | Percentage-based canary/blue-green traffic allocation. |
service_annotations | 4 | {} | Advanced Cloud Run annotations. |
service_labels | 4 | {} | Labels applied to the Cloud Run service. |
Key differences from App CloudRun defaults:
| Variable | App CloudRun | Crawl4AI CloudRun | Reason |
|---|---|---|---|
container_port | 8080 | 11235 | Crawl4AI REST API listens on 11235. |
cpu_limit | '1000m' | '4000m' | Each Chromium browser context consumes ~0.5–1 vCPU. |
memory_limit | '512Mi' | '8Gi' | Chromium requires significant memory; --disable-dev-shm-usage redirects shared memory to /tmp. |
execution_environment | 'gen2' | 'gen2' (required) | supervisord and Chromium require Gen2. |
timeout_seconds | 300 | 3600 | Long batch crawls can take up to 30 minutes. |
vpc_egress_setting | 'PRIVATE_RANGES_ONLY' | 'ALL_TRAFFIC' | The crawler must reach arbitrary public URLs on the internet. |
database_type | (varies) | 'NONE' | Crawl4AI has no database dependency. |
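The sizing guidance above can be turned into a concrete override. The following is a sketch for a low-traffic deployment, written as bare assignments in the same style as the SECRET_KEY example (for instance in a `terraform.tfvars` file or inside the module block). The specific values are illustrative assumptions, not recommendations from the module authors.

```hcl
# Lower-concurrency sizing sketch (values are illustrative).
cpu_limit             = "2000m" # roughly 2-4 active browser contexts at 0.5-1 vCPU each
memory_limit          = "4Gi"   # documented minimum; 8Gi is the documented recommendation
min_instance_count    = 0       # scale to zero; expect 30-60 s cold starts
max_instance_count    = 2
execution_environment = "gen2"  # required by supervisord and Chromium
timeout_seconds       = 3600    # Cloud Run maximum
```

Dropping below the defaults trades crawl throughput for cost, so it is worth load-testing with representative pages before relying on it.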
B. Crawl4AI-Specific Configuration
| Variable | Group | Default | Description |
|---|---|---|---|
redis_task_ttl_seconds | 19 | 3600 | TTL in seconds for task results in embedded Redis. Prevents unbounded memory growth. Range: 300–86400. |
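For readers who want to see how a range such as 300–86400 is typically enforced in Terraform, here is a minimal sketch using a `validation` block. It is a reconstruction for illustration only; the module's real variable definition may differ.

```hcl
# Sketch only: one way a 300-86400 range check could be declared.
variable "redis_task_ttl_seconds" {
  description = "TTL in seconds for task results in embedded Redis."
  type        = number
  default     = 3600

  validation {
    condition     = var.redis_task_ttl_seconds >= 300 && var.redis_task_ttl_seconds <= 86400
    error_message = "The redis_task_ttl_seconds value must be between 300 and 86400."
  }
}
```

With a check like this, an out-of-range value fails at plan time instead of producing a misconfigured container.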
C. Networking
| Variable | Group | Default | Description |
|---|---|---|---|
ingress_settings | 5 | 'all' | Traffic sources permitted to reach the Cloud Run service. Use 'all' for public API access. |
vpc_egress_setting | 5 | 'ALL_TRAFFIC' | Important: Use 'ALL_TRAFFIC' so the crawler can reach arbitrary public URLs. |
enable_iap | 5 | false | Enable IAP for Google identity authentication. |
iap_authorized_users | 5 | [] | Users/service accounts granted access through IAP. |
iap_authorized_groups | 5 | [] | Google Groups granted access through IAP. |
enable_vpc_sc | 5 | false | Enable VPC Service Controls perimeter enforcement. |
vpc_cidr_ranges | 5 | [] | VPC subnet CIDR ranges for VPC-SC network access level. |
vpc_sc_dry_run | 5 | true | Log VPC-SC violations without blocking. |
organization_id | 5 | "" | GCP Organization ID for VPC-SC policy. |
enable_audit_logging | 5 | false | Enable detailed Cloud Audit Logs. |
admin_ip_ranges | 5 | [] | CIDR ranges permitted for administrative access. |
enable_cloud_armor | 5 | false | Enable Cloud Armor WAF fronted by a Global HTTPS Load Balancer. |
application_domains | 5 | [] | Custom domain names for the Cloud Armor Load Balancer. |
enable_cdn | 5 | false | Enable Cloud CDN on the Global HTTPS Load Balancer. |
D. Environment Variables & LLM Integration
Crawl4AI supports LLM-based extraction via environment variable configuration:
| Variable | Group | Default | Description |
|---|---|---|---|
environment_variables | 6 | {} | Additional plain-text environment variables. PYTHONUNBUFFERED and REDIS_TASK_TTL are set automatically. Do NOT set REDIS_HOST or REDIS_PORT — Redis runs inside the container. Recognised overrides: LLM_PROVIDER, LLM_BASE_URL, LLM_TEMPERATURE, CRAWL4AI_HOOKS_ENABLED. |
secret_environment_variables | 6 | {} | Secret Manager secret references injected as environment variables. Use for: SECRET_KEY (JWT signing), OPENAI_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, GEMINI_API_KEY, LLM_API_KEY. |
secret_propagation_delay | 6 | 30 | Seconds to wait after secret creation. Range: 0–300. |
secret_rotation_period | 6 | '2592000s' | Rotation period for Secret Manager secrets. Default: 30 days. |
Warning: `CRAWL4AI_HOOKS_ENABLED=true` enables arbitrary Python code execution via hooks. Only enable it in a fully trusted environment.
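A possible split between plain-text settings and secrets is sketched here. The provider string format, the temperature value, and the secret name are assumptions made for the example. The secret value is assumed to be the name of an existing Secret Manager secret, following the SECRET_KEY example earlier in this document.

```hcl
# Plain-text settings; the provider string is an illustrative value.
environment_variables = {
  LLM_PROVIDER    = "openai/gpt-4o-mini" # assumed provider/model format
  LLM_TEMPERATURE = "0.2"
}

# API keys go through Secret Manager, never through plain-text variables.
secret_environment_variables = {
  OPENAI_API_KEY = "crawl4ai-openai-api-key" # hypothetical secret name
}
```

REDIS_HOST and REDIS_PORT are deliberately absent, since Redis runs inside the container.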
4. Advanced Security
A. Cloud Armor WAF
| Variable | Group | Default | Description |
|---|---|---|---|
enable_cloud_armor | 5 | false | Provisions Global HTTPS LB + Cloud Armor WAF. Required for custom domains and DDoS protection. |
admin_ip_ranges | 5 | [] | CIDR ranges exempted from WAF rules. |
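A minimal sketch of fronting the service with Cloud Armor might look like this. The domain and CIDR range are placeholders (the CIDR comes from a reserved documentation range), not values from the module.

```hcl
enable_cloud_armor  = true
application_domains = ["crawl.example.com"] # placeholder domain
admin_ip_ranges     = ["203.0.113.0/24"]    # placeholder CIDR
```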
B. Identity-Aware Proxy (IAP)
When enable_iap = true, Cloud Run's native IAP integration is enabled. All requests require a valid Google identity before reaching Crawl4AI.
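An IAP configuration could be sketched as follows. The `user:` and `group:` member prefixes are an assumption about the expected format; check the variable's description in App CloudRun before relying on them.

```hcl
enable_iap            = true
iap_authorized_users  = ["user:alice@example.com"]        # member format is an assumption
iap_authorized_groups = ["group:crawl-users@example.com"] # placeholder group
```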
C. Binary Authorization
| Variable | Group | Default | Description |
|---|---|---|---|
enable_binary_authorization | 8 | false | Enforce Binary Authorization requiring signed container images. |
5. Storage & Filesystem
Crawl4AI is stateless by design. Storage resources are optional:
| Variable | Group | Default | Description |
|---|---|---|---|
create_cloud_storage | 11 | true | Provision GCS buckets defined in storage_buckets. |
storage_buckets | 11 | [] | Additional GCS buckets to provision (e.g., for crawl result caching). No buckets by default. |
enable_nfs | 11 | false | Provision and mount a Cloud Filestore NFS volume. Not needed for standard Crawl4AI deployments. |
nfs_mount_path | 11 | '/mnt/nfs' | Filesystem path where the NFS volume is mounted. |
nfs_instance_name | 11 | "" | Name of an existing NFS GCE VM. Leave empty to auto-discover. |
nfs_instance_base_name | 11 | 'app-nfs' | Base name for the inline NFS GCE VM. |
gcs_volumes | 11 | [] | GCS buckets to mount as filesystem volumes via GCS Fuse. |
manage_storage_kms_iam | 11 | false | Create a CMEK KMS key for GCS encryption. |
enable_artifact_registry_cmek | 11 | false | Enable CMEK encryption for Artifact Registry. |
6. CI/CD & Delivery
| Variable | Group | Default | Description |
|---|---|---|---|
enable_cicd_trigger | 8 | false | Enable a Cloud Build trigger for automated builds on GitHub pushes. |
github_repository_url | 8 | "" | Full HTTPS URL of the GitHub repository. |
github_token | 8 | "" | GitHub PAT for Cloud Build authentication. Sensitive. |
github_app_installation_id | 8 | "" | Cloud Build GitHub App installation ID. |
cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Advanced Cloud Build trigger configuration. |
enable_cloud_deploy | 8 | false | Switch CI/CD to a managed Cloud Deploy pipeline with promotion stages. |
cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Ordered promotion stages for the Cloud Deploy pipeline. |
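A sketch of enabling the GitHub trigger is shown here. The repository URL is a placeholder. Supplying the token through the `TF_VAR_github_token` environment variable assumes the module is applied as the root module, so that `github_token` is a root input variable.

```hcl
enable_cicd_trigger   = true
github_repository_url = "https://github.com/example-org/crawl4ai-deploy" # placeholder

cicd_trigger_config = {
  branch_pattern = "^main$" # documented default
}

# github_token is sensitive: supply it out of band (for example via the
# TF_VAR_github_token environment variable) rather than committing it here.
```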
7. Reliability & Scheduling
A. Health Probes
Crawl4AI exposes a /health HTTP endpoint. Supervisord boots Redis (priority 10) then Gunicorn (priority 20) before /health responds — allow at least 40 seconds of initial delay (matches the docker-compose start_period).
| Variable | Group | Default | Description |
|---|---|---|---|
startup_probe | 14 | { path="/health", initial_delay_seconds=40, period_seconds=10, failure_threshold=12, ... } | Startup probe. 40 s initial delay for supervisord + Playwright/Chromium initialisation. |
liveness_probe | 14 | { path="/health", initial_delay_seconds=60, period_seconds=30, failure_threshold=3, ... } | Liveness probe. |
startup_probe_config | 14 | { enabled=true, path="/health" } | Alternative startup probe for App CloudRun. Takes precedence when both are set. |
health_check_config | 14 | { enabled=true, path="/health" } | Alternative liveness probe for App CloudRun. Takes precedence when both are set. |
uptime_check_config | 14 | { enabled=true, path="/health" } | Google Cloud Monitoring uptime check configuration. |
alert_policies | 14 | [] | Cloud Monitoring alert policies. |
max_images_to_retain | 14 | 7 | Maximum number of container images to keep in Artifact Registry. |
delete_untagged_images | 14 | true | Automatically delete untagged container images from Artifact Registry. |
image_retention_days | 14 | 30 | Days after which container images are eligible for deletion. |
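To make the startup probe defaults concrete, the approximate time a container has to become healthy can be worked out from the three documented values. This sketch treats the budget as initial delay plus period times failure threshold and ignores per-probe timeouts, which are elided in the table above.

```hcl
locals {
  # Values taken from the documented startup_probe default.
  startup_initial_delay_seconds = 40
  startup_period_seconds        = 10
  startup_failure_threshold     = 12

  # Approximate worst case before the startup probe gives up:
  # 40 + 10 * 12 = 160 seconds.
  startup_budget_seconds = local.startup_initial_delay_seconds + local.startup_period_seconds * local.startup_failure_threshold
}

output "startup_budget_seconds" {
  value = local.startup_budget_seconds
}
```

If Chromium initialisation regularly takes longer than this budget, raising `failure_threshold` is usually gentler than raising the initial delay, because healthy instances still pass as early as possible.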
B. Initialization & Cron Jobs
| Variable | Group | Default | Description |
|---|---|---|---|
initialization_jobs | 9 | [] | Cloud Run jobs to execute once during or after deployment. Defaults to CPU 4000m / memory 8Gi to match the service container. |
cron_jobs | 9 | [] | Recurring scheduled tasks deployed as Cloud Run jobs triggered by Cloud Scheduler. |
8. Platform-Managed Behaviours
| Behaviour | Implementation | Detail |
|---|---|---|
| No database provisioned | database_type = "NONE" in Crawl4AI Common | Crawl4AI has no external database dependency. Cloud SQL is not provisioned. |
| Embedded Redis | Supervisord starts Redis (priority 10) inside the container | Task results stored in-memory. Lost on restart. Do not override REDIS_HOST or REDIS_PORT. |
| Gen2 required | execution_environment = "gen2" | supervisord requires a full Linux process tree; Chromium uses /tmp for shared memory. |
| ALL_TRAFFIC egress | vpc_egress_setting = "ALL_TRAFFIC" | The crawler must reach arbitrary public URLs on the internet. |
| REDIS_TASK_TTL injected | REDIS_TASK_TTL = tostring(var.redis_task_ttl_seconds) | Prevents unbounded Redis memory growth from accumulated task results. |
| PYTHONUNBUFFERED=1 | Injected by Crawl4AI Common | Ensures Python output is not buffered — important for log streaming. |
| No auto-generated secrets | secret_ids = {} from Crawl4AI Common | No secrets are created by default. Inject SECRET_KEY via secret_environment_variables to enable JWT auth. |
| Prebuilt image by default | image_source = "prebuilt" | Uses unclecode/crawl4ai:<version> directly. Image mirroring copies it to Artifact Registry. |
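The environment-variable rows above could be realised along the following lines. This is a hypothetical sketch, not the source of Crawl4AI Common: the local names are invented for the example, and the merge order (managed values overriding user-supplied ones) is an assumption.

```hcl
variable "redis_task_ttl_seconds" {
  type    = number
  default = 3600
}

variable "environment_variables" {
  type    = map(string)
  default = {}
}

locals {
  # Platform-managed values, as listed in the table above.
  managed_env = {
    PYTHONUNBUFFERED = "1"
    REDIS_TASK_TTL   = tostring(var.redis_task_ttl_seconds)
  }

  # Assumed precedence: later arguments to merge() win, so managed values
  # override any user-supplied entry with the same key.
  container_env = merge(var.environment_variables, local.managed_env)
}
```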
9. Variable Reference
| Variable | Group | Default | Description |
|---|---|---|---|
module_description | 0 | (Crawl4AI platform text) | Platform metadata: module description. |
module_documentation | 0 | 'https://docs.radmodules.dev/docs/modules/Crawl4AI_CloudRun' | Platform metadata: documentation URL. |
module_dependency | 0 | ['Services GCP'] | Platform metadata: required modules. |
module_services | 0 | (GCP service list) | Platform metadata: GCP services consumed. |
credit_cost | 0 | 50 | Platform metadata: deployment credit cost. |
require_credit_purchases | 0 | false | Platform metadata: enforces credit balance check. |
enable_purge | 0 | true | Permits full deletion of module resources on destroy. |
public_access | 0 | false | Platform catalogue visibility. |
shared_users | 0 | [] | Users who can access this module regardless of public_access. Enforced by the platform. |
deployment_id | 0 | "" | Deployment ID suffix. Auto-generated if empty. |
resource_creator_identity | 0 | (platform SA) | Service account used by Terraform to manage resources. |
impersonation_service_account | 0 | "" | SA to impersonate for shell script API calls. |
project_id | 1 | — | GCP project ID. Required. |
region | 1 | 'us-central1' | GCP region fallback. |
tenant_deployment_id | 2 | 'demo' | Short suffix appended to all resource names. |
support_users | 2 | [] | Email addresses for monitoring alerts. |
resource_labels | 2 | {} | Labels applied to all provisioned resources. |
application_name | 3 | 'crawl4ai' | Base resource name. |
application_display_name | 3 | 'Crawl4AI Web Crawler' | Human-readable name. |
description | 3 | (Crawl4AI description) | Service description. |
application_version | 3 | 'latest' | Docker image tag. Use a pinned tag for production. |
redis_task_ttl_seconds | 19 | 3600 | TTL for embedded Redis task results. Range: 300–86400. |
deploy_application | 4 | true | Set false for IAM-only deployment. |
cpu_limit | 4 | '4000m' | CPU per instance. |
memory_limit | 4 | '8Gi' | Memory per instance. Minimum 4 Gi. |
min_instance_count | 4 | 1 | Minimum instances. |
max_instance_count | 4 | 3 | Maximum instances. Range: 1–1000. |
execution_environment | 4 | 'gen2' | Required. Gen2 for supervisord and Chromium. |
timeout_seconds | 4 | 3600 | Max request duration. Range: 0–3600. |
container_protocol | 4 | 'http1' | 'http1' or 'h2c'. |
enable_image_mirroring | 4 | true | Mirrors image into Artifact Registry. |
traffic_split | 4 | [] | Canary/blue-green traffic allocation. |
service_annotations | 4 | {} | Advanced Cloud Run annotations. |
service_labels | 4 | {} | Labels applied to the Cloud Run service. |
ingress_settings | 5 | 'all' | 'all', 'internal', or 'internal-and-cloud-load-balancing'. |
vpc_egress_setting | 5 | 'ALL_TRAFFIC' | Use 'ALL_TRAFFIC' for internet crawling. |
enable_iap | 5 | false | Enables IAP authentication. |
iap_authorized_users | 5 | [] | IAP-authorized users/SAs. |
iap_authorized_groups | 5 | [] | IAP-authorized Google Groups. |
enable_vpc_sc | 5 | false | VPC Service Controls perimeter enforcement. |
vpc_cidr_ranges | 5 | [] | VPC subnet CIDR ranges for VPC-SC. |
vpc_sc_dry_run | 5 | true | Log-only mode for VPC-SC. |
organization_id | 5 | "" | GCP Organization ID for VPC-SC. |
enable_audit_logging | 5 | false | Enable Cloud Audit Logs. |
admin_ip_ranges | 5 | [] | Administrative CIDR ranges. |
enable_cloud_armor | 5 | false | Cloud Armor WAF + Global HTTPS LB. |
application_domains | 5 | [] | Custom domains for Cloud Armor LB. |
enable_cdn | 5 | false | Cloud CDN on the HTTPS LB backend. |
cloudsql_volume_mount_path | 5 | '/cloudsql' | Not used by Crawl4AI but required by App CloudRun interface. |
environment_variables | 6 | {} | Additional plain-text env vars. Do not set REDIS_HOST/REDIS_PORT. |
secret_environment_variables | 6 | {} | Secret Manager references. Use for SECRET_KEY, OPENAI_API_KEY, etc. |
secret_propagation_delay | 6 | 30 | Seconds to wait after secret creation. |
secret_rotation_period | 6 | '2592000s' | Secret Manager rotation notification frequency. |
backup_schedule | 7 | "" | Not applicable — Crawl4AI is stateless. |
backup_retention_days | 7 | 7 | Days to retain backup files. |
enable_backup_import | 7 | false | Not applicable for Crawl4AI. |
backup_source | 7 | 'gcs' | Backup source. |
backup_uri | 7 | "" | Backup file location. |
backup_format | 7 | 'sql' | Backup file format. |
enable_cicd_trigger | 8 | false | Cloud Build GitHub trigger. |
github_repository_url | 8 | "" | GitHub repository URL. |
github_token | 8 | "" | GitHub PAT. Sensitive. |
github_app_installation_id | 8 | "" | GitHub App installation ID. |
cicd_trigger_config | 8 | { branch_pattern = "^main$" } | Cloud Build trigger config. |
enable_cloud_deploy | 8 | false | Cloud Deploy pipeline. |
cloud_deploy_stages | 8 | [dev, staging, prod(approval)] | Cloud Deploy promotion stages. |
enable_binary_authorization | 8 | false | Enforce image attestation. |
enable_custom_sql_scripts | 9 | false | Not applicable for Crawl4AI. |
custom_sql_scripts_bucket | 9 | "" | GCS bucket for SQL scripts. |
custom_sql_scripts_path | 9 | "" | Path prefix for SQL scripts. |
custom_sql_scripts_use_root | 9 | false | Run SQL scripts as root user. |
initialization_jobs | 9 | [] | One-shot Cloud Run Jobs. Defaults to 4 vCPU / 8 Gi per job. |
cron_jobs | 9 | [] | Recurring scheduled Cloud Run Jobs. |
create_cloud_storage | 11 | true | Provision GCS buckets. |
storage_buckets | 11 | [] | Additional GCS buckets (none by default for Crawl4AI). |
enable_nfs | 11 | false | NFS volume mount. Not needed for standard Crawl4AI deployments. |
nfs_mount_path | 11 | '/mnt/nfs' | NFS mount path. |
nfs_instance_name | 11 | "" | Existing NFS GCE VM name. |
nfs_instance_base_name | 11 | 'app-nfs' | Base name for inline NFS VM. |
gcs_volumes | 11 | [] | GCS Fuse volume mounts. |
manage_storage_kms_iam | 11 | false | CMEK for GCS. |
enable_artifact_registry_cmek | 11 | false | CMEK for Artifact Registry. |
database_type | 12 | 'NONE' | No database for Crawl4AI. |
database_password_length | 12 | 32 | Not used by Crawl4AI. |
startup_probe_config | 14 | { enabled=true, path="/health" } | App CloudRun startup probe config. |
health_check_config | 14 | { enabled=true, path="/health" } | App CloudRun liveness probe config. |
uptime_check_config | 14 | { enabled=true, path="/health" } | Cloud Monitoring uptime check. |
alert_policies | 14 | [] | Cloud Monitoring alert policies. |
startup_probe | 14 | { path="/health", initial_delay_seconds=40, failure_threshold=12, ... } | Startup probe forwarded to Crawl4AI Common. |
liveness_probe | 14 | { path="/health", initial_delay_seconds=60, failure_threshold=3, ... } | Liveness probe forwarded to Crawl4AI Common. |
max_images_to_retain | 14 | 7 | Maximum container images in Artifact Registry. |
delete_untagged_images | 14 | true | Auto-delete untagged images. |
image_retention_days | 14 | 30 | Image age-based deletion threshold. |
10. Outputs
| Output | Description |
|---|---|
service_name | Name of the Cloud Run service. |
service_url | Public URL of the Crawl4AI Cloud Run service. |
service_location | GCP region where the Cloud Run service is deployed. |
project_id | GCP project ID. |
deployment_id | Deployment ID suffix used in resource names. |
container_image | Container image used for the deployment. |
cicd_enabled | Whether the CI/CD pipeline is enabled. |
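The outputs can be re-exported or combined in a calling configuration. The sketch below assumes a module block labelled `crawl4ai` exists in the same configuration; the output names on the left are invented for the example.

```hcl
# Assumes: module "crawl4ai" { source = ... } is declared elsewhere.
output "crawl4ai_health_url" {
  description = "Health endpoint of the deployed service."
  value       = "${module.crawl4ai.service_url}/health"
}

output "crawl4ai_image" {
  description = "Container image actually deployed."
  value       = module.crawl4ai.container_image
}
```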
Configuration Pitfalls & Sensible Defaults
Risk levels: Critical (data loss, full outage, security breach) — High (service unavailable or significant degradation) — Medium (degraded function or increased cost) — Low (minor impact).
| Variable | Sensible Default | Risk | Consequence of Incorrect Value |
|---|---|---|---|
vpc_egress_setting | "ALL_TRAFFIC" | Critical | Crawl4AI crawls arbitrary public URLs on the internet. Using "PRIVATE_RANGES_ONLY" routes only RFC-1918 traffic through the VPC and blocks all external crawl targets. All crawl jobs to public websites will fail with connection errors. ALL_TRAFFIC is required and is the correct default. |
memory_limit | "8Gi" | Critical | Crawl4AI spawns Chromium browser instances for JavaScript-rendered pages. Each concurrent browser context uses 200–500 MB. The default config allows up to 40 concurrent browser pages. Below 4Gi, Chromium processes are OOM-killed mid-crawl, returning partial or empty results. Below 2Gi, the container itself fails to start. 8Gi is the recommended minimum for production. |
cpu_limit | "4000m" | High | Chromium JavaScript rendering and DOM processing are CPU-intensive. Under 2000m, Chromium triggers internal timeouts on complex pages, and crawl times balloon. The default 4000m supports moderate concurrency; scale up for heavy parallel crawls. |
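execution_environment | "gen2" | High | Gen2 is required because supervisord needs a full Linux process tree and Chromium uses /tmp for shared memory. Crawl4AI also uses Direct VPC Egress (not a VPC connector). Direct VPC Egress is only available on Gen2. Downgrading to gen1 prevents the service from deploying with VPC network configuration. |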
min_instance_count | 1 | High | Crawl4AI has a significant cold start due to Chromium initialization and the embedded Redis/Supervisord stack. With scale-to-zero (0), the first request after an idle period waits 30–60 seconds for a new instance and may time out on the client side. Keep at 1 for responsive crawl APIs. |
max_instance_count | 3 | Medium | Each additional instance spawns its own Chromium pool and embedded Redis. At high concurrency, costs scale linearly with instance count. Set an explicit limit matching your concurrency budget. |
timeout_seconds | 3600 | Medium | Deep crawls or LLM-based extraction of large pages can take several minutes. The default 3600 seconds (1 hour) is intentionally high. Reduce for short-lived crawl APIs where zombie requests should be killed faster. |
redis_task_ttl_seconds | 3600 | Medium | Crawl4AI stores task results in its embedded Redis. Too-short TTL (< 300 s) causes completed task results to expire before clients poll for them. Too-long TTL (> 86400 s) causes unbounded memory growth from accumulated results. The valid range is 300–86400. |
LLM_API_KEY (env var) | (not set) | High | LLM-based extraction strategies (e.g., LLMExtractionStrategy) require a valid API key. Setting an invalid or expired key causes extraction jobs to fail with 401 errors from the LLM provider. Inject via secret_environment_variables; environment_variables is stored as plain text and should not hold API keys. |
OPENAI_API_KEY / ANTHROPIC_API_KEY (env var) | (not set) | High | When using provider-specific extraction strategies, the corresponding API key must be present. Missing keys cause extraction to fail silently (empty or null extracted_content in results). |
container_port | 11235 | Critical | Crawl4AI's REST API listens on port 11235. Changing this without a matching UVICORN_PORT environment variable causes health checks to fail, preventing the revision from receiving traffic. |
enable_iap | false | High | The default ingress_settings = "all" exposes the crawl API publicly. Without IAP or a crawl API token, any caller can submit crawl jobs, consuming cloud resources. Enable IAP or inject CRAWL4AI_API_TOKEN via secret_environment_variables. |
application_version | "latest" | Medium | Using "latest" makes deployments non-reproducible. A rebuild may pull a new Crawl4AI version with breaking API changes. Pin to a specific version tag for production. |
enable_image_mirroring | true | Low | Crawl4AI images are large (~3–4 GB compressed). Without mirroring to Artifact Registry, every Cloud Run deployment pulls from Docker Hub, risking rate limit failures and slow cold starts. Keep mirroring enabled. |
enable_cicd_trigger | false | Low | When enabled, ensure github_token and github_repository_url are correctly set. An invalid token silently prevents Cloud Build triggers from firing. |
Destroying Resources
When destroying a Cloud Run deployment, you may encounter a serverless IPv4 address release error. Wait 20–30 minutes after the initial destroy attempt before re-running tofu destroy.