PDE Certification Preparation Guide: Section 5 — Optimizing performance and cost (~12% of the exam)
This guide helps candidates preparing for the Google Cloud Professional Cloud DevOps Engineer (PDE) certification explore Section 5 of the exam through the lens of the Tech Equity RAD platform at https://radmodules.dev. Three modules are relevant to this section: GCP Services, which establishes the foundational shared infrastructure; App CloudRun, which deploys serverless containerised applications on Cloud Run; and App GKE, which deploys containerised workloads on GKE Autopilot.
You interact with each module by configuring its variables in the RAD UI deployment portal, then exploring the resulting infrastructure in the GCP Console. This guide maps each exam topic to the relevant variables you can configure and the console locations where you can observe the outcomes. It also highlights PDE objectives that are not currently implemented by these modules, providing guidelines for self-guided research and exploration.
5.1 Collecting performance information in Google Cloud
Cloud Run Execution Environment and CPU Allocation
Concept: Selecting the right execution environment and CPU allocation model for each workload to eliminate performance bottlenecks and ensure consistent response latency.
In the RAD UI:
- Cloud Run Execution Environment: The `execution_environment` variable (App CloudRun module) controls whether Cloud Run uses the Gen2 execution environment (full Linux compatibility on a microVM rather than system-call emulation, with improved CPU and network performance) or Gen1 (the original gVisor-based sandbox, with faster cold starts and lower baseline cost for simple stateless workloads). Gen2 is recommended for most new workloads and is required for workloads that depend on full Linux compatibility, such as mounting network file systems.
- CPU Allocation (Always-On vs. Request-Only): The `cpu_always_allocated` variable (App CloudRun module, Group 3) controls whether the container's vCPU is allocated continuously while the instance is active. `cpu_always_allocated = false` (cost-optimized default) throttles the CPU to near zero between requests — suitable for stateless request-response workloads but incompatible with background threads or persistent connections. `cpu_always_allocated = true` keeps the CPU allocated at all times, enabling background processing such as cache warming, periodic aggregations, or PHP OPcache generation, but incurs continuous instance cost. Note that configurations above 1 vCPU (1000m) require `cpu_always_allocated = true`.
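To see why request-only allocation is the cost-optimized default, it helps to compare billable vCPU-seconds under the two modes. The sketch below uses hypothetical traffic figures and deliberately ignores memory charges, request fees, concurrency overlap, and the free tier:

```python
# Rough comparison of billable vCPU-seconds per hour for the two Cloud Run
# CPU-allocation modes. Traffic numbers are hypothetical; real billing also
# covers memory, request count, and regional pricing, and concurrent requests
# overlap on one instance, so this overstates request-only cost slightly.

def billable_vcpu_seconds(requests_per_hour, avg_request_seconds,
                          vcpus=1.0, always_allocated=False,
                          active_seconds_per_hour=3600):
    if always_allocated:
        # CPU is billed for the entire time the instance stays warm.
        return vcpus * active_seconds_per_hour
    # CPU is billed only while requests are being processed.
    return vcpus * requests_per_hour * avg_request_seconds

request_only = billable_vcpu_seconds(1200, 0.25)
always_on = billable_vcpu_seconds(1200, 0.25, always_allocated=True)
print(request_only, always_on)  # 300.0 vs 3600.0 vCPU-seconds per hour
```

For this low-traffic profile, request-only allocation bills roughly a twelfth of the always-on figure, which is why always-on is only worth paying for when background work genuinely needs the CPU between requests.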
Console Exploration:
Navigate to Cloud Run > [service] > Revisions and click the active revision. In the Configuration tab, observe the configured execution environment (Gen1 or Gen2) and the CPU allocation setting (CPU always allocated vs. CPU only allocated during request processing). Navigate to Cloud Run > [service] > Metrics and review the CPU utilization chart — with cpu_always_allocated = true, you will see sustained CPU utilization even between requests, reflecting background processing activity.
Real-world example: A team migrates their PHP-based API from App Engine to Cloud Run using Gen1 with cpu_always_allocated = false (the default). They observe intermittent p99 latency spikes on cache-cold requests — caused by OPcache regeneration competing with request serving when the CPU is unthrottled at the start of each request. Switching to execution_environment = "gen2" and cpu_always_allocated = true eliminates the throttling, allowing OPcache to warm continuously in the background. The p99 latency at cold start drops from 2.1 seconds to 340 milliseconds, with no change to the application code.
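The background-warming pattern in this example only works when the CPU stays allocated between requests. A minimal sketch of such a warmer thread (the cache contents and refresh logic are hypothetical stand-ins) looks like this; with `cpu_always_allocated = false`, Cloud Run would throttle this thread to near zero as soon as the last response is sent:

```python
# Minimal sketch of a background cache-warming thread. With
# cpu_always_allocated = true the thread keeps running between requests and
# the cache stays warm; with request-only CPU allocation it would stall.
import threading
import time

cache = {}

def warm_cache(stop_event, interval=0.05):
    while not stop_event.is_set():
        cache["ready"] = True          # stand-in for expensive regeneration
        cache["warmed_at"] = time.time()
        stop_event.wait(interval)      # pause between refresh cycles

stop = threading.Event()
worker = threading.Thread(target=warm_cache, args=(stop,), daemon=True)
worker.start()
time.sleep(0.2)   # let at least one warm cycle complete
stop.set()
worker.join()
print(cache.get("ready"))
```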
💡 Additional Performance Collection Objectives & Learning Guidelines
- Cloud Profiler — Continuous Production Profiling: Research Cloud Profiler for identifying CPU-intensive code paths and memory allocation hotspots in production workloads without impacting user traffic (sampling overhead is typically <1% CPU). Add the Cloud Profiler agent to your Cloud Run or GKE container image (agents are available for Go, Java, Node.js, and Python); profiles are then automatically uploaded to Cloud Profiler. Navigate to Profiler in the GCP Console to view flame graphs showing cumulative CPU time or heap allocation by function — enabling engineers to identify which function is responsible for, say, 60% of CPU time without reproducing the load profile locally.
- Cloud Trace — Latency Analysis and Bottleneck Identification: Research Cloud Trace for end-to-end distributed tracing across microservices. Cloud Run services automatically generate trace spans for incoming requests when the `X-Cloud-Trace-Context` header is present. For complete end-to-end traces, add the Cloud Trace SDK or OpenTelemetry SDK to propagate trace context through all downstream calls (Cloud SQL, Secret Manager, downstream Cloud Run services). Navigate to Trace > Trace list to filter traces by latency percentile — identify the slowest 1% of requests and inspect their waterfall diagrams to determine whether latency is concentrated in database queries, downstream API calls, or application processing.
- Cloud Run Metrics for Performance Analysis: Navigate to Monitoring > Metrics Explorer and explore Cloud Run-specific performance metrics: `run.googleapis.com/request_latencies` (request latency histogram by revision — use ALIGN_PERCENTILE_99 to surface tail latency), `run.googleapis.com/container/cpu/utilizations` (CPU utilization per revision — useful for identifying CPU saturation), `run.googleapis.com/container/memory/utilizations` (memory utilization — identify memory pressure before OOM kills), and `run.googleapis.com/container/startup_latencies` (instance startup time — compare Gen1 vs Gen2 startup duration). Build a Cloud Monitoring dashboard combining these metrics to create a performance baseline for each deployed revision.
- GKE Performance Diagnostics: For GKE Autopilot workloads, use the Kubernetes Engine > Workloads > [deployment] > Observability tab to view integrated CPU, memory, and network I/O metrics directly in the console without navigating to Cloud Monitoring. The Vertical Pod Autoscaler (VPA), enabled via `enable_vertical_pod_autoscaling = true` (App GKE module, Group 3), analyses actual resource usage over time and recommends adjusted CPU and memory requests — reducing both resource waste (over-requested but unused) and OOM-kill risk (under-requested and killed under load). Navigate to Kubernetes Engine > Workloads and check VPA recommendation objects using Cloud Shell: `kubectl get vpa -n [namespace]` shows the current recommended CPU and memory values alongside the currently configured requests.
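VPA's recommender is more sophisticated than this (it maintains a decaying histogram of usage samples), but the core idea of deriving a request from an upper percentile of observed usage plus a safety margin can be sketched simply. The 90th percentile and 15% margin below are illustrative assumptions, not VPA's actual parameters:

```python
# Illustrative sketch of percentile-based request sizing, the idea behind
# VPA recommendations. VPA itself uses a decaying histogram; the percentile
# and safety margin here are assumptions chosen for clarity.
import math

def recommend_request(usage_samples_millicores, percentile=0.90, margin=1.15):
    ranked = sorted(usage_samples_millicores)
    # Index of the chosen percentile (clamped to the last sample).
    idx = min(len(ranked) - 1, math.ceil(percentile * len(ranked)) - 1)
    return round(ranked[idx] * margin)

# Hypothetical per-minute CPU usage samples (millicores) for one container;
# note how the single 480m spike is ignored by the percentile.
samples = [120, 140, 135, 150, 480, 160, 155, 145, 130, 150]
print(recommend_request(samples))  # 184
```

A percentile-based request absorbs normal variation without paying for the rare spike, which is exactly the over- versus under-provisioning trade-off VPA automates.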
5.2 Implementing FinOps practices for optimizing resource utilization and costs
Scale-to-Zero and Container Resource Rightsizing
Concept: Eliminating compute costs for idle workloads and precisely sizing resource allocations to maximize workload density and minimize waste.
In the RAD UI:
- Scale-to-Zero (Cloud Run): Setting `min_instance_count = 0` (App CloudRun module) enables Cloud Run's scale-to-zero behaviour — when no traffic is being served, all instances terminate and no compute charges accrue. For workloads that can tolerate cold-start latency (typically 1–3 seconds for a pre-built container), scale-to-zero can reduce Cloud Run costs by 60–90% for workloads with uneven or low traffic patterns. Setting `min_instance_count = 1` prevents cold starts but incurs a continuous minimum instance cost.
- Container Resource Requests and Limits (GKE): The `container_resources` variable (App GKE module, Group 3) sets the CPU and memory requests and limits for each container. Requests determine the resources reserved on a GKE node for scheduling — setting requests too high wastes node capacity and inflates cluster cost; setting them too low causes CPU throttling and OOM kills. GKE Autopilot bills for the requested resources (not the node size), so precise resource requests directly control billing.
- Vertical Pod Autoscaler for Automatic Right-Sizing (GKE): Setting `enable_vertical_pod_autoscaling = true` (App GKE module, Group 3) enables the Vertical Pod Autoscaler, which analyses actual CPU and memory consumption over time and automatically adjusts the pod's resource requests to match observed usage. This reduces both over-provisioning (paying for unused resources) and under-provisioning (OOM kills under load) without manual tuning. VPA is particularly valuable in the first weeks after a new deployment, when initial resource estimates are often inaccurate. Note that VPA and HPA should not both act on CPU simultaneously — use VPA when HPA scales on custom metrics (e.g. request throughput) rather than CPU.
- Namespace ResourceQuota as a Cost Ceiling (GKE): The `enable_resource_quota` variable (App GKE module, Group 15) creates a Kubernetes `ResourceQuota` object in the application namespace, setting hard upper bounds on aggregate CPU requests, memory requests, and pod count across all workloads in that namespace. In a multi-tenant cluster where multiple application teams share nodes, ResourceQuota prevents any single namespace from monopolising cluster capacity — which directly prevents runaway scaling from driving unexpected cost spikes. Key quota variables: `quota_cpu_limits` caps total CPU bursting, `quota_memory_limits` caps total memory, and `quota_max_pods` limits how many pods (including job pods) can exist simultaneously. Size quotas based on your expected peak pod count × per-pod requests — a quota set too low will block new pods from scheduling during deployments or cron job execution.
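The sizing rule above (expected peak pod count × per-pod requests, plus room for surge pods) can be turned into a small calculation. The 25% headroom factor below is an illustrative assumption, not a Google-recommended value:

```python
# Sizing a namespace ResourceQuota from expected peak pods × per-pod
# requests. The 25% headroom for rolling-update surge pods and cron job
# pods is an illustrative assumption; tune it to your deployment strategy.
def size_quota(peak_pods, cpu_request_m, mem_request_mi, headroom=1.25):
    pods = int(peak_pods * headroom)
    return {
        "quota_max_pods": pods,
        "quota_cpu_limits_m": pods * cpu_request_m,        # millicores
        "quota_memory_limits_mi": pods * mem_request_mi,   # MiB
    }

# Hypothetical workload: 12 pods at peak, each requesting 250m CPU / 512Mi.
print(size_quota(peak_pods=12, cpu_request_m=250, mem_request_mi=512))
```

Without the headroom term, a rolling deployment that briefly runs old and new pods side by side would hit the pod quota and stall the rollout.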
Console Exploration:
Navigate to Cloud Run > [service] > Revisions and inspect the minimum and maximum instance count settings. Navigate to Cloud Run > [service] > Metrics and review the Instance count chart over time — observe the scale-to-zero behaviour (instance count dropping to 0 during quiet periods) and scale-up events (instance count rising as requests arrive). For GKE, navigate to Kubernetes Engine > Workloads > [deployment], select a pod, and review the YAML tab to confirm the resources.requests and resources.limits values. Navigate to Kubernetes Engine > Clusters > [cluster] > Observability to see cluster-wide CPU and memory utilization versus requested capacity.
Real-world example: A startup runs a data transformation Cloud Run service that processes files uploaded by customers during business hours only. With min_instance_count = 0 and max_instance_count = 10, the service runs zero instances from 22:00 to 08:00 and on weekends — eliminating compute costs for roughly 60% of the calendar week. The team further tunes concurrency = 80 (processing up to 80 concurrent file transformation requests per instance), reducing the peak instance count from 10 to 3 for typical load. Combined, these settings reduce their Cloud Run bill by 83% compared to their previous always-on App Engine deployment.
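A back-of-the-envelope model shows where savings like these come from. The instance-hour figures below are hypothetical and Cloud Run actually bills per unit of vCPU and memory time, so treat this as the shape of the saving rather than a price quote:

```python
# Rough weekly instance-hours under min_instance_count = 1 vs = 0.
# All traffic figures are hypothetical; real Cloud Run billing is per unit
# of vCPU-time and memory-time, not per whole instance-hour.
HOURS_PER_WEEK = 168

def weekly_instance_hours(avg_busy_instances, busy_hours, min_instances):
    idle_hours = HOURS_PER_WEEK - busy_hours
    # Outside busy hours the fleet sits at the configured floor.
    return avg_busy_instances * busy_hours + min_instances * idle_hours

always_warm = weekly_instance_hours(3, 50, min_instances=1)
scale_to_zero = weekly_instance_hours(3, 50, min_instances=0)
saving = 1 - scale_to_zero / always_warm
print(always_warm, scale_to_zero, round(saving, 2))
```

Even before concurrency tuning, dropping the idle floor alone removes all of the off-hours instance time; the example's 83% total saving comes from stacking this with the higher concurrency and the move off an always-on platform.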
💡 Additional FinOps Objectives & Learning Guidelines
- Cloud Billing Export to BigQuery: Research how to export Cloud Billing data to BigQuery for granular cost analysis. Navigate to Billing > Billing export > BigQuery export to configure detailed usage cost export (SKU-level granularity, including per-resource labels) and pricing export (list and contract prices). Once exported, run SQL queries in BigQuery to analyze cost by project, service, label, or SKU — for example, identifying which Cloud Run service consumed the most CPU-seconds last month, or which GKE namespace generated the most egress cost. This is the foundation of any FinOps practice on Google Cloud.
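As a starting point for that analysis, a query like the following sketch works against the standard billing export schema (fields such as `service.description`, `project.id`, `cost`, and the `credits` array exist in that schema; the dataset and table name are placeholders you must replace with your own export table):

```python
# Example SQL for the standard Cloud Billing BigQuery export: net cost per
# service and project over the last 30 days. The table name is a placeholder;
# your export table is named for your own billing account ID.
QUERY = """
SELECT
  service.description AS service,
  project.id AS project,
  SUM(cost)
    + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) AS net_cost
FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service, project
ORDER BY net_cost DESC
"""
print(QUERY)
```

Note that credits are stored as negative amounts in the export, so adding them to `cost` yields the net figure the Cost breakdown page shows.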
- Cost Breakdown and Billing Reports: Navigate to Billing > Reports to explore the built-in billing reports dashboard. Use the Group by and Filter controls to break down costs by service, project, region, SKU, and resource label. Navigate to Billing > Cost breakdown to understand the net cost after committed use discount credits and sustained use discounts are applied. Understanding the difference between list price, credits, and net cost is essential for accurate FinOps reporting and showback/chargeback to internal teams.
- Recommender — Rightsizing and Idle Resource Detection: Research the Recommender service, which uses machine learning to analyze 30 days of usage data and surface actionable cost-saving recommendations. Navigate to Recommender in the console (or Active Assist in some console views). Key recommendation types to know: VM machine type rightsizing (suggests downsizing over-provisioned Compute Engine instances), idle VM recommendations (identifies VMs with <5% average CPU for 14 days), idle GKE cluster recommendations, and overprovisioned Cloud Run CPU recommendations. Recommendations include an estimated monthly saving and a risk level — apply low-risk recommendations immediately and investigate high-risk recommendations before acting.
- Committed Use Discounts (CUDs) for GKE: Research how Committed Use Discounts trade a one- or three-year commitment for substantial savings on GKE Autopilot vCPU and memory charges (resource-based Compute Engine CUDs reach roughly 37% for one-year and 55% for three-year terms; Autopilot commitment rates differ, so verify current pricing). Unlike Compute Engine CUDs (which commit to specific machine types), GKE Autopilot CUDs commit to a resource amount (vCPU-hours and GB-hours) that can be consumed by any pod across any namespace and workload. Navigate to Billing > Commitments to explore available commitment options and their discount rates. Research the GKE usage metering feature, which attributes cluster costs to namespaces and labels — enabling commitment sizing based on actual per-team or per-application consumption.
- Budget Alerts: Research how to create Cloud Billing budgets to alert when actual or forecasted spend approaches a threshold. Navigate to Billing > Budgets & alerts > Create budget. Configure a budget scoped to a specific project or service (e.g., Cloud Run only), set threshold rules (e.g., alert at 50%, 90%, and 100% of monthly budget), and connect a Pub/Sub topic to trigger automated responses (e.g., a Cloud Function that scales down non-production services when 90% of the budget is consumed). Budget alerts are reactive controls — pair them with Recommender's proactive recommendations for complete FinOps coverage.
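The decision logic inside such a Pub/Sub-triggered function is small. The sketch below parses the budget notification payload (the `costAmount` and `budgetAmount` fields appear in Cloud Billing budget notifications; the scale-down action itself is left as a stub you would implement against the Cloud Run Admin API):

```python
# Sketch of the decision logic for a budget-alert Pub/Sub subscriber.
# costAmount and budgetAmount are fields of the Cloud Billing budget
# notification JSON; the trigger ratio of 0.9 mirrors the 90% threshold
# described above.
import json

def should_scale_down(pubsub_data: bytes, trigger_ratio: float = 0.9) -> bool:
    msg = json.loads(pubsub_data)
    spend_ratio = msg["costAmount"] / msg["budgetAmount"]
    return spend_ratio >= trigger_ratio

# Simulated notification: 94% of a 1000-unit monthly budget consumed.
sample = json.dumps({"budgetDisplayName": "run-prod",
                     "costAmount": 940.0,
                     "budgetAmount": 1000.0}).encode()
print(should_scale_down(sample))  # True
```

In production you would gate the action on the message's budget name as well, so an alert for one team's budget cannot scale down another team's services.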
- Spot VMs for Batch GKE Workloads: Research Spot VMs for GKE node pools running fault-tolerant batch workloads (data processing, ML training, CI/CD build agents). Spot VMs are priced at a 60–91% discount versus standard VMs but can be preempted by Google with a 30-second shutdown notification when capacity is needed elsewhere. Unlike the older Preemptible VMs (which had a 24-hour maximum runtime), Spot VMs have no maximum lifespan — they run until preempted. In GKE, configure a Spot node pool with `spot: true` in the node pool spec and use Kubernetes tolerations and a `nodeSelector` to schedule appropriate workloads onto Spot nodes while keeping latency-sensitive services on standard on-demand nodes.
- Cloud Run Cost Optimization — Concurrency Tuning: Research the relationship between Cloud Run concurrency and cost. Higher `concurrency` settings (requests handled simultaneously per instance) reduce instance count for the same throughput — directly reducing compute costs. However, very high concurrency on CPU-bound workloads degrades latency as requests compete for the same vCPU. Research the Cloud Run concurrency guidance: I/O-bound workloads (waiting on database or API calls) benefit from high concurrency (80–1000); CPU-bound workloads (image processing, ML inference) benefit from lower concurrency (1–10) with more instances. Navigate to Cloud Run > [service] > Edit > Container, Networking, Security > Capacity to adjust concurrency and model the cost impact using the pricing calculator.
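The instance-count effect of concurrency follows from Little's law (in-flight requests equal arrival rate times latency). The traffic figures in this sketch are hypothetical:

```python
# Estimating steady-state Cloud Run instance count from concurrency via
# Little's law: in-flight requests = request rate x average latency.
# Traffic figures are hypothetical; real autoscaling also reacts to CPU.
import math

def instances_needed(rps, avg_latency_s, concurrency):
    in_flight = rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# I/O-bound service at 200 req/s with 500 ms latency: 100 in-flight requests.
print(instances_needed(rps=200, avg_latency_s=0.5, concurrency=80))  # 2
print(instances_needed(rps=200, avg_latency_s=0.5, concurrency=2))   # 50
```

The same traffic needs 25 times the fleet at concurrency 2, which is why concurrency is usually the first knob to examine when modelling Cloud Run cost, before touching CPU or memory sizes.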