Skip to main content

Istio GKE Module

Overview

The Istio GKE module provisions a complete Google Kubernetes Engine (GKE) Standard cluster and installs the open-source Istio service mesh onto it. Unlike Google Cloud Service Mesh (which is Google's managed, commercially supported Istio distribution), this module works directly with upstream Istio — the same project maintained by the Cloud Native Computing Foundation (CNCF) — giving platform engineers hands-on experience with the technology in its original, unmodified form.

This module is designed as a deep learning environment for platform engineers who want to understand how Istio works from the ground up: how the control plane manages the data plane, how proxies intercept and observe traffic, and how the two fundamentally different data plane architectures — sidecar mode and ambient mode — approach the same problems with different trade-offs.

By deploying this module, you gain direct experience with:

  • Open-source Istio — the CNCF project that underpins both Google Cloud Service Mesh and many other managed mesh offerings, installed directly via istioctl
  • Sidecar mode — the traditional and battle-tested Istio architecture where an Envoy proxy runs as a sidecar container alongside every application pod
  • Ambient mode — Istio's newer, sidecar-free architecture where a shared per-node proxy (ztunnel) handles Layer 4 traffic and optional per-namespace waypoint proxies handle Layer 7
  • GKE Standard — Google's fully configurable Kubernetes offering, distinct from GKE Autopilot, where you manage node pools and cluster-level settings directly
  • Istio traffic management — VirtualService, DestinationRule, Gateway, and the full set of routing and resilience primitives
  • Istio observability — the full open-source stack: Prometheus for metrics, Jaeger for distributed tracing, Grafana for dashboards, and Kiali for service mesh visualisation
  • GKE enterprise features — Workload Identity, VPC-native networking, Security Posture, Managed Prometheus, and Gateway API running on GKE Standard

The module deploys approximately 10–12 minutes to a single GCP project and requires no AWS account — everything runs on Google Cloud.


What Gets Deployed

On Google Cloud:

  • Two GCP APIs enabled: Cloud APIs and Container API
  • A VPC network with a subnet, secondary IP ranges for pods and services, and firewall rules
  • A Cloud Router and Cloud NAT for outbound traffic from cluster nodes
  • A GKE Standard cluster with VPC-native networking, Workload Identity, Security Posture, Managed Prometheus, and Gateway API
  • A node pool of 2 preemptible e2-standard-2 nodes

On the GKE Cluster (one of two choices):

Sidecar Mode (default)Ambient Mode
Data planeEnvoy proxy sidecar in every podShared ztunnel per node, optional waypoint proxies
Installationistioctl install --set profile=defaultistioctl install --set profile=ambient
Namespace labelistio-injection=enabledistio.io/dataplane-mode=ambient
Observability add-onsPrometheus, Jaeger, Grafana, KialiPrometheus, Jaeger, Grafana, Kiali
Layer 7 policiesPer-pod Envoy sidecarOptional waypoint proxy per namespace
┌────────────────────────────────────────────────────────────────────────────┐
│ Istio_GKE Module │
│ │
│ Google Cloud Project │
│ ──────────────────────────────────────────────────────────────────── │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ VPC Network │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Subnet (10.132.0.0/16) │ │ │
│ │ │ Pod secondary range: 10.62.128.0/17 │ │ │
│ │ │ Service secondary range: 10.64.128.0/20 │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────┐ │ │ │
│ │ │ │ GKE Standard Cluster │ │ │ │
│ │ │ │ • VPC-native networking │ │ │ │
│ │ │ │ • Workload Identity │ │ │ │
│ │ │ │ • Security Posture │ │ │ │
│ │ │ │ • Managed Prometheus │ │ │ │
│ │ │ │ • Gateway API │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Node Pool (2 × e2-standard-2, preemptible) │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Istio Control Plane (istio-system) │ │ │ │
│ │ │ │ • istiod (service discovery + config + CA) │ │ │ │
│ │ │ │ • Ingress Gateway (LoadBalancer) │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ SIDECAR MODE AMBIENT MODE │ │ │ │
│ │ │ │ ┌──────────────┐ ┌──────────────────┐ │ │ │ │
│ │ │ │ │ App Pod │ │ ztunnel (per node│ │ │ │ │
│ │ │ │ │ ┌──────────┐ │ │ L4 mTLS + policy)│ │ │ │ │
│ │ │ │ │ │ App │ │ └────────┬─────────┘ │ │ │ │
│ │ │ │ │ │ Envoy │ │ │ │ │ │ │
│ │ │ │ │ │ sidecar │ │ ┌────────▼─────────┐ │ │ │ │
│ │ │ │ │ └──────────┘ │ │ Waypoint Proxy │ │ │ │ │
│ │ │ │ └──────────────┘ │ (optional, L7) │ │ │ │ │
│ │ │ │ └──────────────────┘ │ │ │ │
│ │ │ │ Observability: Prometheus · Jaeger · Grafana · Kiali │ │ │
│ │ │ └──────────────────────────────────────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ Cloud Router + Cloud NAT (outbound egress) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────────┘

Deployment sequence:
1. Enable GCP APIs (cloudapis, container)
2. Create VPC, subnet with secondary ranges, firewall rules
3. Create Cloud Router and Cloud NAT
4. Create GKE Standard cluster and node pool
5. Download and install Istio via istioctl
6. Label default namespace for mesh enrolment
7. Install observability add-ons (Prometheus, Jaeger, Grafana, Kiali)

GCP Networking

The module creates a dedicated VPC network and configures all the networking components that GKE and Istio require to operate. Understanding this layer is important both for appreciating GKE's networking model and for diagnosing connectivity issues in the mesh.

VPC Network

A custom-mode VPC is created with global routing. Using global routing means that Cloud Routers in any region can learn routes from all subnets across all regions — a prerequisite for multi-region GKE deployments and for Cloud NAT to function correctly. A custom-mode VPC (as opposed to auto-mode) gives complete control over subnet CIDR ranges, which is necessary when GKE requires non-overlapping secondary ranges for pod and service IPs.

The VPC uses auto_create_subnetworks = false, meaning only the explicitly configured subnet is created — no automatic subnets appear in other regions that could create unexpected IP overlap with other projects or on-premises networks.

Subnet and Secondary IP Ranges

GKE's VPC-native networking requires a subnet with two secondary IP ranges in addition to the primary range:

RangeDefault CIDRPurpose
Primary subnet range10.132.0.0/16IP addresses for GKE cluster nodes (EC2-equivalent)
Pod secondary range10.62.128.0/17One IP address per pod — 32,766 pod IPs available
Service secondary range10.64.128.0/20One IP per Kubernetes Service ClusterIP — 4,094 service IPs

Why secondary ranges matter for Istio: In sidecar mode, every pod has both an application container and an Envoy sidecar. Both share the pod's IP address — there is no secondary IP for the sidecar. However, the sidecar intercepts traffic using iptables rules set up by the istio-init init container, which requires the NET_ADMIN capability. The GKE cluster is configured with allow_net_admin = true specifically to permit this. In ambient mode this is not required because the ztunnel — which runs as a DaemonSet on each node — handles traffic interception at the node level rather than inside pods.

Private Google Access is enabled on the subnet, allowing nodes without external IPs to reach Google APIs (Cloud Logging, Artifact Registry, Cloud Monitoring) over internal Google network paths rather than the public internet.

Firewall Rules

The module creates six firewall rules that together define the network security boundary for the cluster:

RuleDirectionSourcePortsPurpose
fw-allow-lb-hcINGRESS35.191.0.0/16, 130.211.0.0/22TCP 80Google Cloud Load Balancer health checks — required for the Istio Ingress Gateway's external load balancer to report healthy
fw-allow-nfs-hcINGRESS35.191.0.0/16, 130.211.0.0/22TCP 2049NFS health checks — present for compatibility with storage workloads
fw-allow-iap-sshINGRESS35.235.240.0/20TCP 22SSH to cluster nodes via Identity-Aware Proxy — eliminates the need for a public SSH port or bastion host
fw-allow-intra-vpcINGRESSConfigured VPC CIDRsAllUnrestricted traffic between all resources within the VPC — covers pod-to-pod, node-to-node, and node-to-pod communication including Istio's own control plane traffic
fw-allow-gce-nfs-tcpINGRESSVPC CIDRsTCP 2049NFS service traffic to instances tagged nfs-server
fw-allow-http-tcpINGRESSAll sources (0.0.0.0/0)TCP 80, 443External HTTP and HTTPS traffic to instances tagged http-server — this is what makes the Istio Ingress Gateway reachable from the internet

Why no explicit Istio-specific rules are needed: Istio's control plane traffic (istiod to proxies on port 15012, webhook on port 15017, mTLS between proxies on port 15443) all flows within the VPC. The fw-allow-intra-vpc rule covers all of this without requiring individual rules per Istio component. This is a deliberate simplification for a learning environment — production deployments typically use more granular rules.

Explore firewall rules in the Cloud Console:

Navigate to VPC Network → Firewall in the Cloud Console. Filter by the network name to see all six rules. For each rule, you can view the matched traffic in VPC Network → Firewall → Firewall Rules Logging — useful for understanding which traffic Istio's data plane is actually generating.

# List all firewall rules for the module's VPC
gcloud compute firewall-rules list \
--filter="network:vpc-network" \
--project=GCP_PROJECT_ID

# View a specific rule in detail
gcloud compute firewall-rules describe fw-allow-lb-hc \
--project=GCP_PROJECT_ID

Cloud Router and Cloud NAT

Cluster nodes use private IP addresses (from the 10.132.0.0/16 subnet) and have no external IPs. For nodes to reach the internet — to pull container images from Docker Hub, download Istio from GitHub, or reach any external endpoint — outbound traffic is routed through Cloud NAT.

Cloud Router provides the BGP routing infrastructure that Cloud NAT relies on. It is configured with ASN 64514 (a private ASN in the range reserved for internal use). For this module, the router serves only the NAT function; it does not peer with on-premises networks.

Cloud NAT is configured with AUTO_ONLY IP allocation, meaning Google Cloud automatically assigns external IP addresses from its pool rather than requiring a static IP reservation. This is simpler for learning environments. Logging is set to ERRORS_ONLY to avoid generating log noise from normal NAT operations.

Why this matters for Istio: During installation, istioctl downloads Istio components from github.com and the Istio release bucket. The observability add-ons (Prometheus, Jaeger, Grafana, Kiali) are pulled from their respective container registries. All of this outbound traffic flows through Cloud NAT. Without it, the installation would fail silently as download commands hang waiting for connections that never complete.

# Verify Cloud NAT is healthy and view NAT allocation statistics
gcloud compute routers get-nat-mapping-info cr-region \
--region=GCP_REGION \
--project=GCP_PROJECT_ID

# View NAT gateway configuration
gcloud compute routers nats describe nat-gw-region \
--router=cr-region \
--region=GCP_REGION \
--project=GCP_PROJECT_ID

GKE Standard Cluster

The module provisions a GKE Standard cluster — Google's fully configurable Kubernetes offering where you control node pools, machine types, and cluster-level settings. This is deliberately different from GKE Autopilot, which abstracts all node management away. Using GKE Standard for this module is an intentional learning choice: it exposes the configuration decisions that Autopilot makes automatically, making the trade-offs visible.

GKE Standard vs. GKE Autopilot

Understanding the distinction is a core learning objective of this module:

DimensionGKE Standard (this module)GKE Autopilot
Node managementYou configure machine type, count, disk sizeGoogle manages all nodes invisibly
Pricing modelPer node (pay for provisioned capacity)Per pod (pay for requested resources only)
Istio sidecar injectionRequires allow_net_admin = trueNot required — Autopilot handles this automatically
Cluster-level controlFull access to all Kubernetes settingsSome settings are locked for security
Best forLearning internals, custom configurationsProduction workloads, ops simplicity

Release Channel

The cluster is enrolled in the REGULAR release channel (configurable via release_channel). GKE release channels are Google's mechanism for delivering Kubernetes version updates and GKE feature updates with different risk and velocity profiles:

ChannelUpdate cadenceUse case
RAPIDEarliest access to new Kubernetes versions and GKE featuresTesting and experimentation
REGULAR (default)~2–3 months after RAPID; most usersBalanced — current features without bleeding edge risk
STABLE~2–3 months after REGULARProduction workloads requiring maximum stability

By enrolling in a release channel, the cluster receives automatic Kubernetes patch version upgrades and GKE component upgrades without manual intervention. The minor version is managed by Google within the bounds of the selected channel.

Explore release channel status in the Cloud Console:

Navigate to Kubernetes Engine → Clusters → [your cluster] → Details and look for the Release channel field. You can also see the current Kubernetes version and whether an upgrade is available.

# View the cluster's current version and release channel
gcloud container clusters describe gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="value(currentMasterVersion,releaseChannel.channel)"

# List available Kubernetes versions in the REGULAR channel
gcloud container get-server-config \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="yaml(channels)"

VPC-Native Networking

The cluster uses VPC-native (also called alias IP) networking, enabled by setting the networking mode to VPC_NATIVE. In this mode, every pod receives a real IP address from the pod secondary range (10.62.128.0/17) rather than an overlay address that is NAT'd to a node IP.

This has significant implications for Istio:

  • No double-NAT: Traffic between pods flows directly at the IP layer — the Envoy sidecar sees the real source and destination pod IPs in both sidecar and ambient modes
  • Cloud Load Balancer integration: GKE can create Network Endpoint Groups (NEGs) that point directly to pod IPs, enabling the Istio Ingress Gateway to receive traffic without node-level port forwarding
  • Network policy enforcement: VPC firewall rules and GKE network policies operate on real pod IPs, making security policies easier to reason about
# Verify VPC-native mode and view pod/service CIDR configuration
gcloud container clusters describe gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="yaml(ipAllocationPolicy)"

# View the pod IP range allocated to each node
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,POD-CIDR:.spec.podCIDR'

Workload Identity

Workload Identity is GKE's mechanism for giving Kubernetes service accounts a Google Cloud identity, allowing pods to call GCP APIs without static service account keys.

When Workload Identity is enabled on the cluster, each Kubernetes service account can be bound to a Google Cloud service account (GSA). Pods that use that Kubernetes service account automatically receive a short-lived token for the corresponding GSA, issued by the GKE metadata server running on each node. The pod never sees a JSON key file — the credential is injected transparently.

Why this matters for Istio: The istiod control plane and the Istio Ingress Gateway both need to call GCP APIs (Cloud Logging, Cloud Monitoring, Certificate Manager). With Workload Identity, they authenticate using their Kubernetes service accounts bound to the GKE node service account — no key files distributed inside the cluster.

# Verify Workload Identity is enabled on the cluster
gcloud container clusters describe gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="value(workloadIdentityConfig.workloadPool)"
# Expected output: GCP_PROJECT_ID.svc.id.goog

# View the Kubernetes service accounts used by Istio components
kubectl get serviceaccounts -n istio-system

# Describe istiod's service account to see its annotations
kubectl describe serviceaccount istiod -n istio-system

Explore in the Cloud Console: Navigate to Kubernetes Engine → Clusters → [your cluster] → Security to confirm Workload Identity is enabled and see the workload pool identifier.

Security Posture

The cluster has Security Posture enabled at the BASIC level with VULNERABILITY_BASIC scanning. Security Posture is GKE's built-in security assessment capability that continuously evaluates the cluster against security best practices and known vulnerability databases.

At the BASIC tier, it provides:

  • Workload configuration scanning — evaluates running pods against Kubernetes security best practices (privileged containers, host network usage, root filesystem writes, missing resource limits)
  • Vulnerability scanning — scans container images running in the cluster against the CVE database and reports known vulnerabilities by severity

This is particularly relevant for a learning environment with Istio because the Istio sidecar injector runs as a privileged admission webhook. Security Posture will flag any misconfigured workloads that conflict with security best practices, making it a useful tool for understanding the security implications of mesh configurations.

Explore Security Posture in the Cloud Console:

Navigate to Kubernetes Engine → Security → Security Posture. The dashboard shows concerns grouped by severity across all clusters in the project. Select your cluster to see workload-specific findings — for example, whether any pods are running without resource limits or with overly permissive capabilities.

# View Security Posture findings via gcloud
gcloud container clusters describe gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="value(securityPostureConfig)"

Cloud Logging and Managed Prometheus

The cluster is configured to send both system component logs and workload logs to Cloud Logging, and has Managed Prometheus enabled — the same observability integration described in the EKS GKE module, now running on a native GKE cluster.

This creates an interesting learning comparison: the Istio observability add-ons (Prometheus, Jaeger, Grafana, Kiali) are open-source tools running inside the cluster, while GKE's Managed Prometheus collects metrics from the cluster and forwards them to Google Cloud Monitoring. Both exist simultaneously — you can query the same Kubernetes metrics either through the in-cluster Prometheus instance or through Cloud Monitoring's PromQL interface.

# Confirm Managed Prometheus is collecting metrics
kubectl get pods -n gmp-system

# Query a Kubernetes metric via Cloud Monitoring PromQL
# (run in Cloud Console: Monitoring → Metrics Explorer → PromQL)
# kubernetes_io:container_memory_used_bytes{cluster="gke-cluster"}

Gateway API

The cluster has the Gateway API standard channel enabled. Gateway API is the CNCF successor to the Kubernetes Ingress resource, providing a more expressive and extensible API for routing traffic into and within the cluster.

Istio 1.24 has first-class Gateway API support — you can use HTTPRoute, TCPRoute, and Gateway resources from the CNCF Gateway API spec as an alternative to Istio's own VirtualService and DestinationRule resources. Having Gateway API enabled on the GKE cluster means both approaches work side by side, giving platform engineers the opportunity to compare the two traffic management APIs directly.

# Verify Gateway API CRDs are installed
kubectl get crd | grep gateway

# List any Gateway API resources in the cluster
kubectl get gateways,httproutes -A

Node Pool

The cluster has a single node pool of 2 preemptible e2-standard-2 nodes:

ParameterValueRationale
Machine typee2-standard-22 vCPU, 8 GB RAM — enough for istiod, Envoy sidecars, and the observability stack
Disk typepd-ssdSSD for faster pod scheduling and image pull
Disk size50 GBSufficient for Istio images and the observability add-on images
PreemptibleYesUp to 80% cost reduction — acceptable for a learning environment, not for production
Node count2Minimum for high availability of the Istio control plane

Understanding preemptible nodes: GCE can reclaim preemptible VMs with 30 seconds notice when it needs the capacity. GKE handles this gracefully by draining the node and rescheduling pods on the remaining node. However, if both nodes are reclaimed simultaneously, the cluster becomes temporarily unavailable. For production Istio deployments, regular (non-preemptible) nodes across multiple zones are recommended.

# View node pool details including machine type and preemptibility
gcloud container node-pools describe default-pool \
--cluster=gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID

# Check node status and resource capacity
kubectl get nodes -o wide
kubectl describe nodes | grep -A 10 "Allocatable:"

GKE Node Service Account

A dedicated Google Cloud service account is created for GKE nodes with the minimum permissions required for cluster operation:

RolePurpose
storage.objectAdmin / storage.objectViewerRead and write access to Cloud Storage (for GCS Fuse CSI driver)
artifactregistry.readerPull container images from Artifact Registry
monitoring.metricWriter / monitoring.viewerWrite metrics to Cloud Monitoring (for Managed Prometheus)
logging.logWriterWrite logs to Cloud Logging
compute.networkViewerRead VPC network configuration (required by GKE networking components)
stackdriver.resourceMetadata.writerWrite resource metadata for Stackdriver integrations
container.defaultNodeServiceAccountBase GKE node permissions

Using a dedicated service account with only the required roles — rather than the Compute Engine default service account with broad Editor permissions — follows the principle of least privilege and is a GKE security best practice.

# View the node service account and its IAM roles
gcloud iam service-accounts list \
--filter="displayName:gke" \
--project=GCP_PROJECT_ID

# Verify the node pool uses the dedicated service account
gcloud container node-pools describe default-pool \
--cluster=gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID \
--format="value(config.serviceAccount)"

Istio Core Concepts

Before exploring the two data plane modes, this section explains what Istio is, how it is structured, and how the module installs it — providing the conceptual foundation for everything that follows.

What Istio Is

Istio is an open-source service mesh — an infrastructure layer that intercepts all network communication between services in a Kubernetes cluster and provides traffic management, security, and observability capabilities without requiring any changes to application code.

The key insight behind Istio is that all the networking concerns that would otherwise be coded into each service — retries, timeouts, circuit breaking, TLS, access control, telemetry — can instead be handled by a proxy layer that wraps every service transparently. Applications communicate as if they are talking directly to each other. The proxy layer handles everything in between.

Istio is the upstream open-source project from which Google Cloud Service Mesh, AWS App Mesh, and several other vendor offerings are derived. Learning Istio directly — as this module teaches — gives platform engineers knowledge that transfers across every managed mesh product built on it.

The Two Planes

Istio has a strict separation between two planes:

The Control Plane: istiod

istiod is a single binary that combines three functions that were separate components in older Istio versions:

FunctionWhat it does
PilotConverts Istio configuration (VirtualService, DestinationRule) into Envoy xDS configuration and pushes it to every proxy in the mesh
CitadelActs as a certificate authority (CA), issuing and rotating X.509 certificates for every proxy — the foundation of mTLS
GalleyValidates and distributes Istio configuration, ensuring proxies receive consistent and correct configuration

istiod runs in the istio-system namespace and is the single point of truth for the mesh. When you apply a VirtualService, istiod translates it into Envoy route configuration and pushes the update to every relevant proxy within milliseconds using the xDS (discovery service) protocol — a gRPC streaming API that Envoy polls continuously.

The Data Plane: Envoy Proxies

The data plane is composed of Envoy proxy instances that intercept and process all network traffic. Envoy is a high-performance, C++-based proxy originally built at Lyft and now a CNCF project in its own right. It is the same proxy used by virtually every service mesh and many API gateways.

Each Envoy instance maintains a live connection to istiod and receives continuous configuration updates. It handles:

  • HTTP/1.1, HTTP/2, gRPC, TCP, and WebSocket traffic
  • Load balancing across multiple endpoints
  • Retries, timeouts, and circuit breaking
  • mTLS termination and certificate presentation
  • Metrics collection (request count, latency, error rate per route)
  • Distributed trace span generation

The control plane and data plane communicate exclusively through the xDS protocol — istiod never directly proxies traffic, and Envoy proxies never read Kubernetes API objects directly. This clean separation means the data plane continues to operate even if istiod is temporarily unavailable; proxies keep their last-known configuration.

# After deploying the cluster and installing Istio, verify istiod is running
kubectl get pods -n istio-system -l app=istiod

# View the istiod logs to see xDS configuration being pushed to proxies
kubectl logs -n istio-system \
-l app=istiod \
--tail=50

# Check the Istio version installed
kubectl exec -n istio-system \
$(kubectl get pod -n istio-system -l app=istiod -o jsonpath='{.items[0].metadata.name}') \
-- pilot-discovery version

How Istio is Installed: istioctl

Unlike Google Cloud Service Mesh (which is activated through the GKE Fleet API with a single gcloud command), open-source Istio is installed using istioctl — the Istio command-line tool. This is a meaningful difference: istioctl gives you direct control over every aspect of the installation, making the configuration transparent and inspectable.

The module downloads the Istio release directly from GitHub at the version specified by istio_version (default 1.24.2) and installs it using an IstioOperator configuration. The IstioOperator is a Kubernetes custom resource that describes the desired Istio installation — which components to install, how to configure them, and what resource requests to set.

The installation sets three mesh-wide identifiers that matter for multi-cluster and multi-network topologies:

IdentifierValuePurpose
Mesh IDmesh1Identifies the logical mesh — clusters with the same mesh ID can share trust
Cluster namecluster1Identifies this cluster within the mesh for routing and telemetry
Network namenetwork1Identifies the network — used for east-west gateway configuration in multi-network setups

Even for a single-cluster deployment, these identifiers are set to the standard multi-cluster values so the installation is ready to extend to multi-cluster without reinstallation.

# Connect to the cluster first
gcloud container clusters get-credentials gke-cluster \
--region=GCP_REGION \
--project=GCP_PROJECT_ID

# View all Istio components installed in the cluster
kubectl get all -n istio-system

# Inspect the IstioOperator configuration that was applied
kubectl get istiooperator -n istio-system -o yaml

# Verify the Istio installation is healthy
istioctl verify-install

# Check the overall mesh status
istioctl proxy-status

Explore in the Cloud Console: Navigate to Kubernetes Engine → Clusters → [your cluster] → Workloads and filter by namespace istio-system. You will see istiod, the istio-ingressgateway, and (in sidecar mode) any admission webhooks. The Workloads view shows resource consumption and restart counts for each component.

The Istio Ingress Gateway

Both sidecar and ambient mode installations include an Istio Ingress Gateway — a standalone Envoy proxy deployed as a Kubernetes Deployment with a Service of type LoadBalancer. This creates a GCP External Load Balancer that receives external HTTP/HTTPS traffic and forwards it into the mesh.

The Ingress Gateway is the mesh's entry point. Unlike the sidecar proxies (which are co-located with application pods), the Ingress Gateway runs as a separate workload that you configure independently using Istio Gateway and VirtualService resources.

The gateway is configured with:

  • Minimum 1 replica, maximum 5 replicas
  • CPU-based horizontal autoscaling at 80% utilisation
  • A public external IP assigned by GCP
# Get the external IP of the Istio Ingress Gateway
kubectl get svc istio-ingressgateway -n istio-system

# Watch the external IP being assigned (takes 1-2 minutes after install)
kubectl get svc istio-ingressgateway -n istio-system --watch

# View the ingress gateway pods and their resource usage
kubectl get pods -n istio-system -l app=istio-ingressgateway
kubectl top pods -n istio-system -l app=istio-ingressgateway

Sidecar Mode

Sidecar mode is the traditional and most widely deployed Istio architecture. It is the default when install_ambient_mesh = false. This section explains exactly how it works, what gets created in the cluster, and what you can observe once it is running.

How Sidecar Injection Works

When sidecar mode is installed, istiod registers a Mutating Webhook Admission Controller with the Kubernetes API server. Every time a new pod is created in a namespace labelled istio-injection=enabled, the Kubernetes API server calls this webhook before the pod is scheduled. istiod responds by modifying the pod specification to add:

  1. An init container (istio-init) — runs before the application container and sets up iptables rules that redirect all inbound and outbound TCP traffic to the Envoy proxy ports. This is why the cluster requires allow_net_admin = true — the init container needs the NET_ADMIN Linux capability to modify the kernel's network table.

  2. A sidecar container (istio-proxy) — the Envoy proxy. It listens on two ports: 15001 for all outbound traffic and 15006 for all inbound traffic. Because iptables redirects all TCP traffic to these ports, neither the application nor any external client knows the proxy exists — the application binds to its normal port and the proxy intercepts transparently.

The module labels the default namespace with istio-injection=enabled automatically. Any pod deployed to the default namespace receives the sidecar without any per-deployment configuration.

# Verify the sidecar injection webhook is registered
kubectl get mutatingwebhookconfiguration | grep istio

# Confirm the default namespace is labelled for injection
kubectl get namespace default --show-labels

# Deploy a test pod and confirm it received a sidecar
kubectl run test-pod --image=nginx --restart=Never
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].name}'
# Expected: nginx istio-proxy

# View the iptables rules set up by istio-init inside a pod
kubectl debug -it test-pod --image=busybox -- iptables -t nat -L

What the Sidecar Intercepts

Once a pod has a sidecar, every byte of network traffic it sends or receives passes through Envoy. The application never communicates directly with the network. This gives Istio complete visibility and control:

  • Outbound traffic: When the application opens a connection to another service, iptables redirects the SYN packet to Envoy's outbound listener (port 15001). Envoy looks up the destination in its routing table (populated by istiod from VirtualService and DestinationRule configurations), applies any traffic management rules, performs mTLS handshake with the destination sidecar, and forwards the request.

  • Inbound traffic: When a request arrives at the pod, iptables redirects it to Envoy's inbound listener (port 15006). Envoy validates the mTLS certificate of the caller, checks any AuthorizationPolicy rules, records the request in its metrics and traces, then forwards it to the application's actual port.

The application sees only plain HTTP or TCP — it never handles TLS or deals with the proxy. This is the "zero code change" promise of a service mesh.

Resource Overhead of Sidecars

Each sidecar consumes resources from the pod's node:

ResourceDefault requestNotes
CPU100m (0.1 vCPU)Scales with traffic volume — under heavy load, Envoy uses significantly more
Memory128 MiGrows with the number of routes in the mesh

In a cluster with many pods, sidecar overhead accumulates quickly. A cluster with 100 pods adds approximately 10 vCPUs and 12 GB of memory just for proxy overhead. This is one of the primary motivations for ambient mode. On the e2-standard-2 nodes in this module (2 vCPU, 8 GB each), the sidecar overhead is measurable and visible in the node resource consumption.

# See the resource requests set on the istio-proxy sidecar
kubectl get pod test-pod \
-o jsonpath='{.spec.containers[?(@.name=="istio-proxy")].resources}'

# View actual memory and CPU used by all proxy sidecars across the cluster
kubectl top pods -A --containers | grep istio-proxy | sort -k4 -rh

Mutual TLS in Sidecar Mode

With both source and destination pods having Envoy sidecars, Istio can enforce mutual TLS (mTLS) automatically — both sides present certificates and verify each other's identity before any application data flows.

Certificates are issued by istiod's built-in CA (Citadel) and are rotated automatically every 24 hours. Each certificate contains the pod's SPIFFE identity: spiffe://cluster.local/ns/<namespace>/sa/<service-account>, which encodes the Kubernetes namespace and service account. This means mTLS authentication is tied to Kubernetes identity — not IP addresses or DNS names.

By default, Istio operates in permissive mode: it accepts both mTLS and plain text connections. This allows gradual mesh adoption. Once all services in a namespace have sidecars, you can switch to strict mode with a PeerAuthentication policy:

# Check the current mTLS mode for the default namespace
kubectl get peerauthentication -n default

# Enable strict mTLS for the default namespace
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: default
spec:
mtls:
mode: STRICT
EOF

# Verify mTLS is being used between two pods
# (look for X-Forwarded-Client-Cert header — present only with mTLS)
kubectl exec test-pod -c istio-proxy -- \
curl -s http://kubernetes.default.svc/api --header "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" | head -5

# View the certificate istiod issued to a sidecar
istioctl proxy-config secret test-pod

Explore mTLS in the Cloud Console: If Kiali is port-forwarded (see the Observability section), navigate to Graph and enable the Security display option. Padlock icons on service-to-service edges confirm mTLS is active.

Exploring the Sidecar Configuration

istioctl provides rich commands for inspecting what configuration each sidecar proxy has received from istiod:

# View the complete Envoy configuration for a pod's sidecar
istioctl proxy-config all test-pod

# View routing rules (equivalent to VirtualService translations)
istioctl proxy-config routes test-pod

# View cluster definitions (upstream services and load balancing config)
istioctl proxy-config clusters test-pod

# View listeners (ports Envoy is listening on)
istioctl proxy-config listeners test-pod

# View endpoints (the actual pod IPs Envoy knows about for each service)
istioctl proxy-config endpoints test-pod

# Analyse the mesh configuration for potential issues
istioctl analyze -n default

Ambient Mode

Ambient mode is Istio's next-generation data plane architecture, introduced as stable in Istio 1.22. It removes the need for per-pod sidecar proxies entirely, reducing resource overhead and operational complexity while preserving the full Istio feature set for workloads that need it.

Why Ambient Mode Exists

Sidecar proxies are powerful but come with trade-offs. Each pod carries an additional container that consumes CPU and memory, adding up significantly in large clusters. The sidecar lifecycle is tied to the pod lifecycle — updating Istio requires rolling restarts of all workloads. istio-init requires NET_ADMIN privileges, which some security policies prohibit.

Ambient mode resolves these constraints by moving the data plane out of the pods and into dedicated infrastructure components that run at the node and namespace level.

The Two-Layer Architecture

Ambient mode separates data plane responsibilities into two distinct layers:

LayerComponentScopeProtocols
L4 (secure overlay)ztunnelPer node (DaemonSet)TCP, mTLS tunnelling (HBONE)
L7 (advanced policy)Waypoint proxyPer namespace or service accountHTTP, gRPC, WebSocket

This separation lets you opt in to L7 features only where needed, keeping baseline infrastructure costs low.

ztunnel — The Zero-Trust Tunnel

ztunnel (zero-trust tunnel) is a lightweight Rust-based proxy deployed as a DaemonSet — one pod on every node in the cluster. It handles all L4 traffic for ambient-mode workloads on that node.

What ztunnel provides:

  • Transparent mTLS — encrypts all pod-to-pod traffic without modifying the pod itself
  • SPIFFE identity — issues and validates workload certificates exactly as the sidecar does; the same spiffe://cluster.local/ns/<ns>/sa/<sa> format applies
  • L4 authorisation — enforces AuthorizationPolicy resources based on source workload identity and destination port
  • No NET_ADMIN required — traffic redirection uses iptables rules applied at the node level, outside the pod

Inspect ztunnel on any node:

# View the ztunnel DaemonSet
kubectl get daemonset ztunnel -n istio-system

# View ztunnel pods (one per node)
kubectl get pods -n istio-system -l app=ztunnel -o wide

# Stream ztunnel logs for a specific node
kubectl logs -n istio-system -l app=ztunnel --follow

# Inspect ztunnel's workload graph (what it knows about the mesh)
istioctl ztunnel-config workloads

Explore in the Cloud Console: Navigate to Kubernetes Engine → Workloads, filter by namespace istio-system, and select the ztunnel DaemonSet. The Managed pods tab shows one pod per node, confirming node-level coverage.

Waypoint Proxies — On-Demand L7

When you need HTTP-level features — traffic splitting, retries, header manipulation, JWT authentication, or fine-grained AuthorizationPolicy — you deploy a waypoint proxy for a namespace or specific service account.

A waypoint is a standard Envoy proxy deployed as a Kubernetes Deployment, provisioned via the istioctl waypoint command. All traffic entering the target namespace (or service account) is redirected through the waypoint before reaching the destination pod.

# Deploy a waypoint for the default namespace
istioctl waypoint apply --namespace default

# Verify the waypoint is running
kubectl get gateway -n default

# View the waypoint proxy deployment
kubectl get deployment -n default -l gateway.istio.io/managed=istio.io-mesh-controller

# Inspect the waypoint's configuration
istioctl proxy-config routes -n default deployment/waypoint

# Remove a waypoint
istioctl waypoint delete --namespace default

Waypoints use the Kubernetes Gateway API (gateway.networking.k8s.io/v1) rather than Istio-specific resources. This makes waypoint configuration portable and consistent with the broader Kubernetes ecosystem.

Enrolling Workloads in Ambient Mode

Ambient mode is opt-in at the namespace level. Label a namespace to enroll all workloads in it:

# Enroll the default namespace in ambient mode
kubectl label namespace default istio.io/dataplane-mode=ambient

# Verify the label
kubectl get namespace default --show-labels

# Check that ztunnel is tracking workloads in the namespace
istioctl ztunnel-config workloads | grep default

Individual pods can be excluded from ambient mode:

# Exclude a specific pod from ambient mode
kubectl label pod <pod-name> istio.io/dataplane-mode=none

Unlike sidecar mode, enrolling in ambient mode requires no pod restart. ztunnel begins intercepting traffic for labeled pods immediately.

Comparing Sidecar and Ambient Mode

DimensionSidecar ModeAmbient Mode
Data plane locationInside each podNode-level (ztunnel) + namespace-level (waypoint)
Pod restart required to enrollYes (sidecar injection)No
Pod restart required to upgrade IstioYes (rolling restart)No (DaemonSet update only)
NET_ADMIN requiredYes (istio-init)No
L4 policyPer-pod sidecarztunnel (per-node)
L7 policyPer-pod sidecarWaypoint proxy (opt-in per namespace)
Memory overhead per workload~128 MiB (sidecar)Near zero (shared ztunnel)
CPU overhead per workload~100m (sidecar)Near zero (shared ztunnel)
LatencyAdds ~0.5ms (two hops)Lower for L4-only; similar when waypoint is used
SPIFFE identityYesYes
Kubernetes Gateway APISupportedNative (waypoints use Gateway API)
MaturityStable since Istio 1.0Stable since Istio 1.22

Ambient Mode Resource Overhead

Because ztunnel is shared across all pods on a node, the per-workload overhead is negligible. The only dedicated resources are:

ComponentCountTypical resources
ztunnel podOne per node100m CPU, 128Mi memory
Waypoint proxyOne per namespace (optional)100m CPU, 128Mi memory
istiodOne deployment200m CPU, 256Mi memory

In a 10-node cluster running 200 pods, ambient mode consumes roughly 10× less memory than the equivalent sidecar deployment.

Verifying Ambient Mode End-to-End

After enrolling a namespace, verify that mTLS encryption is active between two pods:

# Deploy two test pods
kubectl run server --image=nginx --port=80
kubectl run client --image=curlimages/curl --restart=Never -- sleep 3600

# From the client pod, curl the server — traffic is transparently encrypted by ztunnel
kubectl exec client -- curl http://server

# Check that ztunnel logged the connection
kubectl logs -n istio-system -l app=ztunnel | grep "server"

# Confirm ztunnel is tracking the connection
istioctl ztunnel-config connections

Explore in the Cloud Console: Navigate to Kubernetes Engine → Service & Ingress and select a Service in the ambient-enrolled namespace. The Traffic tab shows inter-pod flows. For deeper inspection, use Network Intelligence → Network Topology to visualise east-west traffic patterns between enrolled namespaces.


Traffic Management

Istio's traffic management capabilities give platform engineers fine-grained control over how requests flow between services — without modifying application code. All traffic management rules are expressed as Kubernetes custom resources and take effect through configuration pushed to Envoy proxies (or ztunnel/waypoints in ambient mode) via the xDS protocol.

VirtualService

A VirtualService defines how requests addressed to a Kubernetes Service are routed. It intercepts traffic at the client sidecar (sidecar mode) or waypoint proxy (ambient mode) before the request leaves the source pod.

Key capabilities:

  • Weighted routing — split traffic across multiple versions of a service by percentage
  • Header-based routing — route requests to different backends based on HTTP headers, URI prefixes, or method
  • Retries — automatically retry failed requests before returning an error to the caller
  • Timeouts — enforce a maximum duration for requests to prevent cascading failures
  • Fault injection — deliberately introduce delays or HTTP errors for chaos testing

Canary deployment example — route 10 % of traffic to v2:

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: my-service
namespace: default
spec:
hosts:
- my-service
http:
- route:
- destination:
host: my-service
subset: v1
weight: 90
- destination:
host: my-service
subset: v2
weight: 10

Retry and timeout example:

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: my-service
namespace: default
spec:
hosts:
- my-service
http:
- timeout: 5s
retries:
attempts: 3
perTryTimeout: 2s
retryOn: gateway-error,connect-failure,retriable-4xx
route:
- destination:
host: my-service
subset: v1

Apply and verify:

# Apply the VirtualService
kubectl apply -f virtualservice.yaml

# Inspect what rules are active
kubectl get virtualservice -n default

# View the full spec
kubectl describe virtualservice my-service -n default

# Confirm the routing config reached the sidecar proxy
istioctl proxy-config routes <client-pod> | grep my-service

DestinationRule

A DestinationRule defines the properties of traffic after routing has occurred — how Istio connects to the destination, and what subsets of the destination exist. VirtualService references subsets defined here.

Key capabilities:

  • Subsets — group pod instances by label (e.g., version: v1, version: v2) to represent different versions of a service
  • Load balancing policy — round-robin, least connections, random, or consistent hash (session affinity)
  • Connection pool settings — limit the number of TCP connections or pending HTTP requests to prevent overload
  • Outlier detection — automatically eject unhealthy pods from the load-balancing pool (circuit breaking)
  • TLS settings — control how the proxy connects to upstream services (auto, ISTIO_MUTUAL, SIMPLE)

Subset and circuit breaker example:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
name: my-service
namespace: default
spec:
host: my-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
maxRequestsPerConnection: 10
outlierDetection:
consecutiveGatewayErrors: 5
interval: 30s
baseEjectionTime: 30s
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
trafficPolicy:
loadBalancer:
simple: LEAST_CONN

Apply and verify:

# Apply the DestinationRule
kubectl apply -f destinationrule.yaml

# List all DestinationRules
kubectl get destinationrule -n default

# Check that outlier detection is tracking ejections
istioctl proxy-config clusters <client-pod> | grep my-service

Gateway and Ingress Traffic

An Istio Gateway resource configures the Ingress Gateway deployment to accept inbound traffic from outside the mesh. It specifies which ports, protocols, and TLS settings to expose. A VirtualService is then bound to the Gateway to route inbound traffic to internal services.

TLS-terminated ingress example:

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: my-gateway
namespace: istio-system
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
credentialName: my-tls-secret
hosts:
- "app.example.com"
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: my-app
namespace: default
spec:
hosts:
- "app.example.com"
gateways:
- istio-system/my-gateway
http:
- route:
- destination:
host: my-service
port:
number: 80

Explore the ingress gateway:

# Get the external IP of the Ingress Gateway
kubectl get service istio-ingressgateway -n istio-system

# View all Gateways in the cluster
kubectl get gateway -A

# Check what routes the ingress gateway has
istioctl proxy-config routes deployment/istio-ingressgateway -n istio-system

# View access logs (if enabled)
kubectl logs -n istio-system -l app=istio-ingressgateway --follow

Explore in the Cloud Console: Navigate to Kubernetes Engine → Service & Ingress and select the istio-ingressgateway LoadBalancer Service. The Details tab shows the external IP, port mappings, and associated backend pods. The Observability tab (if Cloud Monitoring is active) shows request rates and error percentages for ingress traffic.

AuthorizationPolicy

AuthorizationPolicy controls which workloads are allowed to communicate with each other within the mesh. Policies are enforced by the sidecar proxy (sidecar mode) or by ztunnel and waypoints (ambient mode) and are evaluated after mTLS identity verification.

Deny all, then allow selectively (recommended baseline):

# Deny all traffic by default in the production namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: production
spec: {}
---
# Allow the frontend service to call the backend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: allow-frontend-to-backend
namespace: production
spec:
selector:
matchLabels:
app: backend
action: ALLOW
rules:
- from:
- source:
principals:
- "cluster.local/ns/production/sa/frontend"
to:
- operation:
methods: ["GET", "POST"]
paths: ["/api/*"]

Apply and verify:

# Apply the policies
kubectl apply -f authpolicy.yaml

# List all AuthorizationPolicies
kubectl get authorizationpolicy -A

# Test connectivity from the frontend pod
kubectl exec -n production deploy/frontend -- curl http://backend/api/health

# Check that a denied request is rejected
kubectl exec -n production deploy/other-service -- curl http://backend/api/health
# Expected: RBAC: access denied

Kubernetes Gateway API (Istio 1.24+)

Istio 1.24 promotes the Kubernetes Gateway API (gateway.networking.k8s.io) to the primary configuration surface, deprecating Ingress in favour of HTTPRoute, TCPRoute, and GRPCRoute. The Gateway API is already installed on the GKE cluster by this module.

The Gateway API introduces a clean role separation:

  • Infrastructure admin — creates GatewayClass and Gateway (cluster-scoped)
  • Application developer — creates HTTPRoute pointing to their Service (namespace-scoped)
# View available GatewayClasses
kubectl get gatewayclass

# View Gateways
kubectl get gateway -A

# View HTTPRoutes
kubectl get httproute -A

# Describe an HTTPRoute to see traffic rules
kubectl describe httproute my-route -n default

Observability

Istio provides first-class integrations with the open-source observability stack. The add-on components installed by this module — Prometheus, Grafana, Kiali, and Jaeger — work together to give platform engineers a complete picture of service health, traffic patterns, and distributed request traces.

Prometheus — Metrics Collection

Istio's sidecar proxies (and ztunnel in ambient mode) automatically expose a Prometheus-compatible metrics endpoint. No application instrumentation is required. Prometheus scrapes the /stats/prometheus endpoint on every sidecar and aggregates the data.

Key Istio-generated metrics:

  • istio_requests_total — request count labelled by source service, destination service, response code, and protocol
  • istio_request_duration_milliseconds — latency histogram for every service-to-service call
  • istio_tcp_sent_bytes_total / istio_tcp_received_bytes_total — byte counters for TCP flows
  • pilot_xds_pushes — number of configuration updates pushed from istiod to proxies
  • envoy_cluster_upstream_cx_active — active connections per upstream cluster

Access Prometheus:

# Port-forward Prometheus to your local machine
kubectl port-forward -n istio-system svc/prometheus 9090:9090

# Open http://localhost:9090 in a browser

# Query total requests to a service in the last 5 minutes
# In the Prometheus expression bar, enter:
# rate(istio_requests_total{destination_service_name="my-service"}[5m])

# Query P99 latency for a service
# histogram_quantile(0.99, rate(istio_request_duration_milliseconds_bucket{destination_service_name="my-service"}[5m]))

Explore in the Cloud Console: Navigate to Monitoring → Metrics Explorer. In the metric picker, search for istio to see all Istio metrics forwarded by Google Cloud Managed Service for Prometheus (if GMP integration is enabled). Set the aggregation to sum by (destination_service_name) to see per-service request rates.

Grafana — Dashboards

Grafana is pre-configured with Istio's official dashboard set. These dashboards provide immediate operational visibility without requiring manual dashboard creation.

Key dashboards:

  • Istio Mesh Dashboard — cluster-wide overview: global request rate, error rate, P50/P90/P99 latency
  • Istio Service Dashboard — per-service inbound and outbound traffic details
  • Istio Workload Dashboard — per-Deployment or per-Pod traffic breakdown
  • Istio Control Plane Dashboard — istiod health: xDS push rate, connected proxies, configuration errors
  • Istio Performance Dashboard — sidecar resource usage (CPU, memory) per workload

Access Grafana:

# Port-forward Grafana
kubectl port-forward -n istio-system svc/grafana 3000:3000

# Open http://localhost:3000 in a browser
# Default credentials: admin / admin (change on first login)

# Navigate to Dashboards → Browse → Istio to find all pre-built dashboards

Explore in the Cloud Console: Navigate to Monitoring → Dashboards and look for any Istio dashboards that may have been imported if GMP is active. Alternatively, create a custom dashboard using the istio_requests_total metric to build a service-level request rate chart.

Kiali — Service Mesh Topology

Kiali is the observability console purpose-built for Istio. It reads from Prometheus, queries the Kubernetes API, and renders an interactive graph of every service-to-service relationship in the mesh.

Kiali capabilities:

  • Traffic graph — animated real-time visualisation of request flows with request rates, error rates, and mTLS status on each edge
  • Health indicators — red/yellow/green status for each node based on success rate thresholds you configure
  • Configuration validation — detects misconfigured VirtualService, DestinationRule, and Gateway resources and highlights conflicts
  • Workload detail — per-pod inbound/outbound traffic, health, and associated Istio configuration
  • Tracing integration — links from Kiali nodes directly to Jaeger traces for the selected service

Access Kiali:

# Port-forward Kiali
kubectl port-forward -n istio-system svc/kiali 20001:20001

# Open http://localhost:20001 in a browser

# In the Graph view, select your namespace(s) and click Display to show:
# - Security (padlock icons confirming mTLS on each edge)
# - Traffic animation (moving dots proportional to request rate)
# - Response time (P50/P99 labels on edges)

Security audit use case: In the Kiali graph, enable the Security display layer. Any edge without a padlock icon indicates that traffic between those two services is not mTLS-encrypted. This can reveal misconfigured PeerAuthentication policies or workloads not yet enrolled in the mesh.

Jaeger — Distributed Tracing

Jaeger implements the distributed tracing standard for the mesh. When a request enters the mesh through the Ingress Gateway, Istio automatically generates a trace span. Each hop through a sidecar or waypoint adds a child span, giving you a complete timeline of how a request travelled through your services.

Important: Istio propagates trace context (W3C TraceContext or B3 headers) between proxies, but for in-application spans to appear, your application must forward the trace headers it receives to any outbound calls it makes. The required headers are: x-request-id, x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, x-b3-flags, and traceparent.

Access Jaeger:

# Port-forward Jaeger
kubectl port-forward -n istio-system svc/tracing 16686:80

# Open http://localhost:16686 in a browser

# In the Search tab:
# - Service: select your service name
# - Operation: leave blank to see all
# - Limit Results: 20
# Click Find Traces to see recent requests

# Inspect a trace to see every hop and latency at each stage

Trace sampling: By default, Istio samples 1 % of traces to limit storage costs. To increase the sampling rate for debugging:

# Check the current sampling rate
kubectl get configmap istio -n istio-system -o jsonpath='{.data.mesh}' | grep sampling

# Patch the mesh config to sample 100% (use only for debugging, not production)
kubectl patch configmap istio -n istio-system --type=merge \
-p '{"data":{"mesh":"defaultConfig:\n tracing:\n sampling: 100.0"}}'

Configuration Reference

The following table covers the key configuration parameters that affect the behaviour of the Istio GKE module. These are the settings available when using the module.

GKE Cluster Settings

ParameterDefaultDescription
GKE release channelREGULARDetermines the Kubernetes version stream. RAPID gets features sooner; STABLE is the most conservative.
Node machine typee2-standard-22 vCPU, 8 GiB RAM per node. Suitable for demonstration workloads. Increase for production.
Node count2Fixed number of nodes in the node pool. No cluster autoscaler is configured — the pool maintains exactly this count.
Preemptible nodestrueReduces cost significantly. Not recommended for production; preemptible VMs can be reclaimed with 30 seconds notice.
Workload IdentityEnabledBinds Kubernetes service accounts to Google service accounts. Required for GKE workloads to access Google Cloud APIs.
VPC-native networkingEnabledPods receive IPs from a secondary subnet range. Required for Istio sidecar mode and enables direct VPC routing to pods.
Security Posture scanningEnabledContinuously scans workload configurations and container images for vulnerabilities.
Gateway APIEnabledInstalls the Kubernetes Gateway API CRDs. Required for Istio 1.24+ ambient mode waypoints.

Istio Installation Settings

ParameterDefaultDescription
Data plane modeConfigurablesidecar for the traditional per-pod proxy model; ambient for the node-level ztunnel architecture.
Ingress GatewayEnabledDeploys an istio-ingressgateway LoadBalancer Service as the cluster's external entry point.
Ingress Gateway replicas1Starting replica count. The HPA scales between 1 and 5 based on CPU utilisation.
mTLS modePERMISSIVEAccepts both plaintext and mTLS traffic. Change to STRICT to reject all unencrypted traffic.
Certificate authorityistiod built-inistiod acts as the mesh CA, issuing 24-hour SPIFFE certificates to all workloads.
Trace sampling rate1 %The proportion of requests for which Jaeger traces are generated.

Observability Add-on Settings

ParameterDefaultDescription
PrometheusEnabledScrapes metrics from all sidecar proxies and the istiod control plane. Data retained for 15 days by default.
GrafanaEnabledPre-loaded with the official Istio dashboard set. Connected to the local Prometheus instance.
KialiEnabledService mesh topology console. Reads from Prometheus and the Kubernetes API.
JaegerEnabledDistributed tracing backend. Receives traces from the mesh via the Zipkin-compatible collector endpoint.

Networking Settings

ParameterDefaultDescription
VPCCreated by moduleA dedicated VPC with global routing mode and a single subnet.
Subnet regionConfigurableThe GCP region where the subnet and GKE cluster reside.
Pod IP range/16 secondary rangeAllocates up to 65,536 pod IPs from the subnet's secondary range.
Service IP range/20 secondary rangeAllocates up to 4,096 ClusterIP addresses for Kubernetes Services.
Cloud NATEnabledProvides outbound internet access for nodes (required for istioctl and add-on image pulls).

Default Behaviours

Understanding the module's default configuration helps avoid surprises when exploring or extending what is deployed.

mTLS is permissive, not strict. The mesh-wide PeerAuthentication policy defaults to PERMISSIVE mode. Services not yet enrolled in the mesh can still communicate with mesh-enrolled services. This is intentional for incremental adoption, but you should switch individual namespaces to STRICT as workloads are onboarded.

Ingress Gateway has no TLS by default. The istio-ingressgateway Service exposes port 80 (HTTP) and port 443 (HTTPS), but no TLS certificate is pre-configured. You must create a Kubernetes TLS Secret and reference it in a Gateway resource to serve HTTPS.

No AuthorizationPolicy is applied. All service-to-service communication within the mesh is allowed by default. Apply a deny-all baseline policy per namespace and explicitly allow only the required paths, as shown in the Traffic Management section.

Nodes are preemptible. The node pool uses preemptible VMs, which cost significantly less but can be terminated with 30 seconds notice. Workloads must tolerate this via PodDisruptionBudget and multiple replicas.

Observability add-ons use in-cluster storage. Prometheus, Grafana, Jaeger, and Kiali store data within the cluster on PersistentVolumeClaims. Data is lost if the cluster is deleted. For production, configure remote storage (e.g., Google Cloud Managed Service for Prometheus, Google Cloud Trace, or an external Grafana instance).

istiod manages its own CA. The built-in Citadel CA in istiod issues 24-hour SPIFFE certificates. There is no integration with Google Certificate Authority Service or external PKI by default. For compliance environments, configure an external CA.

Node pool has a fixed size of 2. No cluster autoscaler is configured. The node pool maintains exactly 2 preemptible nodes. If workloads require more capacity, update the node_count field and re-apply, or enable the cluster autoscaler manually after deployment.


Prerequisites

Before deploying the Istio GKE module, verify the following:

Google Cloud

  • A Google Cloud project with billing enabled
  • The following APIs enabled (the module enables them automatically on first run):
    • container.googleapis.com — GKE API
    • compute.googleapis.com — VPC, firewall, Cloud NAT
    • iam.googleapis.com — IAM roles and Workload Identity
    • cloudresourcemanager.googleapis.com — project metadata
# Check which APIs are enabled in your project
gcloud services list --enabled --filter="name:container OR name:compute OR name:iam"

Permissions

The identity running the module (user or service account) requires:

  • roles/container.admin — create and manage GKE clusters
  • roles/compute.networkAdmin — create VPC, subnets, firewall rules, Cloud NAT
  • roles/iam.serviceAccountAdmin — create node service accounts
  • roles/iam.workloadIdentityUser — bind Kubernetes service accounts
# Check your current IAM roles
gcloud projects get-iam-policy <PROJECT_ID> \
--flatten="bindings[].members" \
--filter="bindings.members:user:<YOUR_EMAIL>"

Local Tools

The following tools must be available in the environment running the module:

  • gcloud CLI (authenticated with gcloud auth login or gcloud auth activate-service-account)
  • kubectl (available via gcloud components install kubectl)
  • istioctl (downloaded by the module during installation)
  • helm (required only if using the Gateway API via Helm)
# Verify gcloud authentication
gcloud auth list

# Verify kubectl is installed
kubectl version --client

# Install kubectl via gcloud if missing
gcloud components install kubectl

Deploying the Module

Initial Deployment

  1. Authenticate to Google Cloud:
gcloud auth login
gcloud auth application-default login
gcloud config set project <YOUR_PROJECT_ID>
  1. Confirm the target region and cluster name in the module configuration.

  2. Apply the module. The deployment sequence is:

    • VPC and subnets created
    • GKE Standard cluster provisioned (~5–8 minutes)
    • Node pool added with autoscaling configured
    • Workload Identity configured on the cluster
    • istioctl downloaded and Istio installed into the cluster (~3–5 minutes)
    • Observability add-ons (Prometheus, Grafana, Kiali, Jaeger) deployed
  3. Configure kubectl to connect to the cluster:

# Get credentials for the new cluster
gcloud container clusters get-credentials <CLUSTER_NAME> --region <REGION>

# Verify the connection
kubectl get nodes
kubectl get pods -n istio-system
  1. Verify the Istio installation:
# Confirm all Istio components are running
istioctl verify-install

# Check the overall mesh health
istioctl analyze

# Confirm the Ingress Gateway has an external IP
kubectl get service istio-ingressgateway -n istio-system

Enrol Your First Application

After the cluster is running and Istio is verified:

# Label the namespace for sidecar injection (sidecar mode)
kubectl label namespace default istio-injection=enabled

# Or label for ambient mode
kubectl label namespace default istio.io/dataplane-mode=ambient

# Deploy a sample application
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/bookinfo/platform/kube/bookinfo.yaml

# Verify all pods are running with sidecars injected (sidecar mode: 2/2 containers per pod)
kubectl get pods -n default

# Access the application via the Ingress Gateway
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/bookinfo/networking/bookinfo-gateway.yaml

# Get the ingress gateway external IP
kubectl get service istio-ingressgateway -n istio-system \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Cleaning Up

To avoid ongoing charges, delete the cluster when finished:

# Delete the GKE cluster
gcloud container clusters delete <CLUSTER_NAME> --region <REGION>

# Confirm the cluster is deleted
gcloud container clusters list

Further Learning

Open Source Istio

GKE

Observability

Service Mesh Concepts


Common Issues and Variable Dependencies

Variables That Depend on Each Other

install_ambient_mesh selects between two mutually exclusive Istio installations: When install_ambient_mesh = false (default), null_resource.install_sidecar_mesh runs — it installs Istio with the default profile and enables sidecar injection in the default namespace. When install_ambient_mesh = true, null_resource.install_ambient_mesh runs instead — it installs Istio with the --set profile=ambient flag, labels the default namespace with istio.io/dataplane-mode=ambient, and applies a Waypoint proxy. Exactly one of the two null resources is created; setting both would require changing the count expressions.

create_cluster = false requires a pre-existing cluster named gke_cluster in region: When create_cluster = false, the module uses data.google_container_cluster.existing_cluster. If the cluster does not exist, the data source fails. No service account, node pool, or IAM resources are created when reusing an existing cluster.

deploy_application deploys the Istio Bookinfo app using istioctl and kubectl: The application deployment (null_resource in outputs.tf) is not shown explicitly — the lab guide covers this manually. The Bookinfo manifests are part of the downloaded Istio release bundle at $HOME/istio-<version>/samples/bookinfo/platform/kube/.

create_network = false requires network_name and subnet_name to identify an existing network: The subnet must have secondary IP ranges with names matching pod_ip_range and service_ip_range. The cluster requires these range names in its ip_allocation_policy.

enable_services must be true for a fresh project: The module only enables two APIs: cloudapis.googleapis.com and container.googleapis.com. If these are not enabled, the google_container_cluster resource will fail. The null_resource.wait_for_container_api polls until container.googleapis.com is confirmed active.

Variables That Affect Other Variables' Behavior

istio_version determines the download URL, Helm chart, and observability addon URLs: The sidecar and ambient install scripts download the Istio release from GitHub using var.istio_version as the version tag. The observability addons (Prometheus, Grafana, Jaeger, Kiali) are installed from raw.githubusercontent.com/istio/istio/release-<major.minor>/samples/addons/. Specifying an invalid version will cause the download to fail.

istio_version regex is evaluated at plan time: The expression regex("^(\\d+\\.\\d+)", var.istio_version)[0] extracts the major.minor portion. If the version string does not start with digits and a dot (e.g., if a prerelease tag like 1.24.2-rc1 is used), the regex will still match the leading 1.24 portion correctly. However, the release branch URL (release-1.24) must exist on GitHub.

GKE cluster uses LEGACY_DATAPATH: The cluster is created with datapath_provider = "LEGACY_DATAPATH" (kube-proxy mode, not eBPF/Dataplane V2). This is required for open-source Istio compatibility — Dataplane V2 (eBPF) conflicts with Istio's packet-interception model. Do not change this setting.

GKE cluster uses allow_net_admin = true: This grants the NET_ADMIN Linux capability to pods, which Istio's istio-init init container requires to configure iptables rules for traffic interception (sidecar mode). In ambient mode, ztunnel handles interception at the node level, but allow_net_admin is still set globally.

resource_creator_identity is used in gcloud container clusters get-credentials calls: Both the sidecar and ambient install scripts optionally pass --impersonate-service-account=<identity> to gcloud. If the runner is authenticated as a user (not a service account), this impersonation may succeed only if the user has iam.serviceAccounts.actAs on the target service account.

Common Pitfalls

  1. Istio download must succeed before installation: The install scripts use curl -fL ... | tar xz. If the GitHub release URL is unavailable or the version does not exist, the command fails and the Terraform apply aborts. The set -eo pipefail in the script ensures failures propagate. Verify the version exists at https://github.com/istio/istio/releases before applying.

  2. Node pool uses preemptible nodes: The Standard cluster node pool uses preemptible = true. Preemptible nodes can be reclaimed by GCP with 30 seconds notice. If a node is preempted during Istio installation, the install script may fail waiting for nodes to be ready. The --timeout=300s on kubectl wait provides a buffer.

  3. istioctl verify-install may report warnings: The script runs istioctl verify-install || echo "Warning..." — failures are non-fatal. Common warnings include missing CRDs for Istio components not deployed in the profile. These are normal and do not indicate a broken installation.

  4. Observability addon installation may fail silently: The addon install loop (for addon in prometheus jaeger grafana kiali) uses || echo "Warning: Failed..." — individual addon failures do not abort the script. Check kubectl get pods -n istio-system after installation to verify which addons are running.

  5. Sidecar mode labels the default namespace only: The install_sidecar_mesh script labels default with istio-injection=enabled. Other namespaces created after installation will not have sidecar injection unless explicitly labeled.

  6. Ambient mode applies a Waypoint proxy to the default namespace: The istioctl waypoint apply --namespace default command deploys a Waypoint Deployment. This Waypoint handles L7 traffic policies for the namespace. Without the Waypoint, AuthorizationPolicy and other L7 resources have no effect in ambient mode.

  7. Destroy provisioners are graceful (set +e): Both the sidecar and ambient destroy provisioners use set +e to prevent individual cleanup step failures from aborting the destroy. This means a failed Istio uninstall during destroy will log a warning but not block GKE cluster deletion.