PCDE Certification Preparation Guide: Section 3 — Migrate data solutions (~23% of the exam)

This guide covers Section 3 of the Professional Cloud Database Engineer (PCDE) exam. Be warned up front: this is the section where the RAD foundation modules cover the least ground. The modules implement the export/import (extended-outage) migration path end to end — App_CloudRun/App_GKE backup-import jobs fed from GCS or Google Drive, driven by the App_Common scripts — and read replication inside Google Cloud via Services_GCP. Database Migration Service, Datastream, zero-downtime cutovers, and reverse replication are concept-only and carry heavy "Beyond the modules" study lists. Deploy the relational-baseline and app-dataops profiles from the PCDE Lab Map before starting.

3.1 Design and implement data migration and replication

⏱ ~90 min (half hands-on, half reading) · 💰 low — one import job + GCS staging · ⚙️ Requires: app-dataops profile with enable_backup_import = true

Why the exam cares — Migration questions are downtime-budget questions. The decision tree the exam tests: extended outage acceptable → one-time export/import (dump file through GCS); near-zero downtime → continuous replication (Database Migration Service for homogeneous moves into Cloud SQL/AlloyDB, Datastream/CDC for heterogeneous) with a short cutover; fallback required → reverse replication from the new primary back to the source so you can return if the cutover fails. Heterogeneous moves add DDL/DML conversion (schema/type/dialect translation) on top. You must pick the tool and sequence the cutover: stop writes → drain replication lag → switch connection strings → verify → (optionally) reverse-replicate.

How RAD implements it — the extended-outage path is real and runnable:

Step of a lift-and-shift	Module implementation
Export from source	the export script (App_Common) — version-matched `pg_dump`/`mysqldump` producing `backup-<timestamp>.tar.gz`; for an external source you run the equivalent dump yourself
Stage the artifact	The module-provisioned GCS backup bucket (lifecycle-managed by `backup_retention_days`, default `7`), or Google Drive
Import into Cloud SQL	`enable_backup_import = true` (default `false`) runs a one-time `<service>-backup-import` job selected by `backup_source` (`gcs`/`gdrive`, default `gcs`), restoring `backup_file` (default `backup.sql`) with `backup_format` (`sql,tar,gz,tgz,tar.gz,zip,auto`) through the Cloud SQL connector volume
Post-import fix-ups (DDL/DML adjustments)	`enable_custom_sql_scripts` + `custom_sql_scripts_bucket`/`custom_sql_scripts_path` (+ `custom_sql_scripts_use_root` for privileged DDL) runs `.sql` files in lexicographic order — the place to apply converted schema objects, recreate sequences, or fix collations
Replication (inside GCP)	`create_postgres_read_replica` / `create_mysql_read_replica` in `Services_GCP` — Cloud SQL native async replication, the same mechanism a DMS migration uses for its destination sync, observable end to end

What the modules deliberately do not do: connect to an external source, run change data capture, or orchestrate a cutover. There is no DMS, Datastream, or external-server replication configuration anywhere in the four modules.

Try it

Simulate a source database: connect to the lab instance (Auth Proxy + psql as in Section 1.3), create a table with rows, and dump it — or simply let the scheduled export from Section 2.5 produce one. Stage your own file explicitly:
```
pg_dump -h 127.0.0.1 -U postgres -d postgres -f /tmp/source-dump.sql
gcloud storage cp /tmp/source-dump.sql gs://<backup-bucket>/source-dump.sql
```

In the portal set enable_backup_import = true, backup_source = "gcs", backup_file = "source-dump.sql", backup_format = "sql", and apply. Watch the import job:

gcloud run jobs executions list --job=<service>-backup-import --region=us-central1
gcloud logging read 'resource.type="cloud_run_job" AND resource.labels.job_name="<service>-backup-import"' \
  --limit=50 --freshness=1h

Verify row counts match the source (the migration engineer's first validation):

psql -h 127.0.0.1 -U <app-user> -d <db> -c "SELECT count(*) FROM <your_table>;"

For the replication half, apply ha-production (Section 1.2) and watch cloudsql-<prefix>-postgres-replica seed and catch up — gcloud sql instances describe cloudsql-<prefix>-postgres-replica --format="value(state, replicaConfiguration)". This is the same replica machinery DMS drives during a continuous migration.
You know it worked when the import execution succeeds and the destination row counts equal the source's.

Check yourself

Q1: A 2 TB on-premises PostgreSQL 14 database must move to Cloud SQL with under 5 minutes of downtime. Is the platform's import-job pattern appropriate? What is?

A: No — a dump/restore of 2 TB takes hours, all of it downtime (the pattern is right only when an extended outage is acceptable). Use Database Migration Service: initial snapshot plus continuous CDC replication from the source, let lag drain while the source stays live, then a minutes-long cutover. DMS homogeneous migrations to Cloud SQL are free, which the exam likes to mention.

Q2: After cutover to Cloud SQL, the business demands a fallback path for two weeks. What is the mechanism, and what must remain true at the source?

A: Reverse replication: replicate changes from the new Cloud SQL primary back to the old source (DMS supports configuring the old source as a replica of the migrated instance for PostgreSQL/MySQL, or you maintain logical replication yourself), so the application can be repointed back without data loss. The source must remain schema-compatible and reachable, and no writes may go to it directly during the fallback window — otherwise the two diverge.

Q3: An Oracle-to-PostgreSQL migration stalls because of incompatible PL/SQL and data types. Which class of work is this, and which tools address it?

A: DDL/DML conversion — heterogeneous migrations need schema and code translation, not just data movement. Tools: DMS's Oracle-to-PostgreSQL conversion workspaces (Ora2Pg-based), manual rewrite of stored procedures, plus type-mapping decisions (NUMBER → numeric, DATE → timestamp). In this platform the converted DDL would be applied through the custom-SQL-scripts job; the conversion itself is always engineering work the exam expects you to schedule before data sync.

Beyond the modules — Most of Section 3 lives here; budget real study time:

Database Migration Service (DMS): connection profiles, migration jobs, homogeneous (MySQL/PostgreSQL → Cloud SQL/AlloyDB, free) vs heterogeneous (Oracle/SQL Server → PostgreSQL, conversion workspaces). In a scratch project walk through gcloud database-migration connection-profiles create postgresql ... and gcloud database-migration migration-jobs create ... --type=CONTINUOUS, even if only to the validation step — the verify phase (gcloud database-migration migration-jobs verify) is exam-favored.
Datastream for CDC into BigQuery/GCS when the target is analytics rather than a like-for-like database.
Cloud SQL external server replication ("Replicating from an external server" docs): the pre-DMS pattern of making a Cloud SQL instance a replica of an external primary; promotion = cutover (gcloud sql instances promote-replica).
Zero-downtime sequencing: dual-write pitfalls, draining lag before cutover, connection-string switching via Secret Manager (this platform's secret-driven DB_HOST/host secrets show exactly where you would flip it).
Validation tooling: the open-source Data Validation Tool (DVT) for row/aggregate comparison between source and target.

⚠️ Exam trap — "Use the import job / gcloud sql import sql for the migration" is the trap answer whenever the scenario states a downtime budget in minutes. Conversely, "set up DMS" is the trap when the scenario says a weekend outage is fine and the database is small — one-time export/import is simpler, cheaper, and exactly what this platform automates.

3.1 Design and implement data migration and replication​

3.1 Design and implement data migration and replication