Deployment Runbook
This runbook covers both the automated CI/CD pipeline and manual fallback procedures for deploying Gatelithix Gateway services.
Automated Deployment (CI/CD)
The primary deployment path is fully automated via GitHub Actions.
Pipeline Overview
push to main
-> build-push.yml (Build & Push 8 Images including migrate)
-> deploy-core.yml (migrate-core job -> Gateway deploy -> Health Check)
-> deploy-pci.yml (migrate-vault job -> Vault deploy -> Health Check)
Every push to main triggers the full pipeline: build 8 container images (including the migration image), run database migrations via Cloud Run Jobs, deploy services, and verify health. No manual steps are required beyond the reviewer approvals configured on the GitHub environments.
Build Stage (build-push.yml)
Triggered on push to main. Builds all 8 service container images in parallel:
| Service | Build Tool | Registry | Path |
|---|---|---|---|
| Gateway | ko | Core Artifact Registry | ./apps/gateway/ |
| Vault | ko | PCI Artifact Registry | ./apps/vault/ |
| Migrate | Docker | Core + PCI Artifact Registry | db/Dockerfile.migrate |
| Stripe Connector | ko | Core Artifact Registry | ./apps/connectors/stripe/cmd/ |
| NMI Connector | ko | Core Artifact Registry | ./apps/connectors/nmi/cmd/ |
| FluidPay Connector | ko | Core Artifact Registry | ./apps/connectors/fluidpay/cmd/ |
| Dashboard | Docker | Core Artifact Registry | apps/dashboard/ |
| Docs Site | Docker | Core Artifact Registry | apps/docs/ |
All images are tagged with both the commit SHA and latest. The migration image is pushed to both registries (core and PCI) to maintain CDE isolation — each project pulls only from its own registry.
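As a rough illustration of the tagging scheme, the sketch below prints the two references a single image receives. The helper name image_refs is hypothetical, and the registry path in the usage comment is assumed from the manual fallback commands later in this runbook.

```shell
# Hypothetical helper: print the two references (commit SHA and latest)
# under which one image is pushed.
# Usage: image_refs <registry-uri> <image-name> <commit-sha>
image_refs() {
  registry=$1
  name=$2
  sha=$3
  printf '%s/%s:%s\n' "$registry" "$name" "$sha"
  printf '%s/%s:latest\n' "$registry" "$name"
}

# Example (registry path is an assumption):
# image_refs us-central1-docker.pkg.dev/gatelithix-core/gatelithix gateway "$(git rev-parse HEAD)"
```

Deploying by the SHA reference rather than latest keeps a deployment tied to one specific build.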
Deploy Stage
After build-push.yml completes successfully, two deploy workflows fire in parallel:
deploy-core.yml deploys the API Gateway to Cloud Run:
- Environment: nonpci-prod
- Service: api-gateway
- Region: us-central1
- Authentication: Workload Identity Federation (no service account keys)
deploy-pci.yml deploys the Token Vault to Cloud Run:
- Environment: pci-prod (requires 2 reviewers + 5-min wait timer)
- Service: token-vault
- Region: us-central1
- Authentication: Workload Identity Federation (no service account keys)
Database Migrations
Migrations run automatically before service deployment using goose inside Cloud Run Jobs. Each deploy workflow proceeds as follows:
- The workflow executes the migrate-core or migrate-vault Cloud Run Job with the commit SHA image tag
- The job connects to Cloud SQL via VPC connector (private IP, no Auth Proxy needed)
- The database password is injected from Secret Manager at runtime
- Goose runs its up command to apply all pending migrations, then its status command to verify
- The workflow waits for the job to complete before proceeding to service deployment
Core migrations (db/migrations/) are applied by migrate-core Cloud Run Job in deploy-core.yml against gatelithix-core-pg using the gateway-app user.
PCI vault migrations (db/migrations/vault/) are applied by migrate-vault Cloud Run Job in deploy-pci.yml against gatelithix-pci-pg using the vault-app user.
No manual migration step is needed. Migrations are applied on every deploy and goose ensures only pending migrations run.
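The migrate-then-deploy ordering can be sketched as a small shell function. This is not the actual workflow step: the function name run_migrations, the image name migrate, and pinning the image with gcloud run jobs update before execution are all assumptions made for illustration.

```shell
# Sketch only. Assumes the migration image is called "migrate" in the registry
# and that the job image is pinned to the commit SHA before each execution.
# Usage: run_migrations <job> <project> <registry-uri> <commit-sha>
run_migrations() {
  job=$1
  project=$2
  registry=$3
  sha=$4

  # Point the job at the migration image built from this commit.
  gcloud run jobs update "$job" \
    --image "$registry/migrate:$sha" \
    --project "$project" --region us-central1 || return 1

  # --wait blocks until the execution finishes; a non-zero exit status
  # lets the caller stop before any service is deployed.
  gcloud run jobs execute "$job" \
    --project "$project" --region us-central1 --wait
}

# Example: run_migrations migrate-core gatelithix-core "$CORE_ARTIFACT_REGISTRY" "$(git rev-parse HEAD)"
```

Chaining the service deployment with && after this call would reproduce the behaviour described above, where a failed migration prevents the deploy.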
Checking Migration Status
# List recent migration job executions for core
gcloud run jobs executions list --job=migrate-core --project=gatelithix-core --region=us-central1
# List recent migration job executions for PCI vault
gcloud run jobs executions list --job=migrate-vault --project=gatelithix-pci --region=us-central1
Manually Re-Running Migrations
If a migration needs to be re-run (e.g., after a transient database connectivity issue):
# Re-run core migrations
gcloud run jobs execute migrate-core --project=gatelithix-core --region=us-central1 --wait
# Re-run PCI vault migrations
gcloud run jobs execute migrate-vault --project=gatelithix-pci --region=us-central1 --wait
The --wait flag blocks until the job completes, showing success or failure inline.
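For transient failures, a generic retry wrapper can save repeated manual re-runs. The helper below is a hypothetical sketch and not part of the repository. Re-running is safe here only because goose applies pending migrations and skips the ones already applied.

```shell
# Hypothetical retry helper.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  # Run the command until it succeeds or the attempt budget is used up.
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example:
# retry 3 30 gcloud run jobs execute migrate-core --project=gatelithix-core --region=us-central1 --wait
```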
Post-Deploy Verification
After deployment, each workflow runs an automated health check:
# Waits for the new Cloud Run revision, then hits /health with retries
curl -f --retry 5 --retry-delay 10 "$SERVICE_URL/health"
If the health check fails, a GitHub Actions warning is emitted but the workflow does not fail (the service may still be starting). For deeper verification, run the smoke test:
scripts/smoke-test.sh
This verifies that health endpoints return 200 and key API routes are responsive.
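The warn-but-do-not-fail behaviour of the automated health check could look roughly like the function below. The name check_health and the message wording are assumptions, and the real workflow step may be structured differently. The ::warning:: prefix is the GitHub Actions workflow command for attaching a warning annotation to a run.

```shell
# Sketch of a non-blocking health check (assumed shape, not the real step).
# Usage: check_health <service-url>
check_health() {
  url=$1
  if curl -f -s -o /dev/null --retry 5 --retry-delay 10 "$url/health"; then
    echo "health check passed: $url/health"
  else
    # Annotate the run with a warning instead of failing the job.
    echo "::warning::health check failed for $url/health"
  fi
  # Always succeed so that a slow-starting service does not fail the workflow.
  return 0
}

# Example: check_health "$SERVICE_URL"
```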
Vault readiness probe: The vault’s /health/ready endpoint now checks PCI database connectivity. If the vault service shows HEALTH_CHECK_FAILURE in Cloud Run, check PCI Cloud SQL connectivity first (Auth Proxy status, IAM permissions, VPC peering). A 503 from /health/ready means the PCI database is unreachable and Cloud Run will stop routing traffic to that instance.
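When triaging, it can help to turn the readiness response into a one-line verdict. The helper below is a hypothetical sketch; it assumes the vault URL is reachable from wherever the command is run, which may not hold for a service inside the PCI project.

```shell
# Hypothetical triage helper for the readiness endpoint.
# Usage: ready_status <base-url>
ready_status() {
  # -w '%{http_code}' prints only the HTTP status code; curl reports 000
  # when no HTTP response was received at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/health/ready")
  case "$code" in
    200) echo "ready" ;;
    503) echo "not ready: PCI database not connected" ;;
    *)   echo "unexpected status: $code" ;;
  esac
}
```

A 503 is also expected briefly during startup while the database connects, so a single 503 right after a deploy is not necessarily a fault.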
Manual Fallback (When CI Unavailable)
Use this procedure for first-time setup or when the CI/CD pipeline is unavailable.
Prerequisites
- gcloud CLI authenticated with appropriate permissions
- Terraform 1.5+
- goose for database migrations
- ko v0.18+ for Go container builds
- Go 1.26.1+
Step 1: Apply Terraform Infrastructure
# Core project
cd infra/terraform/core
terraform init
terraform plan -out=tfplan
terraform apply tfplan
# PCI project
cd ../pci
terraform init
terraform plan -out=tfplan
terraform apply tfplan
Review the plan output carefully before applying. Core must be applied before PCI (PCI depends on core VPC peering outputs).
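If the two stacks are applied from a script, the core-before-PCI ordering and the review step can both be kept explicit. The function below is a hypothetical sketch that assumes it is run from the repository root. The confirmation prompt is an addition for illustration and is not described in this runbook.

```shell
# Sketch: apply core, then PCI, pausing for confirmation after each plan and
# stopping at the first failure. Assumes the current directory is the repo root.
apply_infra() {
  for stack in core pci; do
    (
      cd "infra/terraform/$stack" || exit 1
      terraform init && terraform plan -out=tfplan || exit 1

      # Give the operator a chance to review the plan output above.
      printf 'Apply %s plan? [y/N] ' "$stack"
      read -r answer
      [ "$answer" = "y" ] || { echo "aborted"; exit 1; }

      terraform apply tfplan
    ) || { echo "terraform did not complete for $stack; stopping" >&2; return 1; }
  done
}
```

Because the loop returns as soon as the core stack fails or is declined, the PCI stack is never planned against missing core outputs.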
Step 2: Run Database Migrations
Connect to Cloud SQL via Auth Proxy:
# Start Auth Proxy for core database
cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db \
--port 5432 &
# Run core migrations
goose -dir db/migrations/ postgres \
"host=127.0.0.1 port=5432 user=gateway-sa dbname=gateway sslmode=disable" up
# Start Auth Proxy for PCI database
cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db \
--port 5433 &
# Run PCI migrations
goose -dir db/migrations/vault postgres \
"host=127.0.0.1 port=5433 user=vault-sa dbname=vault sslmode=disable" up
Step 3: Build and Push Container Images
# Core services
KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway \
ko build ./apps/gateway/ --bare --tags=$(git rev-parse HEAD),latest
# PCI services
KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault \
ko build ./apps/vault/ --bare --tags=$(git rev-parse HEAD),latest
# Repeat for connectors
for connector in stripe nmi fluidpay; do
KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/${connector}-connector \
ko build ./apps/connectors/${connector}/cmd/ --bare --tags=$(git rev-parse HEAD),latest
done
Step 4: Deploy Cloud Run Services
# Gateway
gcloud run deploy api-gateway \
--image us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway:$(git rev-parse HEAD) \
--region us-central1 \
--project gatelithix-core
# Vault
gcloud run deploy token-vault \
--image us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault:$(git rev-parse HEAD) \
--region us-central1 \
--project gatelithix-pci
Step 5: Verify Deployment
# Health checks
curl -s https://api.gatelithix.com/health | jq .
curl -s https://api.gatelithix.com/health/ready | jq .
# Smoke test
scripts/smoke-test.sh
Cloud SQL Auth Proxy Sidecar
Both the gateway and vault Cloud Run services use a Cloud SQL Auth Proxy sidecar container for database connectivity. This is Google’s recommended approach for Cloud Run + Cloud SQL.
How It Works
Each Cloud Run service template includes two containers:
- app: the gateway or vault binary, connects to localhost:5432
- cloud-sql-proxy: handles IAM authentication and private IP routing to Cloud SQL
The proxy sidecar starts first (the app container depends_on it), listens on localhost:5432, and transparently proxies connections to the Cloud SQL instance using IAM auth over the VPC private IP.
Benefits
- No Go Connector dependency at runtime: the app uses a simple localhost:5432 TCP connection
- IAM authentication handled externally: no token exchange latency in the app’s startup path
- Consistent with local development: same connection path (host:port) in both environments
Startup Sequence
The gateway and vault both use a deferred startup pattern:
- HTTP server binds the port immediately (Cloud Run sees the container is alive)
- /health returns 200 right away
- /health/ready returns 503 while the database connects
- Once the DB is connected and all dependencies are wired, the full router swaps in
- /health/ready returns 200 and Cloud Run begins routing traffic
This prevents container crashes during slow database connections and ensures all errors are captured in Cloud Logging.
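From the outside, the startup sequence shows up as /health/ready moving from 503 to 200. A polling loop such as the hypothetical wait_ready below can be used to watch for that transition; the function name, attempt count, and delay are illustrative choices.

```shell
# Hypothetical polling helper: wait until /health/ready returns 200.
# Usage: wait_ready <base-url> <max-attempts> <delay-seconds>
wait_ready() {
  base=$1
  max=$2
  delay=$3
  i=1
  while [ "$i" -le "$max" ]; do
    # Capture only the HTTP status code of the readiness endpoint.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$base/health/ready")
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: /health/ready returned $code" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still not ready after $max attempts" >&2
  return 1
}

# Example: wait_ready https://api.gatelithix.com 12 5
```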
Rollback
Cloud Run Revision Rollback
Cloud Run maintains previous revisions. To roll back:
# List revisions
gcloud run revisions list --service api-gateway \
--region us-central1 --project gatelithix-core
# Route 100% traffic to previous revision
gcloud run services update-traffic api-gateway \
--to-revisions PREVIOUS_REVISION=100 \
--region us-central1 --project gatelithix-core
Database Migration Rollback
Rolling back migrations requires local access via Cloud SQL Auth Proxy since Cloud Run Jobs only run goose up:
# Start Cloud SQL Auth Proxy for core database
cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db --port 5432 &
# Roll back the most recent core migration
goose -dir db/migrations/ postgres \
"host=127.0.0.1 port=5432 user=gateway-app dbname=gateway password=$CORE_DB_PASSWORD sslmode=disable" down
# Start Cloud SQL Auth Proxy for PCI database
cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db --port 5433 &
# Roll back the most recent PCI migration
goose -dir db/migrations/vault postgres \
"host=127.0.0.1 port=5433 user=vault-app dbname=vault password=$PCI_DB_PASSWORD sslmode=disable" down
Note: Retrieve database passwords from Secret Manager before running:
export CORE_DB_PASSWORD=$(gcloud secrets versions access latest --secret=core-db-password --project=gatelithix-core)
export PCI_DB_PASSWORD=$(gcloud secrets versions access latest --secret=pci-db-password --project=gatelithix-pci)
Always verify the rollback migration SQL before running. Some migrations may not be reversible (e.g., data transformations).
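One way to review the rollback SQL is to print only the Down section of the migration file. The helper below is a hypothetical sketch that assumes the migrations are SQL files using the goose annotations "-- +goose Up" and "-- +goose Down"; it would not apply to Go-based migrations.

```shell
# Hypothetical helper: print the Down section of a goose SQL migration
# so it can be read before running goose down.
# Usage: show_down <migration.sql>
show_down() {
  awk '
    /^-- \+goose Down/ { in_down = 1; next }  # start after the Down marker
    /^-- \+goose Up/   { in_down = 0 }        # stop if an Up marker follows
    in_down { print }
  ' "$1"
}

# Example (the file name is a placeholder):
# show_down db/migrations/<version>_<name>.sql
```

An empty result suggests the migration has no Down section and cannot be reversed with goose down.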
Environment Configuration
GitHub Environments
| Environment | Branch | Reviewers | Wait Timer |
|---|---|---|---|
| nonpci-staging | develop | 0 | None |
| nonpci-prod | main | 1 | None |
| pci-staging | develop | 1 (PCI team) | None |
| pci-prod | main | 2 (PCI team) | 5 minutes |
Required Secrets
| Secret | Description |
|---|---|
| WIF_PROVIDER | Workload Identity Federation provider name |
| CORE_SA_EMAIL | Core deployer service account email |
| PCI_SA_EMAIL | PCI deployer service account email |
Required Variables
| Variable | Description |
|---|---|
| CORE_ARTIFACT_REGISTRY | Core Artifact Registry URI |
| PCI_ARTIFACT_REGISTRY | PCI Artifact Registry URI |
| NEXT_PUBLIC_API_URL | Admin portal API URL |
| NEXT_PUBLIC_AUTH0_DOMAIN | Auth0 domain |
| NEXT_PUBLIC_AUTH0_CLIENT_ID | Auth0 client ID |
| NEXT_PUBLIC_AUTH0_AUDIENCE | Auth0 audience |