Deployment Runbook
This runbook covers both the automated CI/CD pipeline and manual fallback procedures for deploying Gatelithix Gateway services.
Automated Deployment (CI/CD)
The primary deployment path is fully automated via GitHub Actions.
Pipeline Overview
push to main
-> build-push.yml (Build & Push 8 Images including migrate)
-> deploy-core.yml (migrate-core job -> Gateway deploy -> Health Check)
-> deploy-pci.yml (migrate-vault job -> Vault deploy -> Health Check)
Every push to main triggers a fully automated pipeline: build 8 container images (including the migration image), run database migrations via Cloud Run Jobs, deploy services, and verify health. No manual steps are required beyond the reviewer approvals configured on the GitHub environments.
Build Stage (build-push.yml)
Triggered on push to main. Builds all 8 service container images in parallel:
| Service | Build Tool | Registry | Path |
|---|---|---|---|
| Gateway | ko | Core Artifact Registry | ./apps/gateway/ |
| Vault | ko | PCI Artifact Registry | ./apps/vault/ |
| Migrate | Docker | Core + PCI Artifact Registry | db/Dockerfile.migrate |
| Stripe Connector | ko | Core Artifact Registry | ./apps/connectors/stripe/cmd/ |
| NMI Connector | ko | Core Artifact Registry | ./apps/connectors/nmi/cmd/ |
| FluidPay Connector | ko | Core Artifact Registry | ./apps/connectors/fluidpay/cmd/ |
| Admin Portal | Docker | Core Artifact Registry | apps/admin/ |
| Docs Site | Docker | Core Artifact Registry | apps/docs/ |
All images are tagged with both the commit SHA and latest. The migration image is pushed to both registries (core and PCI) to maintain CDE isolation — each project pulls only from its own registry.
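As a rough illustration of the tagging scheme described above, the helper below prints the two references (commit SHA and latest) that one image receives in a given registry, and the four references the migration image receives across both registries. The function names `image_refs` and `migrate_refs` and the registry arguments are assumptions for illustration; they are not taken from build-push.yml.

```shell
# Hypothetical helper: print both references an image receives.
# Arguments: <registry-uri> <image-name> <commit-sha>
image_refs() {
  local registry="$1" name="$2" sha="$3"
  printf '%s/%s:%s\n' "$registry" "$name" "$sha"
  printf '%s/%s:%s\n' "$registry" "$name" "latest"
}

# Hypothetical helper: the migration image is pushed to both registries,
# so each project pulls only from its own registry.
# Arguments: <core-registry-uri> <pci-registry-uri> <commit-sha>
migrate_refs() {
  local core_registry="$1" pci_registry="$2" sha="$3"
  image_refs "$core_registry" migrate "$sha"
  image_refs "$pci_registry" migrate "$sha"
}
```

For example, `image_refs "$CORE_ARTIFACT_REGISTRY" gateway "$(git rev-parse HEAD)"` would list the two gateway references for the current commit, assuming `CORE_ARTIFACT_REGISTRY` holds the registry URI.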
Deploy Stage
After build-push.yml completes successfully, two deploy workflows fire in parallel:
deploy-core.yml deploys the API Gateway to Cloud Run:
- Environment: nonpci-prod
- Service: api-gateway
- Region: us-central1
- Authentication: Workload Identity Federation (no service account keys)
deploy-pci.yml deploys the Token Vault to Cloud Run:
- Environment: pci-prod (requires 2 reviewers + 5-min wait timer)
- Service: token-vault
- Region: us-central1
- Authentication: Workload Identity Federation (no service account keys)
Database Migrations
Migrations run automatically before service deployment using goose inside Cloud Run Jobs. The deploy workflows:
- Execute the migrate-core or migrate-vault Cloud Run Job with the commit SHA image tag
- The job connects to Cloud SQL via VPC connector (private IP, no Auth Proxy needed)
- Database password is injected from Secret Manager at runtime
- Goose runs its up command to apply all pending migrations, then its status command to verify
- The deploy workflow waits for the job to complete before proceeding to service deployment
Core migrations (db/migrations/core/) are applied by migrate-core Cloud Run Job in deploy-core.yml against gatelithix-core-pg using the gateway-app user.
PCI vault migrations (db/migrations/vault/) are applied by migrate-vault Cloud Run Job in deploy-pci.yml against gatelithix-pci-pg using the vault-app user.
No manual migration step is needed. Migrations are applied on every deploy and goose ensures only pending migrations run.
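The ordering guarantee described above (the migration job must finish before the service is deployed) can be sketched as a small script. This is not the content of deploy-core.yml or deploy-pci.yml: the function names and the `DRY_RUN` switch are assumptions for illustration, and the sketch assumes that a failed job execution makes `gcloud run jobs execute --wait` exit with a non-zero status.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# migrate_then_deploy <job> <service> <image> <project> <region>
# Runs the migration job and deploys the service only if the job succeeded.
migrate_then_deploy() {
  local job="$1" service="$2" image="$3" project="$4" region="$5"

  # --wait blocks until the job execution completes; a non-zero exit
  # status on failure is assumed here.
  if ! run gcloud run jobs execute "$job" \
      --project="$project" --region="$region" --wait; then
    echo "migration job $job failed; skipping deploy of $service" >&2
    return 1
  fi

  run gcloud run deploy "$service" \
    --image "$image" --region "$region" --project "$project"
}
```

Calling it with `DRY_RUN=1` prints the two gcloud commands in order without executing them, which is a cheap way to review what a manual run would do.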
Checking Migration Status
# List recent migration job executions for core
gcloud run jobs executions list --job=migrate-core --project=gatelithix-core --region=us-central1
# List recent migration job executions for PCI vault
gcloud run jobs executions list --job=migrate-vault --project=gatelithix-pci --region=us-central1
Manually Re-Running Migrations
If a migration needs to be re-run (e.g., after a transient database connectivity issue):
# Re-run core migrations
gcloud run jobs execute migrate-core --project=gatelithix-core --region=us-central1 --wait
# Re-run PCI vault migrations
gcloud run jobs execute migrate-vault --project=gatelithix-pci --region=us-central1 --wait
The --wait flag blocks until the job completes, showing success or failure inline.
Post-Deploy Verification
After deployment, each workflow runs an automated health check:
# Waits for the new Cloud Run revision, then hits /health with retries
curl -f --retry 5 --retry-delay 10 "$SERVICE_URL/health"
If the health check fails, a GitHub Actions warning is emitted but the workflow does not fail (the service may still be starting). For deeper verification, run the smoke test:
scripts/smoke-test.sh
This verifies that health endpoints return 200 and key API routes are responsive.
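The non-fatal behavior of the automated health check can be sketched as follows. This is an illustration, not the workflow's actual step: the function name `check_health` and the `CURL` override are assumptions, the curl flags are the ones shown above, and `::warning::` is the GitHub Actions workflow command for a warning annotation.

```shell
# check_health <service-url>
# Probes /health with retries. On failure it emits a GitHub Actions
# warning and still returns 0, so the calling step does not fail.
check_health() {
  local url="$1"
  # CURL is a hypothetical override, useful for exercising the function
  # without network access.
  if "${CURL:-curl}" -f --retry 5 --retry-delay 10 "$url/health" >/dev/null 2>&1; then
    echo "health check passed: $url/health"
  else
    echo "::warning::health check failed for $url/health (service may still be starting)"
  fi
  return 0
}
```

Because the function always returns 0, a failed probe shows up only as a warning annotation on the run, so it is worth checking the run summary after each deploy.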
Vault readiness probe: The vault’s /health/ready endpoint now checks PCI database connectivity. If the vault service shows HEALTH_CHECK_FAILURE in Cloud Run, check PCI Cloud SQL connectivity first (Auth Proxy status, IAM permissions, VPC peering). A 503 from /health/ready means the PCI database is unreachable and Cloud Run will stop routing traffic to that instance.
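A quick way to triage the readiness probe by hand is to fetch only the HTTP status code and map it to the guidance above. The sketch below is hypothetical: the function names are invented, the vault URL must be supplied by the caller, and treating 200 as "database reachable" is an inference from the 503 behavior described above rather than a documented contract.

```shell
# ready_status <base-url>: print the HTTP status code returned by /health/ready.
ready_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1/health/ready"
}

# explain_ready <status-code>: map the code to the troubleshooting guidance.
explain_ready() {
  case "$1" in
    200) echo "ready: PCI database reachable (assumed meaning of 200)" ;;
    503) echo "not ready: PCI database unreachable; check Cloud SQL connectivity, IAM permissions, VPC peering" ;;
    *)   echo "unexpected status $1: inspect the Cloud Run logs" ;;
  esac
}
```

A typical call would be `explain_ready "$(ready_status "$VAULT_URL")"`, where `VAULT_URL` is a placeholder for the vault service URL.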
Manual Fallback (When CI Unavailable)
Use this procedure for first-time setup or when the CI/CD pipeline is unavailable.
Prerequisites
- gcloud CLI authenticated with appropriate permissions
- Terraform 1.5+
- goose for database migrations
- ko v0.18+ for Go container builds
- Go 1.26.1+
Step 1: Apply Terraform Infrastructure
# Core project
cd infra/terraform/core
terraform init
terraform plan -out=tfplan
terraform apply tfplan
# PCI project
cd ../pci
terraform init
terraform plan -out=tfplan
terraform apply tfplan
Review the plan output carefully before applying. Core must be applied before PCI (PCI depends on core VPC peering outputs).
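The ordering constraint (core before PCI) and the review step can be combined in a small wrapper that stops at the first failure, so PCI is never applied on top of a core stack that did not converge. This is a sketch under assumptions: the function names and the `DRY_RUN` and `ASSUME_YES` switches are hypothetical, and it uses Terraform's global `-chdir` option instead of `cd`.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# confirm <dir>: pause so the plan can be reviewed before it is applied.
# ASSUME_YES=1 (hypothetical) skips the prompt for non-interactive use.
confirm() {
  local answer=""
  [ "${ASSUME_YES:-0}" = "1" ] && return 0
  printf 'Apply the plan for %s? [y/N] ' "$1" >&2
  read -r answer
  [ "$answer" = "y" ]
}

# apply_stack <dir>: init, plan, review, and apply one Terraform root module.
apply_stack() {
  local dir="$1"
  run terraform -chdir="$dir" init &&
    run terraform -chdir="$dir" plan -out=tfplan &&
    confirm "$dir" &&
    run terraform -chdir="$dir" apply tfplan
}

# apply_all: core first, then PCI; stop at the first failure.
apply_all() {
  local dir
  for dir in infra/terraform/core infra/terraform/pci; do
    apply_stack "$dir" || { echo "stopped at $dir" >&2; return 1; }
  done
}
```

Running `DRY_RUN=1 ASSUME_YES=1 apply_all` prints the six terraform commands in the order they would run, without touching any infrastructure.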
Step 2: Run Database Migrations
Connect to Cloud SQL via Auth Proxy:
# Start Auth Proxy for core database
cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db \
  --port 5432 &
# Run core migrations
goose -dir db/migrations/core postgres \
  "host=127.0.0.1 port=5432 user=gateway-sa dbname=gateway sslmode=disable" up
# Start Auth Proxy for PCI database
cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db \
  --port 5433 &
# Run PCI vault migrations
goose -dir db/migrations/vault postgres \
  "host=127.0.0.1 port=5433 user=vault-sa dbname=vault sslmode=disable" up
Step 3: Build and Push Container Images
# Core services
KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway \
  ko build ./apps/gateway/ --bare --tags=$(git rev-parse HEAD),latest
# PCI services
KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault \
  ko build ./apps/vault/ --bare --tags=$(git rev-parse HEAD),latest
# Repeat for connectors
for connector in stripe nmi fluidpay; do
  KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/${connector}-connector \
    ko build ./apps/connectors/${connector}/cmd/ --bare --tags=$(git rev-parse HEAD),latest
done
Step 4: Deploy Cloud Run Services
# Gateway
gcloud run deploy api-gateway \
  --image us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway:$(git rev-parse HEAD) \
  --region us-central1 \
  --project gatelithix-core
# Vault
gcloud run deploy vault \
  --image us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault:$(git rev-parse HEAD) \
  --region us-central1 \
  --project gatelithix-pci
Step 5: Verify Deployment
# Health checks
curl -s https://api.gatelithix.com/health | jq .
curl -s https://api.gatelithix.com/health/ready | jq .
# Smoke test
scripts/smoke-test.sh
Rollback
Cloud Run Revision Rollback
Cloud Run maintains previous revisions. To roll back:
# List revisions
gcloud run revisions list --service api-gateway \
  --region us-central1 --project gatelithix-core
# Route 100% traffic to previous revision
gcloud run services update-traffic api-gateway \
  --to-revisions PREVIOUS_REVISION=100 \
  --region us-central1 --project gatelithix-core
Database Migration Rollback
Rolling back migrations requires local access via Cloud SQL Auth Proxy since Cloud Run Jobs only run goose up:
# Start Cloud SQL Auth Proxy for core database
cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db --port 5432 &
# Roll back the most recent core migration
goose -dir db/migrations/core postgres \
  "host=127.0.0.1 port=5432 user=gateway-app dbname=gateway password=$CORE_DB_PASSWORD sslmode=disable" down
# Start Cloud SQL Auth Proxy for PCI database
cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db --port 5433 &
# Roll back the most recent PCI migration
goose -dir db/migrations/vault postgres \
  "host=127.0.0.1 port=5433 user=vault-app dbname=vault password=$PCI_DB_PASSWORD sslmode=disable" down
Note: Retrieve database passwords from Secret Manager before running:
export CORE_DB_PASSWORD=$(gcloud secrets versions access latest --secret=core-db-password --project=gatelithix-core)
export PCI_DB_PASSWORD=$(gcloud secrets versions access latest --secret=pci-db-password --project=gatelithix-pci)
Always verify the rollback migration SQL before running. Some migrations may not be reversible (e.g., data transformations).
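One way to do that verification is to print the rollback portion of the migration file and refuse to proceed when it is empty. The sketch below assumes the migrations are SQL files that use goose's `-- +goose Up` and `-- +goose Down` annotations; the function names are hypothetical, and the check only detects a missing Down section, not whether the SQL in it is safe.

```shell
# down_section <migration.sql>: print the lines under "-- +goose Down".
down_section() {
  awk '
    /^-- \+goose Down/ { in_down = 1; next }
    /^-- \+goose Up/   { in_down = 0 }
    in_down
  ' "$1"
}

# has_down <migration.sql>: succeed only if the Down section contains at
# least one line that is neither blank nor a SQL comment.
has_down() {
  down_section "$1" |
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*--' |
    grep -q .
}
```

Before running goose down, `down_section path/to/latest_migration.sql` (a placeholder path) shows exactly what would be executed, and `has_down` can gate a script so that an irreversible migration is not rolled back blindly.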
Environment Configuration
GitHub Environments
| Environment | Branch | Reviewers | Wait Timer |
|---|---|---|---|
| nonpci-staging | develop | 0 | None |
| nonpci-prod | main | 1 | None |
| pci-staging | develop | 1 (PCI team) | None |
| pci-prod | main | 2 (PCI team) | 5 minutes |
Required Secrets
| Secret | Description |
|---|---|
| WIF_PROVIDER | Workload Identity Federation provider name |
| CORE_SA_EMAIL | Core deployer service account email |
| PCI_SA_EMAIL | PCI deployer service account email |
Required Variables
| Variable | Description |
|---|---|
| CORE_ARTIFACT_REGISTRY | Core Artifact Registry URI |
| PCI_ARTIFACT_REGISTRY | PCI Artifact Registry URI |
| NEXT_PUBLIC_API_URL | Admin portal API URL |
| NEXT_PUBLIC_AUTH0_DOMAIN | Auth0 domain |
| NEXT_PUBLIC_AUTH0_CLIENT_ID | Auth0 client ID |
| NEXT_PUBLIC_AUTH0_AUDIENCE | Auth0 audience |
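A preflight check can catch a missing secret or variable before any build or deploy step runs. The sketch below is hypothetical and assumes the workflow exposes each secret and variable from the tables above as an environment variable of the same name, which GitHub Actions does not do automatically.

```shell
# missing_vars <name>...: print each named variable that is unset or empty.
missing_vars() {
  local name value
  for name in "$@"; do
    # Indirect lookup; the names are fixed identifiers, not user input.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# require_deploy_config: fail fast when any deploy setting is absent.
require_deploy_config() {
  local missing
  missing="$(missing_vars WIF_PROVIDER CORE_SA_EMAIL PCI_SA_EMAIL \
    CORE_ARTIFACT_REGISTRY PCI_ARTIFACT_REGISTRY)"
  if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on one line.
    echo "missing configuration:" $missing >&2
    return 1
  fi
}
```

The same `missing_vars` call can be extended with the NEXT_PUBLIC_* names in jobs that build the admin portal.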