Deployment Runbook

This runbook covers both the automated CI/CD pipeline and manual fallback procedures for deploying Gatelithix Gateway services.

Automated Deployment (CI/CD)

The primary deployment path is fully automated via GitHub Actions.

Pipeline Overview

push to main
  -> build-push.yml (build and push 8 images, including migrate)
       -> deploy-core.yml (migrate-core job -> Gateway deploy -> health check)
       -> deploy-pci.yml (migrate-vault job -> Vault deploy -> health check)

Every push to main triggers the full pipeline: build 8 container images (including the migration image), run database migrations via Cloud Run Jobs, deploy services, and verify health. The only manual step is reviewer approval on the protected GitHub environments.

Build Stage (build-push.yml)

Triggered on push to main. Builds all 8 service container images in parallel:

| Service | Build Tool | Registry | Path |
| --- | --- | --- | --- |
| Gateway | ko | Core Artifact Registry | ./apps/gateway/ |
| Vault | ko | PCI Artifact Registry | ./apps/vault/ |
| Migrate | Docker | Core + PCI Artifact Registry | db/Dockerfile.migrate |
| Stripe Connector | ko | Core Artifact Registry | ./apps/connectors/stripe/cmd/ |
| NMI Connector | ko | Core Artifact Registry | ./apps/connectors/nmi/cmd/ |
| FluidPay Connector | ko | Core Artifact Registry | ./apps/connectors/fluidpay/cmd/ |
| Admin Portal | Docker | Core Artifact Registry | apps/admin/ |
| Docs Site | Docker | Core Artifact Registry | apps/docs/ |

All images are tagged with both the commit SHA and latest. The migration image is pushed to both registries (core and PCI) to preserve cardholder data environment (CDE) isolation: each project pulls only from its own registry.
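As a rough illustration of the tagging scheme, the helper below prints the four references the migration image would carry: two tags in each of the two registries. The function name, the registry URIs passed in, and the image name migrate are assumptions for illustration, not values taken from build-push.yml.

```shell
# Hypothetical helper: print every reference the migration image is pushed under.
# The image name "migrate" and the registry arguments are placeholders.
migrate_image_refs() {
  core_registry="$1"; pci_registry="$2"; sha="$3"
  for registry in "$core_registry" "$pci_registry"; do
    for tag in "$sha" latest; do
      printf '%s/migrate:%s\n' "$registry" "$tag"
    done
  done
}
```

Each printed reference could be passed to docker build as a -t argument and then pushed with docker push, so that both projects receive the same image under the same two tags.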

Deploy Stage

After build-push.yml completes successfully, two deploy workflows fire in parallel:

deploy-core.yml deploys the API Gateway to Cloud Run:

  • Environment: nonpci-prod
  • Service: api-gateway
  • Region: us-central1
  • Authentication: Workload Identity Federation (no service account keys)

deploy-pci.yml deploys the Token Vault to Cloud Run:

  • Environment: pci-prod (requires 2 reviewers + 5-min wait timer)
  • Service: token-vault
  • Region: us-central1
  • Authentication: Workload Identity Federation (no service account keys)

Database Migrations

Migrations run automatically before service deployment, using goose inside Cloud Run Jobs. Each deploy workflow proceeds as follows:

  1. The workflow executes the migrate-core or migrate-vault Cloud Run Job with the commit SHA image tag
  2. The job connects to Cloud SQL via the VPC connector (private IP, no Auth Proxy needed)
  3. The database password is injected from Secret Manager at runtime
  4. goose runs the up command to apply all pending migrations, then the status command to verify
  5. The workflow waits for the job to complete before proceeding to service deployment
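The sequence above can be sketched as a small shell function. This is an assumption about how the workflow step might look, not a copy of deploy-core.yml or deploy-pci.yml: the function name and the image path layout (registry/migrate:sha) are hypothetical, while the job names, projects, and region come from this runbook.

```shell
# Sketch of the migrate-before-deploy gate.
# Assumed: the migration image lives at <registry>/migrate:<sha>.
run_migration_job() {
  job="$1"; project="$2"; registry="$3"; sha="$4"

  # Point the job at the image built for this commit.
  gcloud run jobs update "$job" \
    --image="${registry}/migrate:${sha}" \
    --project="$project" --region=us-central1 || return 1

  # --wait blocks until the execution finishes; if gcloud exits non-zero,
  # this function does too, so the caller can stop before deploying.
  gcloud run jobs execute "$job" \
    --project="$project" --region=us-central1 --wait
}
```

A caller would chain the service deploy behind it, for example run_migration_job migrate-core gatelithix-core "$CORE_ARTIFACT_REGISTRY" "$GITHUB_SHA" followed by && and the gcloud run deploy command.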

Core migrations (db/migrations/core/) are applied by the migrate-core Cloud Run Job in deploy-core.yml against gatelithix-core-pg using the gateway-app user.

PCI vault migrations (db/migrations/vault/) are applied by the migrate-vault Cloud Run Job in deploy-pci.yml against gatelithix-pci-pg using the vault-app user.

No manual migration step is needed. Migrations are applied on every deploy, and goose ensures that only pending migrations run.

Checking Migration Status

    # List recent migration job executions for core
    gcloud run jobs executions list --job=migrate-core --project=gatelithix-core --region=us-central1

    # List recent migration job executions for PCI vault
    gcloud run jobs executions list --job=migrate-vault --project=gatelithix-pci --region=us-central1

Manually Re-Running Migrations

If a migration needs to be re-run (e.g., after a transient database connectivity issue):

    # Re-run core migrations
    gcloud run jobs execute migrate-core --project=gatelithix-core --region=us-central1 --wait

    # Re-run PCI vault migrations
    gcloud run jobs execute migrate-vault --project=gatelithix-pci --region=us-central1 --wait

The --wait flag blocks until the job completes, showing success or failure inline.

Post-Deploy Verification

After deployment, each workflow runs an automated health check:

    # Waits for the new Cloud Run revision, then hits /health with retries
    curl -f --retry 5 --retry-delay 10 "$SERVICE_URL/health"

If the health check fails, a GitHub Actions warning is emitted but the workflow does not fail (the service may still be starting). For deeper verification, run the smoke test:
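The warn-but-continue behaviour could look roughly like the function below. It is a sketch under assumptions: the function name and message text are illustrative, and the real workflow step may differ. The ::warning:: prefix is the GitHub Actions workflow command for annotating a run.

```shell
# Sketch of a non-blocking health check step.
# The service URL is supplied by the caller; the messages are illustrative.
check_health() {
  service_url="$1"
  if curl -f -s -o /dev/null --retry 5 --retry-delay 10 "${service_url}/health"; then
    echo "health check passed"
  else
    # Annotate the run instead of failing it; the service may still be starting.
    echo "::warning::Health check failed for ${service_url}; the service may still be starting"
  fi
  # Always return success so the workflow continues.
  return 0
}
```

Because the function never fails the run, a warning annotation on a deploy is a prompt to run the smoke test rather than proof that the deploy is broken.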

scripts/smoke-test.sh

This verifies health endpoints return 200 and key API routes are responsive.

Vault readiness probe: The vault’s /health/ready endpoint now checks PCI database connectivity. If the vault service shows HEALTH_CHECK_FAILURE in Cloud Run, check PCI Cloud SQL connectivity first (Auth Proxy status, IAM permissions, VPC peering). A 503 from /health/ready means the PCI database is unreachable and Cloud Run will stop routing traffic to that instance.
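For quick triage, the readiness endpoint's status code can be mapped to a next step. The helper below is a hypothetical sketch: the function name is invented, and the vault URL is passed in as an argument because this runbook does not list it.

```shell
# Hypothetical triage helper: interpret the HTTP status of /health/ready.
vault_readiness() {
  vault_url="$1"
  # -w '%{http_code}' prints only the status code; the body is discarded.
  code="$(curl -s -o /dev/null -w '%{http_code}' "${vault_url}/health/ready")"
  case "$code" in
    200) echo "ready" ;;
    503) echo "not ready: PCI database unreachable; check Cloud SQL connectivity" ;;
    *)   echo "unexpected status ${code}" ;;
  esac
}
```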


Manual Fallback (When CI Unavailable)

Use this procedure for first-time setup or when the CI/CD pipeline is unavailable.

Prerequisites

  • gcloud CLI authenticated with appropriate permissions
  • Terraform 1.5+
  • goose for database migrations
  • ko v0.18+ for Go container builds
  • Go 1.26.1+
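Before starting the manual procedure, it may help to confirm that the required tools are on PATH. The helper below is a minimal sketch with a hypothetical name; it checks presence only and does not verify the minimum versions listed above.

```shell
# Print each named tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For this runbook the call would be missing_tools gcloud terraform goose ko go cloud-sql-proxy, where cloud-sql-proxy is included because Step 2 uses it.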

Step 1: Apply Terraform Infrastructure

    # Core project
    cd infra/terraform/core
    terraform init
    terraform plan -out=tfplan
    terraform apply tfplan

    # PCI project
    cd ../pci
    terraform init
    terraform plan -out=tfplan
    terraform apply tfplan

Review the plan output carefully before applying. Core must be applied before PCI (PCI depends on core VPC peering outputs).

Step 2: Run Database Migrations

Connect to Cloud SQL via Auth Proxy:

    # Start Auth Proxy for core database
    cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db \
      --port 5432 &

    # Run core migrations
    goose -dir db/migrations/core postgres \
      "host=127.0.0.1 port=5432 user=gateway-sa dbname=gateway sslmode=disable" up

    # Start Auth Proxy for PCI database
    cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db \
      --port 5433 &

    # Run PCI vault migrations
    goose -dir db/migrations/vault postgres \
      "host=127.0.0.1 port=5433 user=vault-sa dbname=vault sslmode=disable" up

Step 3: Build and Push Container Images

    # Core services
    KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway \
      ko build ./apps/gateway/ --bare --tags=$(git rev-parse HEAD),latest

    # PCI services
    KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault \
      ko build ./apps/vault/ --bare --tags=$(git rev-parse HEAD),latest

    # Repeat for connectors
    for connector in stripe nmi fluidpay; do
      KO_DOCKER_REPO=us-central1-docker.pkg.dev/gatelithix-core/gatelithix/${connector}-connector \
        ko build ./apps/connectors/${connector}/cmd/ --bare --tags=$(git rev-parse HEAD),latest
    done

Step 4: Deploy Cloud Run Services

    # Gateway
    gcloud run deploy api-gateway \
      --image us-central1-docker.pkg.dev/gatelithix-core/gatelithix/gateway:$(git rev-parse HEAD) \
      --region us-central1 \
      --project gatelithix-core

    # Vault (Cloud Run service name: token-vault)
    gcloud run deploy token-vault \
      --image us-central1-docker.pkg.dev/gatelithix-pci/gatelithix/vault:$(git rev-parse HEAD) \
      --region us-central1 \
      --project gatelithix-pci

Step 5: Verify Deployment

    # Health checks
    curl -s https://api.gatelithix.com/health | jq .
    curl -s https://api.gatelithix.com/health/ready | jq .

    # Smoke test
    scripts/smoke-test.sh

Rollback

Cloud Run Revision Rollback

Cloud Run maintains previous revisions. To roll back:

    # List revisions
    gcloud run revisions list --service api-gateway \
      --region us-central1 --project gatelithix-core

    # Route 100% traffic to previous revision
    gcloud run services update-traffic api-gateway \
      --to-revisions PREVIOUS_REVISION=100 \
      --region us-central1 --project gatelithix-core
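To avoid copying the revision name by hand, the second-newest revision can be looked up with a small helper. This is a sketch with a hypothetical function name. It assumes the second-newest revision is the right rollback target, which is not always true (for example after several failed deploys), so check it against the full revision list first.

```shell
# Sketch: print the name of the second-newest revision of a service.
# Assumption: that revision is the one to roll back to.
previous_revision() {
  service="$1"; project="$2"
  gcloud run revisions list --service "$service" \
    --region us-central1 --project "$project" \
    --sort-by='~metadata.creationTimestamp' \
    --format='value(metadata.name)' --limit=2 | sed -n 2p
}
```

The result can be substituted for PREVIOUS_REVISION in the update-traffic command, for example --to-revisions "$(previous_revision api-gateway gatelithix-core)=100".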

Database Migration Rollback

Rolling back migrations requires local access via Cloud SQL Auth Proxy since Cloud Run Jobs only run goose up:

    # Start Cloud SQL Auth Proxy for core database
    cloud-sql-proxy gatelithix-core:us-central1:gatelithix-core-db --port 5432 &

    # Roll back the most recent core migration
    goose -dir db/migrations/core postgres \
      "host=127.0.0.1 port=5432 user=gateway-app dbname=gateway password=$CORE_DB_PASSWORD sslmode=disable" down

    # Start Cloud SQL Auth Proxy for PCI database
    cloud-sql-proxy gatelithix-pci:us-central1:gatelithix-pci-db --port 5433 &

    # Roll back the most recent PCI migration
    goose -dir db/migrations/vault postgres \
      "host=127.0.0.1 port=5433 user=vault-app dbname=vault password=$PCI_DB_PASSWORD sslmode=disable" down

Note: Retrieve database passwords from Secret Manager before running:

    export CORE_DB_PASSWORD=$(gcloud secrets versions access latest --secret=core-db-password --project=gatelithix-core)
    export PCI_DB_PASSWORD=$(gcloud secrets versions access latest --secret=pci-db-password --project=gatelithix-pci)

Always verify the rollback migration SQL before running. Some migrations may not be reversible (e.g., data transformations).
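One way to review what a rollback will execute is to print the Down section of the migration file. goose SQL migrations separate the two directions with the annotations -- +goose Up and -- +goose Down. The helper name below is hypothetical, and the sketch assumes the migration is a SQL file rather than a Go migration.

```shell
# Print everything after the "-- +goose Down" annotation in one SQL
# migration file, i.e. the statements goose would run on "down".
show_down_section() {
  awk '/^-- \+goose Down/ { in_down = 1; next } in_down { print }' "$1"
}
```

Empty output means the file has no Down section, so goose has nothing to run to reverse that migration.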


Environment Configuration

GitHub Environments

| Environment | Branch | Reviewers | Wait Timer |
| --- | --- | --- | --- |
| nonpci-staging | develop | 0 | None |
| nonpci-prod | main | 1 | None |
| pci-staging | develop | 1 (PCI team) | None |
| pci-prod | main | 2 (PCI team) | 5 minutes |

Required Secrets

| Secret | Description |
| --- | --- |
| WIF_PROVIDER | Workload Identity Federation provider name |
| CORE_SA_EMAIL | Core deployer service account email |
| PCI_SA_EMAIL | PCI deployer service account email |

Required Variables

| Variable | Description |
| --- | --- |
| CORE_ARTIFACT_REGISTRY | Core Artifact Registry URI |
| PCI_ARTIFACT_REGISTRY | PCI Artifact Registry URI |
| NEXT_PUBLIC_API_URL | Admin portal API URL |
| NEXT_PUBLIC_AUTH0_DOMAIN | Auth0 domain |
| NEXT_PUBLIC_AUTH0_CLIENT_ID | Auth0 client ID |
| NEXT_PUBLIC_AUTH0_AUDIENCE | Auth0 audience |