PSA/ee/docs/plans/2025-07-27-temporal-worker-production-deployment.md
Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00


Temporal Worker Stack Production Deployment Plan

Intro / Rationale

This plan covers deploying the Temporal worker service to the hosted alga-psa production environment. The worker is a critical component: it runs the asynchronous workflows for tenant provisioning, user management, email notifications, and checkout session processing.

The deployment requires:

  • Integration with existing alga-psa infrastructure (database, secrets, vault)
  • Proper scaling configuration for production workloads
  • Secure API key management through Vault agent injection
  • Alignment with existing deployment patterns using Argo workflows

Success Criteria:

  • Temporal worker running as a scalable service in production
  • All required secrets properly injected from Vault
  • Health checks and monitoring in place
  • Successful integration with existing alga-psa services
  • Zero downtime deployment capability

Key Stakeholders:

  • DevOps team (deployment and infrastructure)
  • Backend team (temporal workflow functionality)
  • Security team (secrets management)

Phased Implementation Checklist

Phase 1: Infrastructure Preparation and Secret Setup

  • Create Vault secret for INTERNAL_API_SHARED_SECRET
    • Generate secure random API key (minimum 32 characters)
    • Store in Vault at path: secret/data/alga-psa/temporal-worker
    • Include key: internal_api_shared_secret
  • Ensure ALGA_AUTH_KEY exists in shared secrets
    • Verify key exists at: secret/data/alga-psa/shared
    • Include key: alga_auth_key
  • Create Vault policy for temporal-worker service account
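    • Grant read access to secret/data/alga-psa/temporal-worker (the exact path; a trailing /* on its own matches only child paths, not this secret)
    • Grant read access to secret/data/alga-psa/shared (for ALGA_AUTH_KEY) and to any other existing alga-psa secret paths the worker reads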
  • Configure Kubernetes service account for temporal-worker
    • Create service account in target namespace (likely msp or dedicated temporal namespace)
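    • Bind the Vault policy to the service account through a Vault Kubernetes auth role (the injection annotations in this plan use the role name temporal-worker)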
  • Verify Vault agent injector is installed and configured in cluster
    • Check vault-agent-injector deployment status
    • Verify webhook configuration

Phase 2: Helm Chart Development

  • Create temporal-worker Helm templates
    • Copy and adapt deployment template from main alga-psa deployment
    • Create helm/templates/temporal-worker/deployment.yaml
    • Create helm/templates/temporal-worker/service.yaml
    • Create helm/templates/temporal-worker/configmap.yaml
    • Create helm/templates/temporal-worker/hpa.yaml for autoscaling
    • Create helm/templates/temporal-worker/pdb.yaml for pod disruption budget
    • Create helm/templates/temporal-worker/serviceaccount.yaml
    • Create helm/templates/temporal-worker/secrets.yaml for local development
  • Add temporal-worker configuration to values files
    • Update helm/values.yaml with default temporal worker settings
    • Update nm-kube-config/alga-psa/hosted.values.yaml with production overrides
  • Configure Vault agent injection annotations
    • Add vault.hashicorp.com/agent-inject annotations
    • Configure secret paths for all required secrets
    • Set up secret templates for environment variable format
  • Add temporal worker image configuration
    • Configure image repository path
    • Set up image pull secrets if using private registry
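
The values-file step could be sketched as below. Every key under temporalWorker is hypothetical (the chart's real key names are not given in this plan), and the replica counts, thresholds, and resource figures are placeholders to be tuned in Phase 6.

```yaml
# helm/values.yaml (sketch; every key under temporalWorker is hypothetical)
temporalWorker:
  enabled: false                 # hosted.values.yaml would override to true
  replicaCount: 2                # placeholder
  image:
    repository: harbor.nineminds.com/nineminds/temporal-worker
    tag: ""                      # set to the git commit SHA by the deploy workflow
    pullPolicy: IfNotPresent
  imagePullSecrets:
    - name: harbor-credentials
  serviceAccount:
    create: true
    name: temporal-worker
  temporal:
    address: temporal-frontend.temporal.svc.cluster.local:7233
    namespace: default
    taskQueue: tenant-workflows
  healthCheck:
    enabled: true
    port: 8080
  vault:
    enabled: false               # local development uses secrets.yaml instead
    role: temporal-worker
  autoscaling:
    enabled: false
    minReplicas: 2               # placeholder
    maxReplicas: 6               # placeholder
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
  resources:
    requests: {cpu: 250m, memory: 512Mi}   # placeholders
    limits: {cpu: "1", memory: 1Gi}        # placeholders
```

Keeping Vault injection and autoscaling off by default leaves local development unaffected, with hosted.values.yaml switching them on for production.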

Phase 3: Build and Registry Setup

  • Create temporal-worker build workflow in Argo
    • Create workflows/build/temporal-worker-build-workflow.yaml
    • Follow buildx pattern from alga-psa-ci-cd-workflow.yaml
    • Configure docker-in-docker sidecar with buildx
    • Set up buildx builder with persistent cache
    • Pin the build platform explicitly (linux/amd64)
  • Configure Harbor registry authentication
    • Use harbor-credentials secret
    • Set up registry authentication in buildx
    • Configure push to harbor.nineminds.com/nineminds/temporal-worker
  • Implement buildx cache strategy
    • Create node-specific buildx cache PVC
    • Configure cache-from and cache-to options
    • Use local cache type for persistence
  • Build and push initial temporal-worker image
    • Tag with git commit SHA
    • Push to Harbor registry
    • Verify image accessibility from cluster

Phase 4: Argo Workflow Integration

  • Create temporal-worker deployment workflow
    • Create workflows/deploy/temporal-worker-deploy-workflow.yaml
    • Include steps for:
      • Cloning repositories
      • Updating image tags in values
      • Running Helm deployment
      • Health check verification
  • Update composite workflows
    • Create alga-psa-build-migrate-deploy-with-temporal.yaml to include temporal-worker
    • Add conditional deployment based on changes to ee/temporal-workflows
  • Create rollback workflow
    • Implement automated rollback on deployment failure
    • Include health check validation
  • Test workflows in staging environment

Phase 5: Database and Network Configuration

  • Verify database connectivity in msp namespace
    • All services in msp namespace have network access by default
    • Confirm PostgreSQL cluster endpoint from alga-psa config
    • Verify connection strings match alga-psa pattern
  • Configure Temporal server connectivity
    • Verify temporal-frontend service is accessible
    • Ensure correct namespace and ports
    • Test connection from within msp namespace
  • Set up Redis access (if needed for caching)
    • Verify Redis service accessibility from msp namespace
    • Configure connection parameters

Phase 6: Deployment and Validation

  • Deploy to staging environment first
    • Run deployment workflow with staging parameters
    • Verify all pods start successfully
    • Check secret injection logs
    • Validate health endpoints
  • Run integration tests
    • Test tenant provisioning workflow
    • Test email sending functionality
    • Verify checkout session handling
    • Check activity timeout handling
  • Monitor resource usage
    • Observe CPU and memory consumption
    • Adjust resource requests/limits
    • Configure HPA thresholds
  • Deploy to production
    • Execute production deployment workflow
    • Monitor deployment progress
    • Verify zero downtime
    • Check all health endpoints

Phase 7: Monitoring and Observability

  • Configure logging
    • Ensure logs are collected by cluster logging solution
    • Set appropriate log levels for production
    • Configure structured logging format
  • Set up metrics collection
    • Export Temporal worker metrics
    • Configure Prometheus scraping
    • Create Grafana dashboards
  • Configure alerting
    • Set up alerts for worker health
    • Configure alerts for workflow failures
    • Set up PagerDuty integration
  • Document runbooks
    • Create troubleshooting guide
    • Document common issues and resolutions
    • Include rollback procedures

Background Details / Investigation / Implementation Advice

Vault Agent Injection Configuration

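The Vault agent injector reads annotations on the pod template and renders each requested secret to a file under /vault/secrets/. The pattern for temporal-worker follows (shown unescaped; inside a Helm template the Vault {{ ... }} delimiters must be escaped so that Helm does not try to evaluate them):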

metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "temporal-worker"
    vault.hashicorp.com/agent-inject-secret-internal-api: "secret/data/alga-psa/temporal-worker"
    vault.hashicorp.com/agent-inject-template-internal-api: |
      {{- with secret "secret/data/alga-psa/temporal-worker" -}}
      export INTERNAL_API_SHARED_SECRET="{{ .Data.data.internal_api_shared_secret }}"
      {{- end }}
    vault.hashicorp.com/agent-inject-secret-auth-key: "secret/data/alga-psa/shared"
    vault.hashicorp.com/agent-inject-template-auth-key: |
      {{- with secret "secret/data/alga-psa/shared" -}}
      export ALGA_AUTH_KEY="{{ .Data.data.alga_auth_key }}"
      {{- end }}
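
The templates above render shell export lines into files, so the worker container has to load those files before it starts. A minimal sketch of that step follows. The file names come from the annotation suffixes (internal-api and auth-key); the image tag placeholder and the node dist/worker.js start command are assumptions, not taken from the codebase.

```yaml
# Pod template fragment (sketch). The injector writes each template to
# /vault/secrets/<name>, where <name> is the suffix of the
# agent-inject-secret-<name> annotation.
spec:
  template:
    spec:
      serviceAccountName: temporal-worker   # must be bound to the Vault role
      containers:
        - name: temporal-worker
          image: harbor.nineminds.com/nineminds/temporal-worker:CHANGEME  # placeholder tag
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -e
              # Load the exported variables rendered by the Vault agent.
              . /vault/secrets/internal-api
              . /vault/secrets/auth-key
              # Hypothetical entrypoint; substitute the image's real start command.
              exec node dist/worker.js
```

Using exec keeps the worker as PID 1 so it receives termination signals directly during rolling updates.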

Environment Variables Required

Based on the codebase analysis, the temporal-worker needs these environment variables:

Core Temporal Configuration:

  • TEMPORAL_ADDRESS: temporal-frontend.temporal.svc.cluster.local:7233
  • TEMPORAL_NAMESPACE: default
  • TEMPORAL_TASK_QUEUE: tenant-workflows

Database Configuration (matching alga-psa):

  • DB_HOST: From existing alga-psa configuration
  • DB_PORT: 5432
  • DB_NAME_SERVER: server
  • DB_USER_SERVER: app_user
  • DB_PASSWORD_SERVER: From existing secrets
  • DB_USER_ADMIN: postgres
  • DB_PASSWORD_ADMIN: From existing secrets

Application Configuration:

  • NODE_ENV: production
  • LOG_LEVEL: info
  • INTERNAL_API_SHARED_SECRET: From Vault
  • RESEND_API_KEY: From existing alga-psa secrets
  • APPLICATION_URL: Production URL for email links
  • NMSTORE_BASE_URL: For checkout session integration

Encryption Configuration:

  • ALGA_AUTH_KEY: From Vault (required for password hashing)
  • SALT_BYTES: 12 (or configured value)
  • ITERATIONS: 10000 (or configured value)
  • KEY_LENGTH: 64 (or configured value)
  • ALGORITHM: sha512 (or configured value)
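
The non-secret values listed so far could be collected in a ConfigMap, as sketched below. The ConfigMap name is hypothetical. DB_HOST, APPLICATION_URL, and NMSTORE_BASE_URL are left out because this plan does not state their values, and passwords and API keys stay in Vault or existing secrets.

```yaml
# Sketch only: the name is hypothetical. ConfigMap values must be strings,
# so numeric settings are quoted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: temporal-worker-config
  namespace: msp
data:
  TEMPORAL_ADDRESS: "temporal-frontend.temporal.svc.cluster.local:7233"
  TEMPORAL_NAMESPACE: "default"
  TEMPORAL_TASK_QUEUE: "tenant-workflows"
  DB_PORT: "5432"
  DB_NAME_SERVER: "server"
  DB_USER_SERVER: "app_user"
  DB_USER_ADMIN: "postgres"
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  SALT_BYTES: "12"
  ITERATIONS: "10000"
  KEY_LENGTH: "64"
  ALGORITHM: "sha512"
```

The container would then load all keys at once with an envFrom entry that references this ConfigMap through configMapRef.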

Health Check Configuration:

  • ENABLE_HEALTH_CHECK: true
  • HEALTH_CHECK_PORT: 8080
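
With the health check enabled on port 8080, the container can expose it to Kubernetes probes. The fragment below is a sketch: the /health path is an assumption (this plan fixes only the port), and the timings are placeholders.

```yaml
# Container fragment (sketch). The /health path is an assumption.
containers:
  - name: temporal-worker
    env:
      - name: ENABLE_HEALTH_CHECK
        value: "true"
      - name: HEALTH_CHECK_PORT
        value: "8080"
    ports:
      - name: health
        containerPort: 8080
    startupProbe:                # gates the other probes until the worker is up
      httpGet: {path: /health, port: health}
      periodSeconds: 5
      failureThreshold: 24       # 24 x 5s allows up to 2 minutes for startup
    readinessProbe:
      httpGet: {path: /health, port: health}
      periodSeconds: 10
      failureThreshold: 3
    livenessProbe:
      httpGet: {path: /health, port: health}
      periodSeconds: 20
      failureThreshold: 3
```

A separate startup probe avoids liveness restarts while the worker is still connecting to Temporal and the database.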

Helm Template Structure

The temporal-worker should be deployed as a separate deployment within the alga-psa Helm chart. Key considerations:

  1. Namespace Strategy: Deploy in the msp namespace alongside alga-psa services
  2. Service Account: Use a dedicated service account for proper RBAC
  3. Resource Allocation: Start with conservative limits and adjust based on monitoring
  4. Scaling: Configure HPA with CPU and memory metrics
  5. Anti-affinity: Spread pods across nodes for high availability
  6. Image Pull Secrets: Use harbor-credentials for private registry access
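
The considerations above map onto standard Kubernetes objects roughly as follows. This is a sketch under assumptions: the resource names, the app.kubernetes.io/name label, the replica bounds, the utilization targets, and the resource figures are all placeholders rather than values from the existing chart.

```yaml
# Deployment fragment: rollout strategy, identity, pull secret, spreading,
# and conservative starting resources.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate: {maxUnavailable: 0, maxSurge: 1}
  template:
    spec:
      serviceAccountName: temporal-worker
      imagePullSecrets:
        - name: harbor-credentials
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname   # prefer one pod per node
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: temporal-worker
      containers:
        - name: temporal-worker
          resources:
            requests: {cpu: 250m, memory: 512Mi}
            limits: {cpu: "1", memory: 1Gi}
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: temporal-worker
  namespace: msp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: temporal-worker
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}
    - type: Resource
      resource:
        name: memory
        target: {type: Utilization, averageUtilization: 80}
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: temporal-worker
  namespace: msp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: temporal-worker
```

Utilization targets are computed against resource requests, so the requests in the Deployment need to be set for the HPA to work.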

Buildx Docker Build Pattern

Following the alga-psa build pattern, the temporal-worker build must:

  1. Use Docker-in-Docker sidecar: Run docker:27-dind as privileged sidecar
  2. Configure buildx builder: Create builder with docker-container driver
  3. Node-specific cache: Create PVC bound to the build node for cache persistence
  4. Multi-registry push: Push to both Harbor and GitHub Container Registry
  5. Platform specification: Build for linux/amd64 explicitly
  6. Cache configuration: Use the local cache type with mode=max so that intermediate layers are cached as well as the final image layers

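Example buildx command pattern (only the Harbor tag is shown; with --push, each additional -t flag pushes the same image to another registry):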

docker buildx build \
  --platform linux/amd64 \
  --push \
  --cache-from type=local,src=/buildx-cache \
  --cache-to type=local,dest=/buildx-cache,mode=max \
  --file ee/temporal-workflows/Dockerfile \
  -t harbor.nineminds.com/nineminds/temporal-worker:$SHA \
  .
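
The /buildx-cache directory used by the command above needs to survive between builds, which is what the node-specific cache volume is for. A minimal claim could look like this; the claim name, namespace, and size are assumptions, and the storage class is left to the cluster default.

```yaml
# Sketch only: name, namespace, and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: temporal-worker-buildx-cache
  namespace: argo                # assumed namespace for build workflows
spec:
  accessModes:
    - ReadWriteOnce              # mounted by one node at a time
  resources:
    requests:
      storage: 20Gi              # placeholder size
  # storageClassName is omitted; a node-local storage class would pin the
  # cache to the build node, as described above.
```

The build step would mount this claim at /buildx-cache in the container that runs docker buildx, since the local cache type reads and writes on the client side.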

Security Considerations

  1. Secret Rotation: Plan for API key rotation without downtime
  2. Network Policies: Restrict traffic to only required services
  3. RBAC: Minimal permissions for service account
  4. Image Scanning: Ensure images are scanned for vulnerabilities
  5. Pod Security: Run as non-root user with read-only filesystem
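
Items 2 and 5 can be expressed directly in manifests. The sketch below makes several assumptions: the UID, the label, the policy name, and the set of allowed ports are illustrative, and the Temporal namespace is taken from the TEMPORAL_ADDRESS value in this plan.

```yaml
# Pod spec fragment: non-root with a read-only root filesystem.
containers:
  - name: temporal-worker
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000              # assumed UID; must be usable in the image
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
      - name: tmp
        mountPath: /tmp            # writable scratch space
volumes:
  - name: tmp
    emptyDir: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: temporal-worker-egress
  namespace: msp
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: temporal-worker
  policyTypes: [Egress]
  egress:
    - to:                          # Temporal frontend
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: temporal
      ports:
        - {protocol: TCP, port: 7233}
    - ports:                       # PostgreSQL, any destination
        - {protocol: TCP, port: 5432}
    - ports:                       # cluster DNS
        - {protocol: UDP, port: 53}
        - {protocol: TCP, port: 53}
    - ports:                       # HTTPS APIs and Vault (default port 8200)
        - {protocol: TCP, port: 443}
        - {protocol: TCP, port: 8200}
```

The Vault agent sidecar shares the pod's network policy, so egress to Vault has to stay open or secret injection will fail. Rules without a to clause allow the listed ports to any destination and could be narrowed once the real endpoints are known.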

Potential Issues and Mitigations

  1. Database Connection Pool Exhaustion

    • Mitigation: Configure appropriate pool sizes and connection limits
    • Monitor active connections
  2. Temporal Worker Overwhelm

    • Mitigation: Configure appropriate concurrency limits
    • Use HPA for automatic scaling
  3. Secret Injection Failures

    • Mitigation: Add init containers to verify secrets
    • Implement graceful degradation
  4. Network Connectivity Issues

    • Mitigation: Implement retry logic with exponential backoff
    • Add circuit breakers for external services

Implementer's Scratch Pad

Pre-deployment Checklist

  • Vault access verified
  • Database connectivity tested (in msp namespace)
  • Temporal server reachable
  • Image built and pushed to Harbor
  • Secrets created in Vault (including ALGA_AUTH_KEY)
  • Service accounts configured
  • Harbor credentials configured for image pull

Deployment Notes

Date: 2025-07-27

Deployer: Claude Code

Implementation Progress:

  • Starting Phase 1: Infrastructure Preparation and Secret Setup
  • Creating Kubernetes manifests for temporal worker deployment
  • Completed Phase 2: Created all Helm templates for temporal worker
  • Completed Phase 3: Created Argo build workflow
  • Completed Phase 4: Created deployment and composite workflows
  • Created comprehensive deployment documentation

Completed Items:

  1. Helm Templates:

    • deployment.yaml with full Vault integration
    • service.yaml, configmap.yaml, hpa.yaml, pdb.yaml
    • serviceaccount.yaml and secrets.yaml
  2. Configuration:

    • Added temporal worker config to helm/values.yaml
    • Updated hosted.values.yaml with production settings
  3. Workflows:

    • temporal-worker-build-workflow.yaml with buildx caching
    • temporal-worker-deploy-workflow.yaml with health checks
    • Composite workflow with auto-detection of changes
  4. Documentation:

    • Comprehensive deployment guide in nm-kube-config
    • Covers building, deploying, troubleshooting, monitoring

Staging Deployment:

  • Start time:
  • End time:
  • Issues encountered:
  • Resolution:

Production Deployment:

  • Start time:
  • End time:
  • Issues encountered:
  • Resolution:

Performance Observations

  • Initial CPU usage:
  • Initial memory usage:
  • Peak CPU during load:
  • Peak memory during load:
  • Optimal replica count:
  • HPA threshold adjustments:

Post-deployment Tasks

  • Update documentation
  • Share deployment notes with team
  • Schedule post-mortem if issues occurred
  • Plan for next iteration improvements

Questions for Review

Rollback Record

  • Rollback executed: Yes/No
  • Reason:
  • Steps taken:
  • Lessons learned: