Temporal Worker Stack Production Deployment Plan
Intro / Rationale
This plan covers deploying the Temporal worker service to the hosted alga-psa production environment. The worker is a critical component that handles asynchronous workflows, including tenant provisioning, user management, email notifications, and checkout session processing.
The deployment requires:
- Integration with existing alga-psa infrastructure (database, secrets, vault)
- Proper scaling configuration for production workloads
- Secure API key management through Vault agent injection
- Alignment with existing deployment patterns using Argo workflows
Success Criteria:
- Temporal worker running as a scalable service in production
- All required secrets properly injected from Vault
- Health checks and monitoring in place
- Successful integration with existing alga-psa services
- Zero downtime deployment capability
Key Stakeholders:
- DevOps team (deployment and infrastructure)
- Backend team (temporal workflow functionality)
- Security team (secrets management)
Phased Implementation Checklist
Phase 1: Infrastructure Preparation and Secret Setup
- Create Vault secret for INTERNAL_API_SHARED_SECRET
- Generate secure random API key (minimum 32 characters)
- Store in Vault at path: secret/data/alga-psa/temporal-worker
- Include key: internal_api_shared_secret
- Ensure ALGA_AUTH_KEY exists in shared secrets
- Verify key exists at: secret/data/alga-psa/shared
- Include key: alga_auth_key
- Create Vault policy for temporal-worker service account
- Grant read access to secret/data/alga-psa/temporal-worker and secret/data/alga-psa/temporal-worker/* (the glob alone matches only child paths, not the secret itself)
- Grant read access to existing alga-psa secrets path
- Configure Kubernetes service account for temporal-worker
- Create service account in target namespace (likely msp or dedicated temporal namespace)
- Bind Vault policy to service account
- Verify Vault agent injector is installed and configured in cluster
- Check vault-agent-injector deployment status
- Verify webhook configuration
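The key-generation step in this phase can be sketched in Python with the standard-library secrets module. This is a minimal illustration rather than the project's tooling: the function name generate_api_key is hypothetical, and storing the value in Vault is deliberately left out.

```python
import secrets

MIN_KEY_LENGTH = 32  # minimum length required by the Phase 1 checklist


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key drawn from the OS CSPRNG.

    token_urlsafe(n) encodes n random bytes as base64url text, so the
    default of 32 bytes yields 43 characters, above the required minimum.
    """
    key = secrets.token_urlsafe(num_bytes)
    if len(key) < MIN_KEY_LENGTH:
        raise ValueError("generated key is shorter than the required minimum")
    return key
```

Calling generate_api_key() once and pasting the result into the Vault entry for internal_api_shared_secret would satisfy the length requirement; smaller num_bytes values are rejected instead of silently producing a weak key.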
Phase 2: Helm Chart Development
- Create temporal-worker Helm templates
- Copy and adapt deployment template from main alga-psa deployment
- Create helm/templates/temporal-worker/deployment.yaml
- Create helm/templates/temporal-worker/service.yaml
- Create helm/templates/temporal-worker/configmap.yaml
- Create helm/templates/temporal-worker/hpa.yaml for autoscaling
- Create helm/templates/temporal-worker/pdb.yaml for pod disruption budget
- Create helm/templates/temporal-worker/serviceaccount.yaml
- Create helm/templates/temporal-worker/secrets.yaml for local development
- Add temporal-worker configuration to values files
- Update helm/values.yaml with default temporal worker settings
- Update nm-kube-config/alga-psa/hosted.values.yaml with production overrides
- Configure Vault agent injection annotations
- Add vault.hashicorp.com/agent-inject annotations
- Configure secret paths for all required secrets
- Set up secret templates for environment variable format
- Add temporal worker image configuration
- Configure image repository path
- Set up image pull secrets if using private registry
Phase 3: Build and Registry Setup
- Create temporal-worker build workflow in Argo
- Create workflows/build/temporal-worker-build-workflow.yaml
- Follow buildx pattern from alga-psa-ci-cd-workflow.yaml
- Configure docker-in-docker sidecar with buildx
- Set up buildx builder with persistent cache
- Configure multi-platform builds (linux/amd64)
- Configure Harbor registry authentication
- Use harbor-credentials secret
- Set up registry authentication in buildx
- Configure push to harbor.nineminds.com/nineminds/temporal-worker
- Implement buildx cache strategy
- Create node-specific buildx cache PVC
- Configure cache-from and cache-to options
- Use local cache type for persistence
- Build and push initial temporal-worker image
- Tag with git commit SHA
- Push to Harbor registry
- Verify image accessibility from cluster
Phase 4: Argo Workflow Integration
- Create temporal-worker deployment workflow
- Create workflows/deploy/temporal-worker-deploy-workflow.yaml
- Include steps for:
- Cloning repositories
- Updating image tags in values
- Running Helm deployment
- Health check verification
- Update composite workflows
- Create alga-psa-build-migrate-deploy-with-temporal.yaml to include temporal-worker
- Add conditional deployment based on changes to ee/temporal-workflows
- Create rollback workflow
- Implement automated rollback on deployment failure
- Include health check validation
- Test workflows in staging environment
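The conditional deployment item in this phase comes down to a path check over the files changed in a commit range. The sketch below assumes the workflow already has that list (for example from git diff --name-only between two revisions); the function name should_deploy_worker is hypothetical.

```python
WORKER_PATH = "ee/temporal-workflows"


def should_deploy_worker(changed_files, watched_path=WORKER_PATH):
    """Return True when any changed file lies under watched_path."""
    # Compare against "path/" so sibling directories that merely share
    # the prefix (e.g. ee/temporal-workflows-old) do not trigger a deploy.
    prefix = watched_path.rstrip("/") + "/"
    return any(path == watched_path or path.startswith(prefix)
               for path in changed_files)
```

A composite workflow could run this check once and skip the temporal-worker build and deploy steps when it returns False.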
Phase 5: Database and Network Configuration
- Verify database connectivity in msp namespace
- All services in msp namespace have network access by default
- Confirm PostgreSQL cluster endpoint from alga-psa config
- Verify connection strings match alga-psa pattern
- Configure Temporal server connectivity
- Verify temporal-frontend service is accessible
- Ensure correct namespace and ports
- Test connection from within msp namespace
- Set up Redis access (if needed for caching)
- Verify Redis service accessibility from msp namespace
- Configure connection parameters
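For the connectivity checks in this phase, a plain TCP probe run from a pod in the msp namespace is often enough as a first test. The following is a sketch under that assumption; can_connect is a hypothetical helper, and a successful TCP connection only shows that the port is reachable, not that the Temporal or PostgreSQL service behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_connect("temporal-frontend.temporal.svc.cluster.local", 7233) checks the Temporal frontend address listed later in this plan.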
Phase 6: Deployment and Validation
- Deploy to staging environment first
- Run deployment workflow with staging parameters
- Verify all pods start successfully
- Check secret injection logs
- Validate health endpoints
- Run integration tests
- Test tenant provisioning workflow
- Test email sending functionality
- Verify checkout session handling
- Check activity timeout handling
- Monitor resource usage
- Observe CPU and memory consumption
- Adjust resource requests/limits
- Configure HPA thresholds
- Deploy to production
- Execute production deployment workflow
- Monitor deployment progress
- Verify zero downtime
- Check all health endpoints
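The health endpoint validation in this phase can be sketched as a small polling loop. This is an illustration only: port 8080 comes from HEALTH_CHECK_PORT in this plan, but the /health path in the usage note is an assumption, and both function names are hypothetical.

```python
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True when a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False


def wait_until_healthy(url, attempts=10, delay=3.0, probe=http_ok, sleep=time.sleep):
    """Poll url until probe(url) is True; return False after the last attempt."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait between attempts, but not after the final one
    return False
```

A deploy step might call wait_until_healthy("http://temporal-worker:8080/health") and trigger the rollback workflow when it returns False. The probe and sleep parameters exist so the loop can be exercised without a network.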
Phase 7: Monitoring and Observability
- Configure logging
- Ensure logs are collected by cluster logging solution
- Set appropriate log levels for production
- Configure structured logging format
- Set up metrics collection
- Export Temporal worker metrics
- Configure Prometheus scraping
- Create Grafana dashboards
- Configure alerting
- Set up alerts for worker health
- Configure alerts for workflow failures
- Set up PagerDuty integration
- Document runbooks
- Create troubleshooting guide
- Document common issues and resolutions
- Include rollback procedures
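As an illustration of the structured logging item in this phase, the sketch below emits one JSON object per log line. It is written in Python for brevity and is not the worker's own logging code; the field names (timestamp, level, logger, message) are assumptions, not an agreed schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level="info"):
    """Attach a JSON handler to the root logger at the given level."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    # logging expects upper-case level names, while LOG_LEVEL is lower-case.
    root.setLevel(level.upper())
```

One JSON object per line keeps the output easy for a cluster log collector to parse without multi-line handling.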
Background Details / Investigation / Implementation Advice
Vault Agent Injection Configuration
The Vault agent injector uses Kubernetes annotations to inject secrets. Here's the pattern for temporal-worker:
# Pod template metadata (spec.template.metadata in the Deployment); the injector acts on pod annotations
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "temporal-worker"
    vault.hashicorp.com/agent-inject-secret-internal-api: "secret/data/alga-psa/temporal-worker"
    vault.hashicorp.com/agent-inject-template-internal-api: |
      {{- with secret "secret/data/alga-psa/temporal-worker" -}}
      export INTERNAL_API_SHARED_SECRET="{{ .Data.data.internal_api_shared_secret }}"
      {{- end }}
    vault.hashicorp.com/agent-inject-secret-auth-key: "secret/data/alga-psa/shared"
    vault.hashicorp.com/agent-inject-template-auth-key: |
      {{- with secret "secret/data/alga-psa/shared" -}}
      export ALGA_AUTH_KEY="{{ .Data.data.alga_auth_key }}"
      {{- end }}
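With templates like these, the injector renders each secret to a file under /vault/secrets/ by default (here internal-api and auth-key, taken from the annotation suffixes), and each file contains export lines that a container entrypoint can source. To check the rendered content outside a shell, a small parser is enough. The sketch below is illustrative: parse_exports is a hypothetical helper, and it assumes values contain no escaped double quotes.

```python
import re

# Matches lines such as: export NAME="value"
_EXPORT_LINE = re.compile(r'^export\s+([A-Za-z_][A-Za-z0-9_]*)="(.*)"$')


def parse_exports(text: str) -> dict[str, str]:
    """Parse `export NAME="value"` lines into a dict, skipping other lines."""
    result = {}
    for raw_line in text.splitlines():
        match = _EXPORT_LINE.match(raw_line.strip())
        if match:
            result[match.group(1)] = match.group(2)
    return result
```

An init container or a debugging session could read /vault/secrets/internal-api, pass the text to parse_exports, and confirm that INTERNAL_API_SHARED_SECRET is present and non-empty.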
Environment Variables Required
Based on the codebase analysis, the temporal-worker needs these environment variables:
Core Temporal Configuration:
- TEMPORAL_ADDRESS: temporal-frontend.temporal.svc.cluster.local:7233
- TEMPORAL_NAMESPACE: default
- TEMPORAL_TASK_QUEUE: tenant-workflows
Database Configuration (matching alga-psa):
- DB_HOST: From existing alga-psa configuration
- DB_PORT: 5432
- DB_NAME_SERVER: server
- DB_USER_SERVER: app_user
- DB_PASSWORD_SERVER: From existing secrets
- DB_USER_ADMIN: postgres
- DB_PASSWORD_ADMIN: From existing secrets
Application Configuration:
- NODE_ENV: production
- LOG_LEVEL: info
- INTERNAL_API_SHARED_SECRET: From Vault
- RESEND_API_KEY: From existing alga-psa secrets
- APPLICATION_URL: Production URL for email links
- NMSTORE_BASE_URL: For checkout session integration
Encryption Configuration:
- ALGA_AUTH_KEY: From Vault (required for password hashing)
- SALT_BYTES: 12 (or configured value)
- ITERATIONS: 10000 (or configured value)
- KEY_LENGTH: 64 (or configured value)
- ALGORITHM: sha512 (or configured value)
Health Check Configuration:
- ENABLE_HEALTH_CHECK: true
- HEALTH_CHECK_PORT: 8080
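A pre-flight check that reports missing variables makes configuration mistakes visible before the worker starts polling. The sketch below is written in Python for illustration; the variable names come from the lists above, but treating this particular subset as mandatory is an assumption, and missing_vars is a hypothetical helper.

```python
import os

# Variable names taken from the lists above; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED_VARS = (
    "TEMPORAL_ADDRESS", "TEMPORAL_NAMESPACE", "TEMPORAL_TASK_QUEUE",
    "DB_HOST", "DB_PORT", "DB_NAME_SERVER", "DB_USER_SERVER", "DB_PASSWORD_SERVER",
    "INTERNAL_API_SHARED_SECRET", "ALGA_AUTH_KEY",
)


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty, in order."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

A startup script could exit with a non-zero status when missing_vars() returns a non-empty list, which surfaces secret injection failures as a clear message instead of a later connection error.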
Helm Template Structure
The temporal-worker should be deployed as a separate deployment within the alga-psa Helm chart. Key considerations:
- Namespace Strategy: Deploy in the msp namespace alongside alga-psa services
- Service Account: Use a dedicated service account for proper RBAC
- Resource Allocation: Start with conservative limits and adjust based on monitoring
- Scaling: Configure HPA with CPU and memory metrics
- Anti-affinity: Spread pods across nodes for high availability
- Image Pull Secrets: Use harbor-credentials for private registry access
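When choosing HPA thresholds, it helps to know how the controller turns a metric reading into a replica count. Kubernetes documents the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that ratio and clamps the result. The bounds and the numbers in the usage note are illustrative, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the documented HPA ratio formula, then clamp to the bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For instance, two replicas averaging 90% CPU against a 60% target give desired_replicas(2, 90, 60), which evaluates to 3.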
Buildx Docker Build Pattern
Following the alga-psa build pattern, the temporal-worker build must:
- Use Docker-in-Docker sidecar: Run docker:27-dind as privileged sidecar
- Configure buildx builder: Create builder with docker-container driver
- Node-specific cache: Create PVC bound to the build node for cache persistence
- Multi-registry push: Push to both Harbor and GitHub Container Registry
- Platform specification: Build for linux/amd64 explicitly
- Cache configuration: Use local cache type with mode=max for optimal caching
Example buildx command pattern:
docker buildx build \
  --platform linux/amd64 \
  --push \
  --cache-from type=local,src=/buildx-cache \
  --cache-to type=local,dest=/buildx-cache,mode=max \
  --file ee/temporal-workflows/Dockerfile \
  -t "harbor.nineminds.com/nineminds/temporal-worker:$SHA" \
  .
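If the build step is driven from a script instead of inline shell, assembling the same command as an argument list avoids quoting problems and makes it easy to reject an empty tag. The following is a sketch under that assumption; buildx_command is a hypothetical helper, and it uses only the flags shown in the command above.

```python
def buildx_command(sha, cache_dir="/buildx-cache",
                   image="harbor.nineminds.com/nineminds/temporal-worker",
                   dockerfile="ee/temporal-workflows/Dockerfile", context="."):
    """Return the buildx invocation as an argv list (no shell quoting needed)."""
    if not sha:
        # An empty SHA would produce the invalid tag "temporal-worker:".
        raise ValueError("sha must be a non-empty commit identifier")
    return [
        "docker", "buildx", "build",
        "--platform", "linux/amd64",
        "--push",
        "--cache-from", f"type=local,src={cache_dir}",
        "--cache-to", f"type=local,dest={cache_dir},mode=max",
        "--file", dockerfile,
        "-t", f"{image}:{sha}",
        context,
    ]
```

The list can be passed directly to subprocess.run(..., check=True) so that a failed build stops the workflow step.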
Security Considerations
- Secret Rotation: Plan for API key rotation without downtime
- Network Policies: Restrict traffic to only required services
- RBAC: Minimal permissions for service account
- Image Scanning: Ensure images are scanned for vulnerabilities
- Pod Security: Run as non-root user with read-only filesystem
Potential Issues and Mitigations
- Database Connection Pool Exhaustion
  - Mitigation: Configure appropriate pool sizes and connection limits
  - Monitor active connections
- Temporal Worker Overwhelm
  - Mitigation: Configure appropriate concurrency limits
  - Use HPA for automatic scaling
- Secret Injection Failures
  - Mitigation: Add init containers to verify secrets
  - Implement graceful degradation
- Network Connectivity Issues
  - Mitigation: Implement retry logic with exponential backoff
  - Add circuit breakers for external services
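The retry-with-exponential-backoff mitigation can be sketched as follows. This is a generic illustration in Python, not the worker's implementation: the function name and default values are assumptions, and random jitter, which is usually added in production to avoid synchronized retries, is left out for brevity.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at max_delay. The last exception is re-raised when attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(min(max_delay, base_delay * (2 ** attempt)))
```

Wrapping a database or HTTP call as retry_with_backoff(lambda: connect()) gives waits of 0.5 s, 1 s, 2 s, and 4 s with the defaults before the fifth failure is raised to the caller; connect here stands for whatever call needs protecting.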
Implementer's Scratch Pad
Pre-deployment Checklist
- Vault access verified
- Database connectivity tested (in msp namespace)
- Temporal server reachable
- Image built and pushed to Harbor
- Secrets created in Vault (including ALGA_AUTH_KEY)
- Service accounts configured
- Harbor credentials configured for image pull
Deployment Notes
Date: 2025-07-27
Deployer: Claude Code
Implementation Progress:
- Starting Phase 1: Infrastructure Preparation and Secret Setup
- Creating Kubernetes manifests for temporal worker deployment
- Completed Phase 2: Created all Helm templates for temporal worker
- Completed Phase 3: Created Argo build workflow
- Completed Phase 4: Created deployment and composite workflows
- Created comprehensive deployment documentation
Completed Items:
- Helm Templates:
  - deployment.yaml with full Vault integration
  - service.yaml, configmap.yaml, hpa.yaml, pdb.yaml
  - serviceaccount.yaml and secrets.yaml
- Configuration:
  - Added temporal worker config to helm/values.yaml
  - Updated hosted.values.yaml with production settings
- Workflows:
  - temporal-worker-build-workflow.yaml with buildx caching
  - temporal-worker-deploy-workflow.yaml with health checks
  - Composite workflow with auto-detection of changes
- Documentation:
  - Comprehensive deployment guide in nm-kube-config
  - Covers building, deploying, troubleshooting, monitoring
Staging Deployment:
- Start time:
- End time:
- Issues encountered:
- Resolution:
Production Deployment:
- Start time:
- End time:
- Issues encountered:
- Resolution:
Performance Observations
- Initial CPU usage:
- Initial memory usage:
- Peak CPU during load:
- Peak memory during load:
- Optimal replica count:
- HPA threshold adjustments:
Post-deployment Tasks
- Update documentation
- Share deployment notes with team
- Schedule post-mortem if issues occurred
- Plan for next iteration improvements
Questions for Review
Rollback Record
- Rollback Executed: Yes/No
- Reason:
- Steps Taken:
- Lessons Learned: