Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz Source: /opt/alga-psa on psa.joliet.tech
3.6 KiB
3.6 KiB
Portal Custom Domain Runbook
Last updated: 2025-09-19
Purpose
Operational checklist for supporting enterprise tenants that register branded client-portal domains.
Prerequisites
- Access to the production Temporal namespace and task queue (
portal-domain-workflows). kubectlcontext configured for the hosting cluster.- Ability to query the
portal_domainstable (admin connection).
Registration Flow Overview
- Tenant submits a vanity domain from Settings → Client Portal.
- Server action validates the domain, stores it in
portal_domains, and enqueues the Temporal workflow. - Workflow stages:
verifying_dns: lookup confirms the CNAME points to<tenant7>.portal.algapsa.com.pending_certificate: reconciliation renders Kubernetes manifests, writes JSON snapshots (ifPORTAL_DOMAIN_MANIFEST_DIRis set), and applies resources.deploying: resources applied; waiting on cert-manager and Istio to finish.active: once certificate readiness + HTTP probe succeed (polling to be implemented).
- Any failure bubbles
status+status_messageback to the UI for tenant remediation.
Common Tasks
Check Current Status
select tenant, domain, status, status_message, last_checked_at
from portal_domains
where tenant = '<tenant-uuid>';
Force Reconciliation
# queue Temporal signal to rerun reconciliation
node scripts/trigger-portal-domain-workflow.mjs --tenant <tenant-uuid> --domain-id <portal-domain-id>
(Script placeholder – use Temporal CLI if script not yet available)
If PORTAL_DOMAIN_MANIFEST_DIR is configured, inspect the generated manifests prior to committing them to nm-kube-config:
ls $PORTAL_DOMAIN_MANIFEST_DIR/<tenant-prefix>
cat $PORTAL_DOMAIN_MANIFEST_DIR/<tenant-prefix>/gateway.json
Inspect Kubernetes Artifacts
kubectl get certificate portal-domain-<tenant7> -n msp
kubectl get gateway portal-domain-gw-<tenant7> -n istio-system
kubectl get virtualservice portal-domain-vs-<tenant7> -n msp
If HTTP-01 routing is enabled, confirm the challenge service resolves:
kubectl get service -n msp ${PORTAL_DOMAIN_CHALLENGE_HOST%%.*}
Clean Up After Disable
Workflow sets the row to disabled and re-enqueues reconciliation. If resources linger:
kubectl delete certificate <name> -n msp
kubectl delete secret <name> -n msp
kubectl delete virtualservice <name> -n msp
Alerts & Observability
- Review the JSON manifests at
$PORTAL_DOMAIN_MANIFEST_DIR, commit them to thenm-kube-configworktree, and open a PR before or immediately after applying changes. - Temporal workflow logs emit OpenTelemetry spans tagged with
tenantIdandportalDomainId. - PostHog events:
portal_domain.registration_enqueuedportal_domain.refreshportal_domain.disableReview dashboard “Portal Domains – Provisioning” for activation/ failure trends.
Troubleshooting
| Symptom | Likely Cause | Mitigation |
|---|---|---|
Status stuck dns_failed |
CNAME not pointing to canonical host | Ask tenant to update DNS or reduce TTL; use dig to confirm. |
Status certificate_failed |
Reconciliation unable to apply resources | Check Temporal worker logs; validate cert-manager/Ingress configuration. |
| Workflow not enqueued | Temporal unavailable or misconfigured task queue | Confirm Temporal connectivity; redeploy worker if necessary. |
Open Follow-ups
- Implement/standardise HTTP-01 challenge serving and readiness polling before marking domains
active. - Add alerting for certificates stuck in
Falsestate > 1 hour. - Ship the Temporal signal CLI/script referenced above.