PSA/ee/docs/plans/express-server-migration-plan.md
Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

22 KiB

Express.js Custom Server Migration Plan

Introduction / Rationale

This document outlines the migration from Next.js built-in server to an Express.js custom server for the Alga PSA application. The primary motivations for this change are:

  1. Eliminate Edge Runtime limitations: The current Next.js middleware runs in Edge Runtime, making it difficult to access long-lived Node primitives (database pools, Redis clients) and mount traditional Express/Connect middleware.

  2. HTTP layer configurability: The stock Next.js server offers limited configurability for timeouts, keep-alive tuning, connection handling, and other HTTP-level optimizations needed in bare-metal/Kubernetes deployments.

  3. Development/Production parity: By using Express in both development and production, we eliminate potential runtime differences and deployment surprises.

The migration preserves all existing Next.js features (App Router, API routes, RSC, Server Actions, hot reloading) while moving to the full Node.js runtime.

Table of Contents

  1. Phase 1: Core Implementation
  2. Phase 2: Production Deployment
  3. Background Information
  4. Technical Details
  5. Scratch Pad / Notes

Note: Documentation below refers to adding documentary notes to THIS document in the plan area below. More substantive documentation is out of scope for this plan.

Phase 1: Core Implementation

Setup and Infrastructure

  • Create server/index.ts as the new Express server entry point
  • Install required Express dependencies (express, @types/express)
  • Configure TypeScript compilation for the new server file
  • Set up basic Express application with Next.js integration

Next.js Integration

  • Initialize Next.js compiler with dev mode detection (next({ dev: process.env.NODE_ENV !== 'production' }))
  • Implement Next.js request handler integration
  • Add proper error handling for Next.js compilation/preparation
  • Test Next.js hot reloading works with Express wrapper

Health Check Endpoints

  • Implement /healthz endpoint for Kubernetes liveness probes:
    • Follow existing health check response format from MetadataService.getApiHealth()
    • Return 200 for healthy, 503 for unhealthy, 500 for errors
    • Include basic system health (memory usage, uptime from process.uptime())
    • Bypass authentication (add to middleware bypass list like /api/health)
    • Keep liveness check simple - should rarely fail
  • Implement /readyz endpoint for Kubernetes readiness probes:
    • Follow same response format as healthz but with full dependency checks
    • Include database connectivity check using db.findOne('companies', { limit: 1 }) pattern (TODO: enhance)
    • Include Redis connectivity check if Redis is used by the service (TODO: enhance)
    • Include event bus status check following existing patterns (TODO: enhance)
    • Return comprehensive service status including dependencies
    • Can fail during startup or degraded states unlike liveness
  • Integrate with existing observability system: (TODO: enhance health checks)
    • Use existing observabilityMetrics.recordMemoryUsage() for memory metrics
    • Use existing observabilityLogger for health check logging
    • Follow existing error handling patterns from health check implementations
  • Test health endpoints return appropriate status codes and format

Middleware Migration - Analysis Phase

  • Analyze current server/src/middleware.ts structure and dependencies
  • Identify all route patterns and matching logic (/msp/:path*, /client-portal/:path*, /api/((?!auth/).)*)
  • Map NextAuth.js integration points and session handling:
    • Document getToken({ req }) usage in middleware
    • Identify NextAuth secret handling for token validation
    • Map session callback integration (JWT → Session conversion)
    • Document user type enforcement (client vs non-client access)
    • Map tenant extraction from JWT tokens
    • Document redirect logic for unauthenticated users
    • Identify SessionProvider wrapper requirements for client-side
  • Document API key validation flow and dependencies:
    • Map current HTTP round-trip validation pattern (middleware → /api/auth/validate-api-key)
    • Document ApiKeyService.validateApiKey() vs ApiKeyServiceForApi usage
    • Identify database schema and connection requirements (api_keys table)
    • Document SHA-256 hashing and security measures
    • Map tenant isolation and RLS policy requirements
    • Document header injection pattern (x-auth-user-id, x-auth-tenant)
    • Identify last_used_at timestamp update requirements
  • Document tenant isolation and header injection logic:
    • Map X-Cleanup-Connection header usage
    • Map x-tenant-id header injection
    • Document tenant context propagation through request pipeline
    • Identify authorization middleware integration points

Middleware Migration - Implementation Phase

  • Create Express middleware for API key validation (matching current /api/* logic):
    • Replace HTTP round-trip with direct ApiKeyServiceForApi.validateApiKeyAnyTenant() call
    • Implement Express req/res error handling (401 for missing/invalid keys, 500 for errors)
    • Add x-auth-user-id and x-auth-tenant header injection on successful validation
    • Update last_used_at timestamp in database (handled by ApiKeyServiceForApi)
    • Preserve route exclusion pattern for /api/auth/* routes
    • Handle tenant context propagation for downstream middleware
  • Convert NextAuth session handling to Express middleware format:
    • Replace getToken({ req }) with Express-compatible version
    • Handle NextAuth secret access in Express middleware context
    • Implement session validation and error handling
    • Convert JWT token to session object using existing callback logic
    • Handle token validation errors and expired sessions
    • Implement redirect logic for unauthenticated users (preserve callback URLs)
  • Implement user type enforcement middleware (client vs MSP):
    • Extract user type from session/token in Express middleware
    • Implement client portal route detection (/client-portal path checking)
    • Enforce access rules (client users → client portal only, non-client users → MSP only)
    • Handle access denied scenarios with appropriate redirects
    • Preserve error parameter passing (?error=AccessDenied)
  • Add tenant header injection middleware (X-Cleanup-Connection, x-tenant-id):
    • Extract tenant information from session/API key validation
    • Set headers on Express response object
    • Ensure headers are available for downstream processing
  • Integrate authorization middleware calls:
    • Convert authorizationMiddleware(req) to Express middleware format (placeholder ready)
    • Handle 403 responses and rewrite to /Denied page (ready for implementation)
    • Preserve tenant context for authorization checks
  • Preserve exact route matching patterns from current config.matcher:
    • Implement /msp/:path* pattern matching in Express
    • Implement /client-portal/:path* pattern matching in Express
    • Implement /api/((?!auth/).)* pattern matching in Express
    • Ensure proper middleware execution order based on route patterns

NextAuth.js Integration

  • Test NextAuth callbacks work correctly with Express req/res objects
  • Convert getToken({ req }) calls to work with Express middleware
  • Verify cookie parsing and session handling remains intact
  • Test authentication redirects work correctly
  • Validate callback URLs and authentication flows

Express Middleware Ordering

  • Mount health check endpoints first (/healthz, /readyz)
  • Add converted authentication/authorization middleware
  • Mount Next.js handler last (app.get('*', nextHandler))
  • Test middleware execution order matches expected behavior

Script and Configuration Updates

  • Update package.json dev script to use Express server
  • Update package.json start script to use Express server
  • Preserve enterprise edition environment variable support
  • Test both Community Edition and Enterprise Edition configurations

Development Workflow Testing

  • Test hot reloading works correctly in development mode
  • Verify fast refresh and HMR functionality is preserved
  • Test API routes work correctly through Express
  • Test React Server Components and Server Actions work correctly
  • Verify all existing development features work identically

Authentication Flow Testing

  • Test API key validation for protected /api/* routes
  • Test /api/auth/* routes are correctly excluded from API key validation
  • Test NextAuth signin/signout flows work correctly
  • Test session-based web route protection and redirects
  • Test tenant isolation and header injection
  • Test client portal vs MSP access control enforcement

Docker and Deployment Updates

  • Update server/Dockerfile to use node server/index.js (Already compatible - uses entrypoint.sh → npm start → index.js)
  • Verify enterprise edition Docker configuration works (Both Dockerfile and Dockerfile.dev use same entrypoint pattern)
  • Review docker-compose configurations for both CE and EE (All configurations use entrypoint.sh with npm start)
  • Investigate Helm charts and templates for Express server compatibility (All Helm deployments use entrypoint.sh, no changes needed)
  • Investigate Argo Workflow deployment pipeline compatibility (All build processes compatible, no changes needed)
  • Test Docker build process works correctly (TODO: Phase 2)
  • Test docker-compose configurations for both CE and EE (TODO: Phase 2)

Phase 2: Production Deployment

Pre-deployment Validation

  • Run comprehensive test suite in Express mode
  • Performance test Express server vs Next.js built-in server
  • Load test authentication flows and middleware performance
  • Validate memory usage and resource consumption
  • Test enterprise edition features work correctly

Deployment Execution

  • Deploy to production using Express server
  • Monitor application startup and health checks
  • Verify all authentication flows work in production
  • Monitor performance metrics and error rates
  • Validate tenant isolation works correctly in production

Post-deployment Monitoring

  • Monitor application logs for any Express-specific issues
  • Track response times and performance metrics
  • Monitor memory usage and garbage collection
  • Verify all existing functionality works correctly
  • Document any issues or performance improvements observed

Background Information

Current Architecture

  • Next.js Version: 14.0.0
  • Runtime: Edge Runtime for middleware, Node.js for API routes
  • Authentication: NextAuth.js with session-based auth for web, API keys for API routes
  • Database: PostgreSQL with connection pooling
  • Caching: Redis for session storage and caching
  • Deployment: Docker containers in Kubernetes

Current Middleware Logic (server/src/middleware.ts)

DETAILED ANALYSIS COMPLETED:

Route Matching Configuration:

matcher: [
  '/msp/:path*',              // MSP dashboard routes
  '/client-portal/:path*',    // Client portal routes  
  '/api/((?!auth/).)*'        // API routes except /api/auth/*
]

API Routes Handler (handleApiRequest function):

  1. Route Exclusions: /api/health bypassed (no auth required)
  2. API Key Validation:
    • Requires x-api-key header, returns 401 if missing
    • HTTP Round-trip: Makes fetch() call to /api/auth/validate-api-key
    • Performance Impact: Each API request requires internal HTTP request
  3. Header Injection: On success, adds x-auth-user-id and x-auth-tenant
  4. Request Cloning: Creates new NextRequest with modified headers (complex operation)
  5. Error Handling: Returns 500 for validation errors

Web Routes Handler (main middleware function):

  1. Route Exclusions: /auth/* and /client-portal/auth/* bypassed
  2. Session Validation: Uses getToken({ req }) from NextAuth
  3. Redirect Logic: Unauthenticated users → /auth/signin?callbackUrl=<current_path>
  4. User Type Enforcement:
    • client users: Only /client-portal/* access allowed
    • Non-client users: Only non-/client-portal/* access allowed
    • Access denied → /auth/signin?error=AccessDenied&callbackUrl=<path>
  5. Authorization Check: Calls authorizationMiddleware(req), 403 → /Denied rewrite
  6. Tenant Headers: Sets X-Cleanup-Connection and x-tenant-id on response

Authorization Middleware (middleware/authorizationMiddleware.ts):

  1. Token Validation: Uses getToken() with explicit NEXTAUTH_SECRET
  2. Error Handling: TokenValidationError → redirect to signin
  3. Tenant Context: Sets x-tenant-id header from token
  4. Fallback: Missing tenant → redirect to signin

API Key Validation Services (Analysis Complete):

  1. ApiKeyService (lib/services/apiKeyService.ts):
    • Uses createTenantKnex() - requires tenant context
    • For general use within application logic
  2. ApiKeyServiceForApi (lib/services/apiKeyServiceForApi.ts):
    • Designed for middleware use - avoids circular dependencies
    • validateApiKeyForTenant(): Validates with known tenant
    • validateApiKeyAnyTenant(): Best for Express middleware - searches all tenants
    • Direct database access via getConnection(null)
    • Updates last_used_at timestamp automatically
    • SHA-256 hashed key storage with expiration date support

Migration Performance Improvement:

  • Current: HTTP round-trip (fetch('/api/auth/validate-api-key')) for each API request
  • Target: Direct database call with ApiKeyServiceForApi.validateApiKeyAnyTenant()
  • Benefit: Eliminate HTTP overhead, reduce latency

Dependencies to Preserve

  • Hot Reloading: Must work identically in development
  • NextAuth.js: All authentication flows must work unchanged
  • API Routes: All existing routes must work without modification
  • React Server Components: Must continue working without changes
  • Server Actions: Must continue working without changes
  • Tenant Isolation: All multi-tenant logic must be preserved
  • Enterprise Edition: Module aliasing and EE features must work

Technical Details

Express Server Structure

// server/index.ts basic structure
import express from 'express';
import next from 'next';

const dev = process.env.NODE_ENV !== 'production';
const app = next({ dev });
const handle = app.getRequestHandler();

async function main() {
  await app.prepare();
  
  const server = express();
  
  // Health checks
  server.get('/healthz', healthCheck);
  server.get('/readyz', readinessCheck);
  
  // Auth middleware
  server.use(convertedAuthMiddleware);
  
  // Next.js handler
  server.all('*', (req, res) => handle(req, res));
  
  server.listen(port, () => {
    console.log(`Server running on port ${port}`);
  });
}

Middleware Conversion Pattern

// Convert from Next.js middleware format:
export async function middleware(req: NextRequest) {
  // Logic here
  return NextResponse.next();
}

// To Express middleware format:
export function expressMiddleware(req: express.Request, res: express.Response, next: express.NextFunction) {
  // Converted logic here
  next();
}

NextAuth Integration Considerations

  • NextAuth expects specific req/res object formats
  • Cookie parsing must work correctly with Express
  • Session token extraction must work with Express req objects
  • Authentication callbacks must route through Next.js handler

Scratch Pad / Notes

Development Notes

  • Research: How does NextAuth.js getToken() work with Express req objects?
  • Research: Are there any breaking changes in Next.js custom server setup for v14?
  • Note: Need to ensure Express middleware runs before Next.js handler
  • Note: Health checks should be simple and not depend on complex application state
  • DISCOVERED: Next.js 14 custom server has issue with "missing bootstrap script" when using server.use() for Next.js handler
    • Issue: Next.js returns 500 error with "Invariant: missing bootstrap script. This is a bug in Next.js"
    • RESOLVED: Using middleware approach with proper routing works correctly
    • Solution: Health endpoints with explicit GET routes, Next.js handler via middleware with path checking
  • SUCCESS: Basic Express + Next.js integration working correctly
    • Health endpoints (/healthz, /readyz) responding with JSON
    • Next.js API routes working through Express (/api/health)
    • Next.js web routes working (proper redirects)
  • IMPLEMENTED: Express middleware architecture (src/middleware/express/authMiddleware.ts)
    • apiKeyAuthMiddleware: Direct database validation via ApiKeyServiceForApi.validateApiKeyAnyTenant()
    • sessionAuthMiddleware: NextAuth integration for web routes with user type enforcement
    • tenantHeaderMiddleware: Tenant header injection (X-Cleanup-Connection, x-tenant-id)
    • authorizationMiddleware: Placeholder for authorization checks
    • Performance: Eliminates HTTP round-trip for API key validation
    • Integration: Applied to Express server in proper middleware order
  • 🎉 PHASE 1 COMPLETE: Express server successfully replaces Next.js built-in server
    • All core functionality working: health endpoints, API auth, web auth, redirects
    • Scripts updated for development (tsx index.ts) and production (node index.js)
    • Performance improved with direct database calls vs HTTP round-trips
    • Development/production parity achieved with Express in both environments
  • DOCKER COMPATIBILITY CONFIRMED: All Docker configurations already compatible
    • Both Dockerfile and Dockerfile.dev use entrypoint.shnpm startnode index.js
    • All docker-compose configurations use same entrypoint pattern
    • No changes needed for Docker deployment - already works with Express server
  • HELM COMPATIBILITY CONFIRMED: All Kubernetes deployment configurations compatible
    • Helm charts use command: ["./entrypoint.sh"] which calls our Express server
    • Health check endpoints (/healthz, /readyz) implemented in Express server match K8s patterns
    • Both development and production Helm values use same deployment approach
    • Container port 3000 correctly configured for Express server
  • ARGO WORKFLOW COMPATIBILITY CONFIRMED: All CI/CD pipeline processes compatible
    • Build workflows use npm run build:ce and npm run build:ee which build Next.js artifacts
    • Docker builds use existing Dockerfile/entrypoint.sh approach compatible with Express
    • Deployment workflows use Helm charts (already confirmed compatible)
    • Pipeline produces same build artifacts (.next directory) that Express server serves
    • No changes needed to build or deployment processes

Testing Checklist

  • Development hot reload functionality
  • Production build and startup
  • Express Middleware Testing (COMPLETED):
    • Health endpoints continue working (/healthz, /readyz)
    • API routes without auth work (/api/health)
    • API routes with API key validation work:
      • Missing API key returns 401
      • Invalid API key returns 401
    • Web route authentication redirects work:
      • Protected routes redirect to /auth/signin?callbackUrl=<path>
      • Auth routes bypass authentication
    • User type enforcement works (client vs MSP access) (implemented in middleware)
    • Tenant headers are properly injected (implemented in tenantHeaderMiddleware)
    • NextAuth session handling works correctly (getToken integration working)
    • Authorization middleware integration works (placeholder implemented, ready for logic)
  • Authentication flows (both API key and session-based)
  • Tenant isolation and multi-tenancy (headers and context preserved)
  • Enterprise edition features (NEXT_PUBLIC_EDITION support maintained)
  • Docker container builds and deployment
  • Kubernetes health checks and probes

Potential Issues and Mitigations

  • Issue: NextAuth callbacks not working with Express
    • Mitigation: Ensure callbacks route through Next.js handler, not raw Express
  • Issue: Session token extraction failing
    • Mitigation: Test getToken() with Express req objects, may need adapter
  • Issue: Hot reload not working
    • Mitigation: Ensure Next.js dev mode is properly configured
  • Issue: Performance regression
    • Mitigation: Minimize middleware overhead, benchmark before/after

Questions to Resolve

  • Do we need any Express-specific configuration for body parsing?
  • Should we add compression middleware or let Next.js handle it?
  • Are there any specific Express security headers we should add?
  • Do we need to handle Express error middleware differently?

Success Criteria

  • All existing functionality works identically
  • Development experience is unchanged (hot reload, fast refresh)
  • Production performance is equal or better
  • All authentication flows work correctly
  • Multi-tenant isolation is preserved
  • Enterprise edition features work correctly
  • Docker builds and Kubernetes deployments work correctly