dev_arc_aws/.kiro/specs/code-audit-fixes/requirements.md

2026-06-24 19:20:18 +00:00
# Requirements Document
## Introduction
This feature addresses 13 Critical- and High-severity issues identified during a code audit of the ArchNest self-hosted ops dashboard. The fixes target security vulnerabilities (injection, traversal, timing attacks, token exposure), resource leaks (WebSocket, SSH connections), and stability gaps (missing error boundaries, unhandled exceptions). A CloudFormation template for test deployment is also included so the fixes can be verified in an isolated environment.
## Glossary
- **Backend**: The Fastify 5 + TypeScript server application in `backend/src/`
- **Frontend**: The React 19 + Vite 8 + TypeScript client application in `src/`
- **WebSocket_Session**: A browser-to-server WebSocket connection used for terminal, Docker exec, or tmux-list operations
- **Terminal_Route**: The backend WebSocket endpoint at `/api/terminal` handling SSH terminal sessions
- **Docker_SSH_Module**: The `backend/src/ssh/docker.ts` module that runs Docker CLI commands over SSH
- **Files_Route**: The backend REST endpoint group at `/api/files/:integrationId/*` for SFTP file operations
- **Agents_Route**: The backend endpoint at `/api/agents/docker/report` for agent token-gated ingest
- **Data_Route**: The backend endpoint group at `/api/data/import` and `/api/data/export` for backup/restore
- **Docker_Exec_Route**: The backend WebSocket endpoint at `/api/docker/exec` for Docker container exec sessions
- **Error_Boundary**: A React class component that catches JavaScript errors in its child component tree and renders a fallback UI
- **CloudFormation_Template**: An AWS CloudFormation YAML file that provisions test infrastructure (EC2, security group, budget alarm)
- **Container_Ref**: A Docker container name or ID string validated before interpolation into shell commands
- **Integration_ID**: A numeric identifier for an SSH/Docker integration stored in the SQLite database
## Requirements
### Requirement 1: WebSocket Session Leak Prevention
**User Story:** As an operator, I want terminal reconnections to properly close prior WebSocket connections, so that dangling sessions do not accumulate and exhaust server resources.
#### Acceptance Criteria
1. WHEN a new terminal WebSocket connection is initiated for a pane, THE Frontend SHALL close the existing WebSocket (if any) before creating a new WebSocket instance.
2. WHEN the existing WebSocket readyState is OPEN or CONNECTING, THE Frontend SHALL call `ws.close()` on the existing reference prior to reassignment.
3. IF the WebSocket `onclose` event fires after a new connection has replaced the reference, THEN THE Frontend SHALL NOT modify state belonging to the new connection.
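
The three criteria above can be sketched as follows. This is a minimal illustration, not the Frontend's actual code: `PaneConnection`, `SocketLike`, and the `status` field are hypothetical names, and the structural `SocketLike` type stands in for the browser `WebSocket` so the sketch runs outside a browser.

```typescript
// Minimal structural type (assumption) mirroring the parts of the browser
// WebSocket that matter here.
interface SocketLike {
  readyState: number;
  onclose: (() => void) | null;
  close(): void;
}

const CONNECTING = 0; // same numeric values as WebSocket.CONNECTING / WebSocket.OPEN
const OPEN = 1;

// Hypothetical per-pane holder; `status` stands in for whatever state the
// pane keeps about its connection.
class PaneConnection {
  ws: SocketLike | null = null;
  status: string = "disconnected";

  connect(createSocket: () => SocketLike): void {
    const previous = this.ws;
    if (previous && (previous.readyState === OPEN || previous.readyState === CONNECTING)) {
      previous.close(); // criteria 1 and 2: close before reassignment
    }

    const next = createSocket();
    next.onclose = () => {
      // Criterion 3: a late close event from a replaced socket must not
      // touch state that now belongs to the newer connection.
      if (this.ws !== next) return;
      this.ws = null;
      this.status = "disconnected";
    };
    this.ws = next;
    this.status = "connected";
  }
}
```

The identity check `this.ws !== next` is what makes the stale `onclose` handler a no-op; clearing `previous.onclose` before closing would be an alternative design.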
### Requirement 2: tmux Session Name Injection Prevention
**User Story:** As a security engineer, I want tmux session names to be strictly validated, so that shell metacharacter injection through the terminal WebSocket is impossible.
#### Acceptance Criteria
1. THE Terminal_Route SHALL validate tmux session names against the pattern `^[A-Za-z0-9_-]{1,64}$`.
2. WHEN a `connect` message includes a `tmuxSession` value that does not match the allowed pattern, THE Terminal_Route SHALL treat it as null and open a plain shell instead.
3. WHEN a valid tmux session name is used in a shell command, THE Terminal_Route SHALL pass it only within a validated context where no additional characters can be appended by the client.
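
A minimal sketch of the validation described above. The pattern is the one given in criterion 1; the function names and the `tmux new-session -A -s` command are illustrative assumptions, not the Terminal_Route's actual implementation.

```typescript
// Pattern from criterion 1. Without the `m` flag, `$` in JavaScript matches
// only at the very end of the input, so a trailing newline is rejected too.
const TMUX_SESSION_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

// Returns the name unchanged when it matches, otherwise null so the caller
// falls back to a plain shell (criterion 2).
function sanitizeTmuxSession(value: unknown): string | null {
  return typeof value === "string" && TMUX_SESSION_PATTERN.test(value) ? value : null;
}

// Hypothetical command builder: the session name is only ever interpolated
// after validation, and nothing client-controlled is appended afterwards
// (criterion 3). Returns null when a plain shell should be opened instead.
function buildTmuxCommand(tmuxSession: unknown): string | null {
  const session = sanitizeTmuxSession(tmuxSession);
  return session === null ? null : `tmux new-session -A -s ${session}`;
}
```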
### Requirement 3: Docker Container Reference Validation
**User Story:** As a security engineer, I want container name/ID references to be tightly validated, so that shell command injection through crafted container identifiers is prevented.
#### Acceptance Criteria
1. THE Docker_SSH_Module SHALL validate container references against the pattern `^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$`.
2. WHEN a container reference fails validation, THE Docker_SSH_Module SHALL throw an error before any shell command is constructed.
3. THE Docker_SSH_Module SHALL pass validated container references through single-quote shell escaping before interpolation into commands.
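
The validate-then-escape order above might look like the sketch below. The pattern comes from criterion 1; `assertContainerRef`, `shellQuote`, and `dockerInspectCommand` are hypothetical helper names chosen for illustration.

```typescript
// Pattern from criterion 1: first character alphanumeric, up to 128 total.
const CONTAINER_REF_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/;

// Throws before any command string exists (criterion 2).
function assertContainerRef(ref: string): string {
  if (!CONTAINER_REF_PATTERN.test(ref)) {
    throw new Error("Invalid container reference");
  }
  return ref;
}

// POSIX single-quote escaping: close the quote, emit an escaped quote,
// reopen the quote. Applied even to validated input as defense in depth
// (criterion 3).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Example command builder (assumption): validation always runs first.
function dockerInspectCommand(ref: string): string {
  return `docker inspect ${shellQuote(assertContainerRef(ref))}`;
}
```

Because the leading character must be alphanumeric, a reference such as `-v` cannot be mistaken for a CLI flag even before quoting.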
### Requirement 4: Sidebar Promise Error Handling
**User Story:** As a user, I want the sidebar to handle API failures gracefully, so that an unhandled promise rejection does not crash the application or produce console errors.
#### Acceptance Criteria
1. WHEN the `api.listIntegrations()` call in the Sidebar component rejects, THE Frontend SHALL catch the error and leave the integrations state as null.
2. IF the integrations API call fails, THEN THE Frontend SHALL display the "Checking…" status label rather than an error state.
### Requirement 5: WebSocket Authentication via First Message
**User Story:** As a security engineer, I want JWT tokens removed from WebSocket URL query strings, so that tokens are not logged in server access logs or proxy logs.
#### Acceptance Criteria
1. THE Frontend SHALL NOT include the JWT token as a URL query parameter when opening terminal or Docker exec WebSocket connections.
2. WHEN a terminal or Docker exec WebSocket connection opens, THE Frontend SHALL send an authentication message containing the JWT token as the first message before any other message type.
3. WHEN the Backend receives a WebSocket connection on the terminal or Docker exec endpoints, THE Backend SHALL require a valid JWT token in the first message before processing `connect` or `list_tmux` messages.
4. IF the first message does not contain a valid JWT token, THEN THE Backend SHALL send an error frame and close the WebSocket connection.
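
One way to express the server-side gate in criteria 3 and 4 is a small per-connection state machine. Everything named here is an assumption for illustration: the `{ type: "auth", token }` message shape, the error frame shape, `createAuthGate`, and the injected `verifyToken` callback, which stands in for the Backend's real JWT verification.

```typescript
type Frame = { type: string; [key: string]: unknown };

interface Connection {
  send(frame: Frame): void;
  close(): void;
}

// Returns a function that decides whether a parsed message may be handed to
// the normal `connect` / `list_tmux` handlers.
function createAuthGate(conn: Connection, verifyToken: (token: string) => boolean) {
  let state: "pending" | "authenticated" | "rejected" = "pending";

  return function accept(message: Frame): boolean {
    if (state === "authenticated") return true;
    if (state === "rejected") return false; // connection is already closing

    const token = message.token;
    if (message.type === "auth" && typeof token === "string" && verifyToken(token)) {
      state = "authenticated";
      return false; // the auth frame itself carries no command
    }

    // Criterion 4: error frame, then close.
    state = "rejected";
    conn.send({ type: "error", message: "Authentication required" });
    conn.close();
    return false;
  };
}
```

On the client side, the matching step would be to send the auth frame from the socket's `onopen` handler before any other message.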
### Requirement 6: File Path Traversal Prevention
**User Story:** As a security engineer, I want file operation paths to be validated against directory traversal, so that attackers cannot access files outside the intended directory scope.
#### Acceptance Criteria
1. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL normalize the path and reject it if any `..` segment remains after normalization.
2. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL reject absolute paths (paths starting with `/`).
3. IF a path fails traversal validation, THEN THE Files_Route SHALL return HTTP 400 with a descriptive error message.
4. THE Files_Route SHALL apply path validation to all endpoints that accept a user-supplied path: list, content read, content write, mkdir, rename, delete, chmod, download, and upload.
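
A sketch of a shared validator that every Files_Route endpoint could call. `validateRelativePath` and `PathValidationError` are hypothetical names; the sketch assumes POSIX-style remote paths and uses Node's `path.posix.normalize`, after which any remaining `..` segment can only appear at the start of a relative path.

```typescript
import { posix } from "node:path";

// Carries the HTTP status from criterion 3 so a route handler can map it
// directly to a 400 response.
class PathValidationError extends Error {
  readonly statusCode = 400;
}

function validateRelativePath(input: string): string {
  if (input.startsWith("/")) {
    throw new PathValidationError("Absolute paths are not allowed");
  }
  // normalize() collapses "a/../b" to "b"; a path that escapes its base
  // still begins with ".." afterwards.
  const normalized = posix.normalize(input);
  if (normalized === ".." || normalized.startsWith("../")) {
    throw new PathValidationError("Path traversal is not allowed");
  }
  return normalized;
}
```

Returning the normalized path, and using only that value downstream, avoids a mismatch between the string that was checked and the string that is passed to SFTP.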
### Requirement 7: Agent Token Timing-Safe Comparison
**User Story:** As a security engineer, I want agent token comparison to use constant-time equality, so that timing side-channel attacks cannot be used to guess the token byte-by-byte.
#### Acceptance Criteria
1. THE Agents_Route SHALL compare presented tokens using `crypto.timingSafeEqual()`.
2. WHEN the presented token length differs from the expected token length, THE Agents_Route SHALL pad both buffers to equal length before performing the constant-time comparison.
3. THE Agents_Route SHALL return the same HTTP response (401 Unauthorized) regardless of whether the token length or content mismatched.
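
`crypto.timingSafeEqual()` throws when its two inputs differ in byte length, which is why criterion 2 calls for padding. A minimal sketch under that reading follows; `tokensMatch` is a hypothetical name, and zero-padding plus a separate length check is one possible approach rather than the confirmed implementation.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

function tokensMatch(presented: string, expected: string): boolean {
  const a = Buffer.from(presented, "utf8");
  const b = Buffer.from(expected, "utf8");

  // Pad both to a common length so timingSafeEqual never throws and always
  // compares the same number of bytes for a given pair of inputs.
  const length = Math.max(a.length, b.length, 1);
  const paddedA = Buffer.alloc(length); // zero-filled
  const paddedB = Buffer.alloc(length);
  a.copy(paddedA);
  b.copy(paddedB);

  const sameContent = timingSafeEqual(paddedA, paddedB);
  // Zero padding alone would treat "abc" and "abc\0" as equal, so the
  // original lengths must match as well.
  const sameLength = a.length === b.length;
  return sameContent && sameLength;
}
```

The caller would return 401 whenever `tokensMatch` is false, without distinguishing a length mismatch from a content mismatch (criterion 3).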
### Requirement 8: Data Import Size Limit
**User Story:** As an operator, I want data import requests to have a body size limit, so that an admin cannot accidentally or maliciously cause a denial-of-service by importing an extremely large JSON payload.
#### Acceptance Criteria
1. THE Data_Route import endpoint SHALL enforce a maximum request body size of 10 MB.
2. IF a request body exceeds 10 MB, THEN THE Data_Route SHALL reject the request with HTTP 413 before parsing the JSON payload.
### Requirement 9: SSH Connection Leak Prevention
**User Story:** As an operator, I want SSH connections used for Docker-over-SSH operations to be closed reliably, so that connection leaks do not exhaust SSH connection limits on remote hosts.
#### Acceptance Criteria
1. THE Docker_SSH_Module `withSshClient` function SHALL close both the primary SSH client and any jump-host client in a `finally` block after the operation completes.
2. IF the operation function throws an error, THEN THE Docker_SSH_Module SHALL still close both SSH connections before returning the error result.
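
The cleanup guarantee above can be sketched with a nested `try`/`finally`. The signature here is an assumption: the real `withSshClient` presumably opens connections from integration settings, whereas this sketch takes a hypothetical `open` callback and a minimal `Closable` interface so it stays self-contained.

```typescript
// Minimal stand-in for an SSH client; only the close method matters here.
interface Closable {
  end(): void;
}

async function withSshClients<T>(
  open: () => Promise<{ client: Closable; jump?: Closable }>,
  operation: (client: Closable) => Promise<T>,
): Promise<T> {
  const { client, jump } = await open();
  try {
    return await operation(client);
  } finally {
    // Nested finally: the jump host is closed even if closing the primary
    // client throws.
    try {
      client.end();
    } finally {
      jump?.end();
    }
  }
}
```

Note the `await` in `return await operation(client)`: without it, the `finally` block would run before the operation's promise settles.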
### Requirement 10: CORS Origin Fail-Closed Default
**User Story:** As a security engineer, I want CORS to reject all cross-origin requests when no explicit origin is configured, so that a misconfigured deployment does not silently allow any origin.
#### Acceptance Criteria
1. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is not set, THE Backend SHALL default CORS origin to `false` (reject all cross-origin requests).
2. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is set to a valid origin string, THE Backend SHALL use that value as the allowed CORS origin.
3. THE Backend SHALL log a warning at startup when `ARCHNEST_CORS_ORIGIN` is not configured, indicating that cross-origin requests are blocked.
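
A small sketch of the fail-closed default. `resolveCorsOrigin` and its injected `env` and `warn` parameters are hypothetical, chosen so the logic can be exercised without a running server; the sketch does not validate the origin format, and treating an empty or whitespace-only value as unset is an assumption.

```typescript
// Returns the value to hand to the CORS configuration: `false` blocks all
// cross-origin requests, a string allows exactly that origin.
function resolveCorsOrigin(
  env: Record<string, string | undefined>,
  warn: (message: string) => void,
): string | false {
  const configured = env.ARCHNEST_CORS_ORIGIN?.trim();
  if (!configured) {
    // Criterion 3: make the fail-closed state visible at startup.
    warn("ARCHNEST_CORS_ORIGIN is not set; cross-origin requests are blocked");
    return false;
  }
  return configured;
}
```

At startup this might be called as `resolveCorsOrigin(process.env, (m) => app.log.warn(m))`, where `app` is the Fastify instance.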
### Requirement 11: React Error Boundary
**User Story:** As a user, I want the application to catch rendering errors gracefully, so that a single component crash does not take down the entire dashboard.
#### Acceptance Criteria
1. THE Frontend SHALL wrap the Dashboard component tree in an Error_Boundary component.
2. WHEN a child component throws during rendering, THE Error_Boundary SHALL catch the error and render a fallback UI instead of a blank screen.
3. THE Error_Boundary fallback UI SHALL display a message indicating an error occurred and provide a way to reload the page.
4. THE Error_Boundary SHALL log the caught error to the browser console for debugging.
### Requirement 12: WebSocket JSON Parse Error Handling
**User Story:** As an operator, I want malformed WebSocket messages to be handled gracefully, so that invalid JSON from a client does not crash the connection handler.
#### Acceptance Criteria
1. WHEN the Docker_Exec_Route receives a WebSocket message that is not valid JSON, THE Docker_Exec_Route SHALL send an error frame with message "Invalid JSON" to the client.
2. WHEN the Docker_Exec_Route receives invalid JSON, THE Docker_Exec_Route SHALL NOT close the WebSocket connection (allowing the client to retry).
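
The behavior above amounts to wrapping `JSON.parse` and replying instead of throwing. In this sketch the error frame shape (`{ type: "error", message }`) and the names `parseFrame` and `handleRawMessage` are assumptions; `send` and `dispatch` stand in for the socket write and the route's normal message handling.

```typescript
type ErrorFrame = { type: "error"; message: string };
type ParseResult = { ok: true; value: unknown } | { ok: false };

function parseFrame(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return { ok: false };
  }
}

function handleRawMessage(
  raw: string,
  send: (frame: ErrorFrame) => void,
  dispatch: (message: unknown) => void,
): void {
  const parsed = parseFrame(raw);
  if (!parsed.ok) {
    // Criterion 1: report the problem. Criterion 2: return without closing,
    // so the client can send a corrected message on the same connection.
    send({ type: "error", message: "Invalid JSON" });
    return;
  }
  dispatch(parsed.value);
}
```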
### Requirement 13: Session Log Path Traversal Prevention
**User Story:** As a security engineer, I want the integrationId used in session log file paths to be strictly validated as numeric, so that path traversal through crafted identifiers is impossible.
#### Acceptance Criteria
1. WHEN session logging is enabled, THE Terminal_Route SHALL validate that the integrationId is a positive integer before constructing the log file path.
2. IF the integrationId is not a valid positive integer, THEN THE Terminal_Route SHALL skip session logging for that connection rather than writing to an unvalidated path.
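
A sketch of the numeric check. The log directory and the `integration-<id>.log` file name are purely illustrative, as are the function names; the point is that the path is built only from a value that has been re-derived as a positive integer.

```typescript
import { posix } from "node:path";

// Accepts a number or a decimal string; anything else yields null.
function toPositiveInt(value: unknown): number | null {
  const text = typeof value === "number" ? String(value) : value;
  if (typeof text !== "string" || !/^[1-9][0-9]*$/.test(text)) return null;
  const id = Number(text);
  return Number.isSafeInteger(id) ? id : null;
}

// Returns null when logging should be skipped for this connection
// (criterion 2); the caller must not fall back to the raw identifier.
function sessionLogPath(logDir: string, integrationId: unknown): string | null {
  const id = toPositiveInt(integrationId);
  return id === null ? null : posix.join(logDir, `integration-${id}.log`);
}
```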
### Requirement 14: CloudFormation Test Deploy Template
**User Story:** As a developer, I want a CloudFormation template that provisions a t4g.small EC2 instance with Docker in us-east-1, so that I can verify all fixes in an isolated environment within a $30/month budget.
#### Acceptance Criteria
1. THE CloudFormation_Template SHALL provision a t4g.small EC2 instance in us-east-1 running Amazon Linux 2023.
2. THE CloudFormation_Template SHALL create a security group allowing inbound SSH (port 22) and HTTP/HTTPS (ports 80, 443) from any source.
3. THE CloudFormation_Template SHALL install Docker and Docker Compose on the EC2 instance via UserData.
4. THE CloudFormation_Template SHALL create an AWS Budget alarm at $30/month threshold with email notification.
5. THE CloudFormation_Template SHALL output the instance public IP and instance ID for SSH access.
6. THE CloudFormation_Template SHALL accept parameters for the SSH key pair name and notification email address.