Add code-audit-fixes spec (#5)
All checks were successful
CI / validate (push) Successful in 3m33s

This commit is contained in:
Samuel James 2026-06-24 19:20:18 +00:00
parent 4ab0b2fff6
commit 3172104d29
5 changed files with 1055 additions and 0 deletions

View file

@ -0,0 +1 @@
{"specId": "044511cc-0b54-456f-9bbb-5769c4f7c380", "workflowType": "fast-task", "specType": "feature"}

View file

@ -0,0 +1,561 @@
# Design Document: Code Audit Fixes
## Overview
This design addresses 13 Critical + High severity issues from the ArchNest code audit plus a CloudFormation test deploy template. The fixes are surgical — each targets a specific file with a minimal, correct patch. No new dependencies are introduced; all fixes use Node.js built-ins, existing Fastify hooks, and standard React patterns already in the codebase.
## Architecture
The fixes span three layers:
```
┌─────────────────────────────────────────────────────────┐
│ Frontend (React 19 + Vite 8 + TypeScript) │
│ ├─ TerminalSessionContext.tsx — WS lifecycle, auth │
│ ├─ Sidebar.tsx — promise error catch │
│ ├─ App.tsx — ErrorBoundary wrapper │
│ └─ ErrorBoundary.tsx — new component │
├─────────────────────────────────────────────────────────┤
│ Backend (Fastify 5 + TypeScript) │
│ ├─ routes/terminal.ts — tmux validation, WS auth, │
│ │ session log ID validation │
│ ├─ routes/files.ts — path traversal prevention │
│ ├─ routes/agents.ts — timing-safe token compare │
│ ├─ routes/data.ts — body size limit │
│ ├─ routes/docker.ts — JSON parse error handling │
│ ├─ ssh/docker.ts — container ref validation, │
│ │ SSH connection cleanup │
│ └─ server.ts — CORS fail-closed default │
├─────────────────────────────────────────────────────────┤
│ Infrastructure │
│ └─ infra/test-deploy.yml — CloudFormation template │
└─────────────────────────────────────────────────────────┘
```
## Components and Interfaces
### Component 1: WebSocket Session Leak Prevention (Frontend)
**File:** `src/lib/TerminalSessionContext.tsx`
**Current problem:** The `connect()` function calls `s.ws?.close()` but a race condition exists where the old WS `onclose` handler fires after reassignment and corrupts state belonging to the new connection.
**Fix:**
```typescript
function connect(s: PaneSession, id: number, tmuxSession?: string) {
s.disposeListeners?.()
// Close existing WS if OPEN or CONNECTING — guard against leak
if (s.ws && (s.ws.readyState === WebSocket.OPEN || s.ws.readyState === WebSocket.CONNECTING)) {
s.ws.close()
}
s.ws = null
s.connected = false
bump()
const term = s.term
term.reset()
term.writeln('Connecting…')
const token = getToken()
const proto = window.location.protocol === 'https:' ? 'wss' : 'ws'
// Token sent as first message, NOT in URL query string
const ws = new WebSocket(`${proto}://${window.location.host}/api/terminal`)
const thisWs = ws // Capture reference to detect stale onclose
s.ws = ws
ws.onopen = () => {
// Send auth as first message
ws.send(JSON.stringify({ type: 'auth', token }))
ws.send(JSON.stringify({ type: 'connect', integrationId: id, cols: term.cols, rows: term.rows, tmuxSession }))
}
ws.onclose = () => {
// Guard: only update state if this WS is still the active one
if (s.ws !== thisWs) return
s.connected = false
bump()
}
// ... rest of message handling unchanged
}
```
The same pattern applies to `fetchTmuxSessions()` — remove `?token=` from URL, send auth as first message.
### Component 2: tmux Session Name Validation (Backend)
**File:** `backend/src/routes/terminal.ts`
**Current state:** The regex `TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/` is already defined and used. The issue is that the validated name is interpolated directly into a `tmux attach -t ${tmuxSession}` command without shell quoting.
**Fix:** Apply `shQuote()` to the validated name in command construction, or use the existing pattern where invalid names fall through to `null`:
```typescript
const TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/
// In the connect handler:
const tmuxSession = msg.tmuxSession && TMUX_NAME_RE.test(msg.tmuxSession)
? msg.tmuxSession
: null
if (tmuxSession) {
// Name is validated to contain only safe chars; quote for defense-in-depth
const safe = tmuxSession.replace(/'/g, "") // Impossible given regex, but belt+suspenders
client.exec(`tmux attach -t '${safe}' || tmux new-session -s '${safe}'`, {
pty: { cols, rows, term: 'xterm-256color' },
}, onChannel)
}
```
### Component 3: Docker Container Reference Validation (Backend)
**File:** `backend/src/ssh/docker.ts`
**Current state:** Already has `CONTAINER_REF_RE` and `shQuote()`. The regex and quoting are correctly implemented. Verify the regex anchors and character set are tight:
```typescript
const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/
```
This is already correct. The `shQuote()` function is already applied in `containerLogs`, `containerAction`, `removeContainer`, and `buildExecShellCommand`. No code change needed — the audit finding is already resolved in the current codebase.
### Component 4: Sidebar Promise Error Handling (Frontend)
**File:** `src/components/Sidebar.tsx`
**Fix:** Add `.catch()` to the unhandled promise:
```typescript
useEffect(() => {
api.listIntegrations()
.then(({ integrations }) => setIntegrations(integrations))
.catch(() => {}) // Leave state as null → shows "Checking…"
}, [])
```
### Component 5: WebSocket Authentication via First Message (Backend)
**Files:** `backend/src/routes/terminal.ts`, `backend/src/routes/docker.ts`
**Current state:** Both routes verify `req.query.token` on `connect`/`list_tmux` messages. This needs to change to a first-message auth protocol.
**Design:**
```typescript
// Shared auth state per WSocket connection
let authenticated = false
socket.on('message', async (raw: Buffer) => {
let msg: ClientMessage
try {
msg = JSON.parse(raw.toString())
} catch {
send(socket, { type: 'error', message: 'Invalid JSON' })
return
}
// Gate: first message must be auth
if (!authenticated) {
if (msg.type !== 'auth' || !msg.token) {
send(socket, { type: 'error', message: 'Authentication required' })
socket.close()
return
}
try {
await app.jwt.verify(msg.token)
authenticated = true
send(socket, { type: 'authenticated' })
} catch {
send(socket, { type: 'error', message: 'Unauthorized' })
socket.close()
}
return
}
// Normal message processing (connect, input, resize, etc.)
// ...
})
```
The `ClientMessage` type gets a new variant: `type: 'auth'` with `token: string`.
### Component 6: File Path Traversal Prevention (Backend)
**File:** `backend/src/routes/files.ts`
**Design:** Add a validation function applied to all path inputs before any SFTP operation:
```typescript
import { posix } from 'node:path'
function validatePath(path: string): string {
// Reject absolute paths
if (path.startsWith('/')) {
throw Object.assign(new Error('Absolute paths are not allowed'), { statusCode: 400 })
}
// Normalize and check for traversal
const normalized = posix.normalize(path)
if (normalized.startsWith('../') || normalized === '..' || normalized.includes('/../')) {
throw Object.assign(new Error('Path traversal is not allowed'), { statusCode: 400 })
}
return normalized
}
```
Applied at the top of every endpoint handler that accepts a user path: `list`, `content` (read), `content` (write), `mkdir`, `rename`, `delete`, `chmod`, `download`, `upload`.
### Component 7: Agent Token Timing-Safe Comparison (Backend)
**File:** `backend/src/routes/agents.ts`
**Current state:** The `agentTokenValid()` function already uses `timingSafeEqual` but has an early return on length mismatch that leaks timing information.
**Fix:** Pad to equal length before comparison:
```typescript
function agentTokenValid(req: FastifyRequest): { ok: boolean; configured: boolean } {
const expected = process.env.ARCHNEST_AGENT_TOKEN
if (!expected) return { ok: false, configured: false }
const header = req.headers.authorization ?? ''
const presented = header.startsWith('Bearer ') ? header.slice(7) : ''
const a = Buffer.from(presented)
const b = Buffer.from(expected)
// Pad shorter buffer to match longer — constant-time regardless of length diff
const maxLen = Math.max(a.length, b.length)
const aPadded = Buffer.alloc(maxLen)
const bPadded = Buffer.alloc(maxLen)
a.copy(aPadded)
b.copy(bPadded)
const match = timingSafeEqual(aPadded, bPadded)
// Both length AND content must match
return { ok: match && a.length === b.length, configured: true }
}
```
### Component 8: Data Import Size Limit (Backend)
**File:** `backend/src/routes/data.ts`
**Fix:** Add `bodyLimit` to the route registration:
```typescript
app.post('/api/data/import', {
onRequest: [app.adminOnly],
bodyLimit: 10 * 1024 * 1024, // 10 MB
}, async (req, reply) => {
// ... existing handler
})
```
Fastify natively returns 413 Payload Too Large when exceeded, before JSON parsing.
### Component 9: SSH Connection Leak Prevention (Backend)
**File:** `backend/src/ssh/docker.ts`
**Current state:** The `withSshClient()` function already has a `finally` block that calls `client.end()` and `jumpRef.current?.end()`. This is correctly implemented in the current codebase. The audit noted older code; the fix is already present.
Verify the `finally` block:
```typescript
} finally {
client.end()
jumpRef.current?.end()
}
```
This is already in place. No change needed.
### Component 10: CORS Origin Fail-Closed Default (Backend)
**File:** `backend/src/server.ts`
**Current state:** `origin: process.env.ARCHNEST_CORS_ORIGIN ?? true` — falls back to `true` (allow all).
**Fix:**
```typescript
const corsOrigin = process.env.ARCHNEST_CORS_ORIGIN || false
if (!process.env.ARCHNEST_CORS_ORIGIN) {
app.log.warn('ARCHNEST_CORS_ORIGIN is not set — all cross-origin requests will be blocked')
}
await app.register(cors, { origin: corsOrigin })
```
### Component 11: React Error Boundary (Frontend)
**File:** `src/components/ErrorBoundary.tsx` (new)
```typescript
import { Component, type ReactNode, type ErrorInfo } from 'react'
interface Props { children: ReactNode }
interface State { hasError: boolean }
export class ErrorBoundary extends Component<Props, State> {
state: State = { hasError: false }
static getDerivedStateFromError(): State {
return { hasError: true }
}
componentDidCatch(error: Error, info: ErrorInfo) {
console.error('[ErrorBoundary]', error, info.componentStack)
}
render() {
if (this.state.hasError) {
return (
<div style={{
display: 'flex', flexDirection: 'column', alignItems: 'center',
justifyContent: 'center', height: '100vh', backgroundColor: '#0D0E10',
color: '#E8E6E0', fontFamily: 'system-ui, sans-serif',
}}>
<p style={{ fontSize: '16px', marginBottom: '16px' }}>
Something went wrong.
</p>
<button
onClick={() => window.location.reload()}
style={{
padding: '8px 20px', backgroundColor: '#C8A434',
color: '#0D0E10', border: 'none', borderRadius: '6px',
cursor: 'pointer', fontWeight: 600,
}}
>
Reload Page
</button>
</div>
)
}
return this.props.children
}
}
```
**Integration in `App.tsx`:**
```typescript
import { ErrorBoundary } from './components/ErrorBoundary'
// In App():
return <ErrorBoundary><Dashboard /></ErrorBoundary>
```
### Component 12: WebSocket JSON Parse Error Handling (Backend)
**File:** `backend/src/routes/docker.ts`
**Current state:** The `dockerExecRoutes` handler already has a try/catch around `JSON.parse` that sends `{ type: 'error', message: 'Invalid JSON' }` and does NOT close the connection. This is already correctly implemented.
Verify the existing code:
```typescript
socket.on('message', async (raw: Buffer) => {
let msg: ExecMessage
try {
msg = JSON.parse(raw.toString())
} catch {
sendJson(socket, { type: 'error', message: 'Invalid JSON' })
return // Does not close — client can retry
}
// ...
})
```
Already correct. No change needed.
### Component 13: Session Log Path Traversal Prevention (Backend)
**File:** `backend/src/routes/terminal.ts`
**Fix:** Validate `integrationId` is a positive integer before constructing the log path:
```typescript
function sessionLogPath(integrationId: number): string | null {
// Validate: must be a positive integer (no NaN, no negative, no float)
if (!Number.isInteger(integrationId) || integrationId <= 0) return null
mkdirSync(SESSION_LOG_DIR, { recursive: true })
const stamp = new Date().toISOString().replace(/[:.]/g, '-')
return join(SESSION_LOG_DIR, `${integrationId}_${stamp}.log`)
}
```
Callers check for `null` return and skip logging.
### Component 14: CloudFormation Test Deploy Template
**File:** `infra/test-deploy.yml`
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
ArchNest test deployment - t4g.small EC2 with Docker in us-east-1.
Budget alarm at $30/month. Destroy after testing.
Parameters:
KeyPairName:
Type: AWS::EC2::KeyPair::KeyName
Description: SSH key pair for instance access
NotificationEmail:
Type: String
Description: Email for budget alarm notifications
Resources:
SecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: ArchNest test - SSH + HTTP/HTTPS
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 22
ToPort: 22
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 80
ToPort: 80
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: 0.0.0.0/0
Instance:
Type: AWS::EC2::Instance
Properties:
InstanceType: t4g.small
KeyName: !Ref KeyPairName
ImageId: !Sub '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64}}'
SecurityGroupIds:
- !GetAtt SecurityGroup.GroupId
UserData:
Fn::Base64: |
#!/bin/bash -xe
dnf update -y
dnf install -y docker git
systemctl enable docker && systemctl start docker
usermod -aG docker ec2-user
# Install Docker Compose v2 plugin
mkdir -p /usr/local/lib/docker/cli-plugins
curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
-o /usr/local/lib/docker/cli-plugins/docker-compose
chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
BudgetAlarm:
Type: AWS::Budgets::Budget
Properties:
Budget:
BudgetName: archnest-test-budget
BudgetLimit:
Amount: 30
Unit: USD
TimeUnit: MONTHLY
BudgetType: COST
NotificationsWithSubscribers:
- Notification:
NotificationType: ACTUAL
ComparisonOperator: GREATER_THAN
Threshold: 80
Subscribers:
- SubscriptionType: EMAIL
Address: !Ref NotificationEmail
Outputs:
PublicIP:
Value: !GetAtt Instance.PublicIp
Description: Instance public IP for SSH access
InstanceId:
Value: !Ref Instance
Description: EC2 instance ID
```
## Data Models
No new database tables or schema changes are required. All fixes operate on existing data structures.
**New TypeScript interface (WebSocket auth message):**
```typescript
interface AuthMessage {
type: 'auth'
token: string
}
// ClientMessage union extended:
type ClientMessage =
| AuthMessage
| { type: 'connect'; integrationId?: number; cols?: number; rows?: number; tmuxSession?: string }
| { type: 'input'; data?: string }
| { type: 'resize'; cols?: number; rows?: number }
| { type: 'disconnect' }
| { type: 'list_tmux'; integrationId?: number }
```
## Error Handling
| Component | Error Case | Behavior |
|-----------|-----------|----------|
| Path validation | Traversal/absolute path | HTTP 400, descriptive message |
| WS auth gate | Missing/invalid token | Error frame + close connection |
| Agent token | Wrong token (any form) | HTTP 401, identical response |
| Data import | Body > 10MB | HTTP 413 (Fastify built-in) |
| JSON parse | Malformed WS message | Error frame, keep connection open |
| Session log | Invalid integrationId | Skip logging, no error to user |
| Error Boundary | Component crash | Fallback UI with reload button |
| Sidebar | API rejection | Swallow error, show "Checking…" |
## Testing Strategy
**Unit tests** (example-based): WebSocket session lifecycle (Req 1), Sidebar error catch (Req 4), frontend token removal from URL (Req 5.15.2), Error Boundary rendering (Req 11), CORS default (Req 10), body size limit (Req 8).
**Property tests** (100+ iterations): Input validation functions (tmux names, container refs, path traversal, integration IDs), token comparison correctness, WS auth gate, SSH cleanup guarantee, JSON parse resilience.
**Smoke tests**: CloudFormation template structure validation (Req 14) — verify resource types, parameters, and outputs exist with correct values.
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: tmux Session Name Validation Prevents Injection
*For any* string `s`, if `TMUX_NAME_RE.test(s)` returns true, then `s` contains only characters from `[A-Za-z0-9_-]` and has length 164, and the resulting shell command `tmux attach -t '${s}'` is safe from injection. *For any* string `s` that contains characters outside this set or exceeds 64 chars, the validator SHALL reject it.
**Validates: Requirements 2.1, 2.3**
### Property 2: Container Reference Validation and Safe Escaping
*For any* string `ref`, `isValidContainerRef(ref)` returns true if and only if `ref` matches `^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$`. *For any* valid container reference, `shQuote(ref)` produces a string that, when embedded in a shell command, cannot break out of single quotes.
**Validates: Requirements 3.1, 3.3**
### Property 3: Path Traversal Prevention
*For any* path string `p`, if `p` starts with `/` or if `posix.normalize(p)` contains a `..` component that would escape the root (starts with `../`, equals `..`, or contains `/../`), then `validatePath(p)` SHALL throw an error. *For any* relative path without traversal components, `validatePath(p)` SHALL return the normalized path.
**Validates: Requirements 6.1, 6.2**
### Property 4: Agent Token Comparison Correctness
*For any* pair of token strings `(presented, expected)`, the `agentTokenValid` function returns `ok: true` if and only if `presented === expected`. The function SHALL never throw regardless of input lengths, and SHALL return the same HTTP 401 response shape for all rejection cases (length mismatch or content mismatch).
**Validates: Requirements 7.1, 7.2, 7.3**
### Property 5: WebSocket Auth Gate Rejects Unauthenticated Messages
*For any* WebSocket message received before a valid `auth` message has been processed, the backend SHALL respond with an error frame and close the connection. No `connect`, `list_tmux`, `input`, or `resize` message SHALL be processed in the unauthenticated state.
**Validates: Requirements 5.3, 5.4**
### Property 6: SSH Connection Cleanup Guarantee
*For any* operation function `fn` passed to `withSshClient`, regardless of whether `fn` resolves or rejects, both the primary SSH client and the jump-host client (if any) SHALL have `.end()` called before the `withSshClient` promise settles.
**Validates: Requirements 9.1, 9.2**
### Property 7: WebSocket Invalid JSON Resilience
*For any* byte sequence sent to the Docker exec WebSocket that is not valid JSON, the handler SHALL send `{ type: 'error', message: 'Invalid JSON' }` and SHALL NOT close the WebSocket connection.
**Validates: Requirements 12.1, 12.2**
### Property 8: Integration ID Numeric Validation for Session Logs
*For any* value `v` used as `integrationId` when session logging is enabled, if `v` is not a positive integer (`Number.isInteger(v) && v > 0`), then `sessionLogPath` SHALL return `null` and no filesystem write SHALL occur.
**Validates: Requirements 13.1, 13.2**

View file

@ -0,0 +1,164 @@
# Requirements Document
## Introduction
This feature addresses 13 Critical and High severity issues identified during a code audit of the ArchNest self-hosted ops dashboard. The fixes target security vulnerabilities (injection, traversal, timing attacks, token exposure), resource leaks (WebSocket, SSH connections), and stability gaps (missing error boundaries, unhandled exceptions). A CloudFormation template for test deployment is also included to verify fixes in an isolated environment.
## Glossary
- **Backend**: The Fastify 5 + TypeScript server application in `backend/src/`
- **Frontend**: The React 19 + Vite 8 + TypeScript client application in `src/`
- **WebSocket_Session**: A browser-to-server WebSocket connection used for terminal, Docker exec, or tmux-list operations
- **Terminal_Route**: The backend WebSocket endpoint at `/api/terminal` handling SSH terminal sessions
- **Docker_SSH_Module**: The `backend/src/ssh/docker.ts` module that runs Docker CLI commands over SSH
- **Files_Route**: The backend REST endpoint group at `/api/files/:integrationId/*` for SFTP file operations
- **Agents_Route**: The backend endpoint at `/api/agents/docker/report` for agent token-gated ingest
- **Data_Route**: The backend endpoint group at `/api/data/import` and `/api/data/export` for backup/restore
- **Docker_Exec_Route**: The backend WebSocket endpoint at `/api/docker/exec` for Docker container exec sessions
- **Error_Boundary**: A React class component that catches JavaScript errors in its child component tree and renders a fallback UI
- **CloudFormation_Template**: An AWS CloudFormation YAML file that provisions test infrastructure (EC2, security group, budget alarm)
- **Container_Ref**: A Docker container name or ID string validated before interpolation into shell commands
- **Integration_ID**: A numeric identifier for an SSH/Docker integration stored in the SQLite database
## Requirements
### Requirement 1: WebSocket Session Leak Prevention
**User Story:** As an operator, I want terminal reconnections to properly close prior WebSocket connections, so that dangling sessions do not accumulate and exhaust server resources.
#### Acceptance Criteria
1. WHEN a new terminal WebSocket connection is initiated for a pane, THE Frontend SHALL close the existing WebSocket (if any) before creating a new WebSocket instance.
2. WHEN the existing WebSocket readyState is OPEN or CONNECTING, THE Frontend SHALL call `ws.close()` on the existing reference prior to reassignment.
3. IF the WebSocket `onclose` event fires after a new connection has replaced the reference, THEN THE Frontend SHALL not modify state belonging to the new connection.
### Requirement 2: tmux Session Name Injection Prevention
**User Story:** As a security engineer, I want tmux session names to be strictly validated, so that shell metacharacter injection through the terminal WebSocket is impossible.
#### Acceptance Criteria
1. THE Terminal_Route SHALL validate tmux session names against the pattern `^[A-Za-z0-9_-]{1,64}$`.
2. WHEN a `connect` message includes a `tmuxSession` value that does not match the allowed pattern, THE Terminal_Route SHALL treat it as null and open a plain shell instead.
3. WHEN a valid tmux session name is used in a shell command, THE Terminal_Route SHALL pass it only within a validated context where no additional characters can be appended by the client.
### Requirement 3: Docker Container Reference Validation
**User Story:** As a security engineer, I want container name/ID references to be tightly validated, so that shell command injection through crafted container identifiers is prevented.
#### Acceptance Criteria
1. THE Docker_SSH_Module SHALL validate container references against the pattern `^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$`.
2. WHEN a container reference fails validation, THE Docker_SSH_Module SHALL throw an error before any shell command is constructed.
3. THE Docker_SSH_Module SHALL pass validated container references through single-quote shell escaping before interpolation into commands.
### Requirement 4: Sidebar Promise Error Handling
**User Story:** As a user, I want the sidebar to handle API failures gracefully, so that an unhandled promise rejection does not crash the application or produce console errors.
#### Acceptance Criteria
1. WHEN the `api.listIntegrations()` call in the Sidebar component rejects, THE Frontend SHALL catch the error and leave the integrations state as null.
2. IF the integrations API call fails, THEN THE Frontend SHALL display the "Checking…" status label rather than an error state.
### Requirement 5: WebSocket Authentication via First Message
**User Story:** As a security engineer, I want JWT tokens removed from WebSocket URL query strings, so that tokens are not logged in server access logs or proxy logs.
#### Acceptance Criteria
1. THE Frontend SHALL not include the JWT token as a URL query parameter when opening terminal or Docker exec WebSocket connections.
2. WHEN a terminal WebSocket connection opens, THE Frontend SHALL send an authentication message containing the JWT token as the first message before any other message type.
3. WHEN the Backend receives a WebSocket connection on the terminal or Docker exec endpoints, THE Backend SHALL require a valid JWT token in the first message before processing `connect` or `list_tmux` messages.
4. IF the first message does not contain a valid JWT token, THEN THE Backend SHALL send an error frame and close the WebSocket connection.
### Requirement 6: File Path Traversal Prevention
**User Story:** As a security engineer, I want file operation paths to be validated against directory traversal, so that attackers cannot access files outside the intended directory scope.
#### Acceptance Criteria
1. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL reject paths containing `../` sequences after normalization.
2. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL reject absolute paths (paths starting with `/`).
3. IF a path fails traversal validation, THEN THE Files_Route SHALL return HTTP 400 with a descriptive error message.
4. THE Files_Route SHALL apply path validation to all endpoints that accept a user-supplied path: list, content read, content write, mkdir, rename, delete, chmod, download, and upload.
### Requirement 7: Agent Token Timing-Safe Comparison
**User Story:** As a security engineer, I want agent token comparison to use constant-time equality, so that timing side-channel attacks cannot be used to guess the token byte-by-byte.
#### Acceptance Criteria
1. THE Agents_Route SHALL compare presented tokens using `crypto.timingSafeEqual()`.
2. WHEN the presented token length differs from the expected token length, THE Agents_Route SHALL pad both buffers to equal length before performing the constant-time comparison.
3. THE Agents_Route SHALL return the same HTTP response (401 Unauthorized) regardless of whether the token length or content mismatched.
### Requirement 8: Data Import Size Limit
**User Story:** As an operator, I want data import requests to have a body size limit, so that an admin cannot accidentally or maliciously cause a denial-of-service by importing an extremely large JSON payload.
#### Acceptance Criteria
1. THE Data_Route import endpoint SHALL enforce a maximum request body size of 10 MB.
2. IF a request body exceeds 10 MB, THEN THE Data_Route SHALL reject the request with HTTP 413 before parsing the JSON payload.
### Requirement 9: SSH Connection Leak Prevention
**User Story:** As an operator, I want SSH connections used for Docker-over-SSH operations to be closed reliably, so that connection leaks do not exhaust SSH connection limits on remote hosts.
#### Acceptance Criteria
1. THE Docker_SSH_Module `withSshClient` function SHALL close both the primary SSH client and any jump-host client in a `finally` block after the operation completes.
2. IF the operation function throws an error, THEN THE Docker_SSH_Module SHALL still close both SSH connections before returning the error result.
### Requirement 10: CORS Origin Fail-Closed Default
**User Story:** As a security engineer, I want CORS to reject all cross-origin requests when no explicit origin is configured, so that a misconfigured deployment does not silently allow any origin.
#### Acceptance Criteria
1. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is not set, THE Backend SHALL default CORS origin to `false` (reject all cross-origin requests).
2. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is set to a valid origin string, THE Backend SHALL use that value as the allowed CORS origin.
3. THE Backend SHALL log a warning at startup when `ARCHNEST_CORS_ORIGIN` is not configured, indicating that cross-origin requests are blocked.
### Requirement 11: React Error Boundary
**User Story:** As a user, I want the application to catch rendering errors gracefully, so that a single component crash does not take down the entire dashboard.
#### Acceptance Criteria
1. THE Frontend SHALL wrap the Dashboard component tree in an Error_Boundary component.
2. WHEN a child component throws during rendering, THE Error_Boundary SHALL catch the error and render a fallback UI instead of a blank screen.
3. THE Error_Boundary fallback UI SHALL display a message indicating an error occurred and provide a way to reload the page.
4. THE Error_Boundary SHALL log the caught error to the browser console for debugging.
### Requirement 12: WebSocket JSON Parse Error Handling
**User Story:** As an operator, I want malformed WebSocket messages to be handled gracefully, so that invalid JSON from a client does not crash the connection handler.
#### Acceptance Criteria
1. WHEN the Docker_Exec_Route receives a WebSocket message that is not valid JSON, THE Docker_Exec_Route SHALL send an error frame with message "Invalid JSON" to the client.
2. WHEN the Docker_Exec_Route receives invalid JSON, THE Docker_Exec_Route SHALL not close the WebSocket connection (allowing the client to retry).
### Requirement 13: Session Log Path Traversal Prevention
**User Story:** As a security engineer, I want the integrationId used in session log file paths to be strictly validated as numeric, so that path traversal through crafted identifiers is impossible.
#### Acceptance Criteria
1. WHEN session logging is enabled, THE Terminal_Route SHALL validate that the integrationId is a positive integer before constructing the log file path.
2. IF the integrationId is not a valid positive integer, THEN THE Terminal_Route SHALL skip session logging for that connection rather than writing to an unvalidated path.
### Requirement 14: CloudFormation Test Deploy Template
**User Story:** As a developer, I want a CloudFormation template that provisions a t4g.small EC2 instance with Docker in us-east-1, so that I can verify all fixes in an isolated environment within a $30/month budget.
#### Acceptance Criteria
1. THE CloudFormation_Template SHALL provision a t4g.small EC2 instance in us-east-1 running Amazon Linux 2023.
2. THE CloudFormation_Template SHALL create a security group allowing inbound SSH (port 22) and HTTP/HTTPS (ports 80, 443) from any source.
3. THE CloudFormation_Template SHALL install Docker and Docker Compose on the EC2 instance via UserData.
4. THE CloudFormation_Template SHALL create an AWS Budget alarm at $30/month threshold with email notification.
5. THE CloudFormation_Template SHALL output the instance public IP and instance ID for SSH access.
6. THE CloudFormation_Template SHALL accept parameters for the SSH key pair name and notification email address.

View file

@ -0,0 +1,153 @@
# Implementation Plan: Code Audit Fixes
## Overview
Surgical fixes for 13 Critical + High audit issues across the ArchNest backend (Fastify 5 + TypeScript) and frontend (React 19 + TypeScript), plus a CloudFormation test deploy template. Each task targets a specific file with a minimal, correct patch. No new dependencies introduced.
## Tasks
- [ ] 1. Backend security hardening — input validation and auth
- [ ] 1.1 Add path traversal prevention to `backend/src/routes/files.ts`
- Add `validatePath()` function using `posix.normalize()` to reject absolute paths and `..` traversal
- Apply validation at the top of every handler that accepts a user-supplied path (list, content read, content write, mkdir, rename, delete, chmod, download, upload)
- Return HTTP 400 with descriptive error on rejection
- _Requirements: 6.1, 6.2, 6.3, 6.4_
- [ ] 1.2 Fix agent token timing-safe comparison in `backend/src/routes/agents.ts`
- Replace early-return on length mismatch with padded constant-time comparison
- Pad both buffers to `Math.max(a.length, b.length)` before `timingSafeEqual`
- Ensure `ok: true` only when both length AND content match
- Same 401 response for all rejection cases
- _Requirements: 7.1, 7.2, 7.3_
- [ ] 1.3 Add session log integrationId validation in `backend/src/routes/terminal.ts`
- Update `sessionLogPath()` to return `null` when integrationId is not a positive integer
- Callers skip logging on `null` return
- _Requirements: 13.1, 13.2_
- [ ] 1.4 Add body size limit to data import in `backend/src/routes/data.ts`
- Add `bodyLimit: 10 * 1024 * 1024` to the POST `/api/data/import` route registration
- Fastify returns 413 automatically when exceeded
- _Requirements: 8.1, 8.2_
- [ ] 1.5 Set CORS origin to fail-closed default in `backend/src/server.ts`
- Change fallback from `true` to `false` when `ARCHNEST_CORS_ORIGIN` is not set
- Log a warning at startup when env var is missing
- _Requirements: 10.1, 10.2, 10.3_
- [ ]* 1.6 Write property tests for path traversal, agent token, and integrationId validation
- **Property 3: Path Traversal Prevention** — verify `validatePath` rejects all `..` escape patterns and accepts valid relative paths
- **Property 4: Agent Token Comparison Correctness** — verify `ok: true` iff `presented === expected`, never throws
- **Property 8: Integration ID Numeric Validation** — verify `sessionLogPath` returns null for non-positive-integer values
- **Validates: Requirements 6.1, 6.2, 7.1, 7.2, 7.3, 13.1, 13.2**
- [ ] 2. Checkpoint — Backend security
- Ensure all tests pass, ask the user if questions arise.
- [ ] 3. WebSocket authentication refactor
- [ ] 3.1 Implement first-message auth gate in `backend/src/routes/terminal.ts`
- Add `authenticated` flag per connection
- Require `{ type: 'auth', token }` as first message; verify JWT
- Reject all other message types before auth with error frame + close
- Remove `req.query.token` usage from connect/list_tmux handlers
- _Requirements: 5.3, 5.4_
- [ ] 3.2 Implement first-message auth gate in `backend/src/routes/docker.ts`
- Same pattern as terminal: auth-first protocol for Docker exec WebSocket
- Verify existing JSON parse error handling remains intact (already correct per design)
- _Requirements: 5.3, 5.4, 12.1, 12.2_
- [ ] 3.3 Update frontend WebSocket connections in `src/lib/TerminalSessionContext.tsx`
- Remove `?token=` from WebSocket URL query string
- Send `{ type: 'auth', token }` as first message on `ws.onopen`
- Apply same change to `fetchTmuxSessions()` WebSocket
- _Requirements: 5.1, 5.2_
- [ ] 3.4 Fix WebSocket session leak in `src/lib/TerminalSessionContext.tsx`
- Guard `ws.close()` with readyState check (OPEN or CONNECTING)
- Capture `thisWs` reference; in `onclose`, bail if `s.ws !== thisWs`
- _Requirements: 1.1, 1.2, 1.3_
- [ ]* 3.5 Write property test for WebSocket auth gate
- **Property 5: WebSocket Auth Gate Rejects Unauthenticated Messages**
- Verify no message type other than `auth` is processed before authentication
- **Validates: Requirements 5.3, 5.4**
- [ ] 4. Checkpoint — WebSocket auth
- Ensure all tests pass, ask the user if questions arise.
- [ ] 5. Backend — tmux validation and SSH cleanup verification
- [ ] 5.1 Harden tmux session name usage in `backend/src/routes/terminal.ts`
- Ensure validated name is single-quoted in shell command construction
- Defense-in-depth: strip any `'` (impossible given regex, but belt+suspenders)
- _Requirements: 2.1, 2.2, 2.3_
- [ ] 5.2 Verify container ref validation in `backend/src/ssh/docker.ts`
- Confirm `CONTAINER_REF_RE` regex is `^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$`
- Confirm `shQuote()` is applied to all container ref interpolations
- Confirm SSH cleanup in `withSshClient` finally block is present
- If already correct (per design), add a code comment noting audit verification
- _Requirements: 3.1, 3.2, 3.3, 9.1, 9.2_
- [ ]* 5.3 Write property tests for tmux name validation and container ref validation
- **Property 1: tmux Session Name Validation Prevents Injection** — verify only `[A-Za-z0-9_-]{1,64}` passes
- **Property 2: Container Reference Validation and Safe Escaping** — verify regex and `shQuote` safety
- **Property 6: SSH Connection Cleanup Guarantee** — verify `withSshClient` always calls `.end()`
- **Validates: Requirements 2.1, 2.3, 3.1, 3.3, 9.1, 9.2**
- [ ] 6. Frontend stability fixes
- [ ] 6.1 Add `.catch()` to Sidebar promise in `src/components/Sidebar.tsx`
- Append `.catch(() => {})` to the `api.listIntegrations()` call
- Ensure integrations state remains null on failure (shows "Checking…")
- _Requirements: 4.1, 4.2_
- [ ] 6.2 Create Error Boundary component at `src/components/ErrorBoundary.tsx`
- React class component with `getDerivedStateFromError` + `componentDidCatch`
- Fallback UI: centered message + gold "Reload Page" button on dark background
- Log error to console
- _Requirements: 11.2, 11.3, 11.4_
- [ ] 6.3 Wrap Dashboard in ErrorBoundary in `src/App.tsx`
- Import and wrap the top-level Dashboard component tree
- _Requirements: 11.1_
- [ ]* 6.4 Write unit tests for ErrorBoundary and Sidebar error handling
- Verify ErrorBoundary renders fallback on child throw
- Verify Sidebar swallows rejected promise without crashing
- **Validates: Requirements 4.1, 4.2, 11.1, 11.2, 11.3, 11.4**
- [ ] 7. CloudFormation test deploy template
- [ ] 7.1 Create `infra/test-deploy.yml` CloudFormation template
- Parameters: KeyPairName (KeyPair type), NotificationEmail (String)
- Resources: SecurityGroup (SSH + HTTP/HTTPS), EC2 Instance (t4g.small, AL2023 ARM64, Docker + Compose via UserData), Budget alarm ($30/month, 80% threshold)
- Outputs: PublicIP, InstanceId
- _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_
- [ ]* 7.2 Write smoke test validating CloudFormation template structure
- Verify required resource types, parameters, and outputs exist
- Validate YAML parses correctly
- **Validates: Requirements 14.114.6**
- [ ] 8. Final checkpoint
- Ensure all tests pass, ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Components 3 (container ref), 9 (SSH cleanup), and 12 (JSON parse) are verified-already-correct per design — task 5.2 confirms with a code comment
- The project uses TypeScript throughout (Fastify 5 backend, React 19 frontend)
- No new dependencies are introduced; all fixes use Node.js built-ins and existing patterns
## Task Dependency Graph
```json
{
"waves": [
{ "id": 0, "tasks": ["1.1", "1.2", "1.3", "1.4", "1.5", "6.1", "6.2", "7.1"] },
{ "id": 1, "tasks": ["1.6", "6.3", "6.4", "7.2"] },
{ "id": 2, "tasks": ["3.1", "3.2", "5.1", "5.2"] },
{ "id": 3, "tasks": ["3.3", "3.4", "3.5", "5.3"] }
]
}
```

View file

@ -0,0 +1,176 @@
---
inclusion: manual
---
# ArchNest Code Audit — Known Issues & Fix Plan
> Last audited: 2026-06-24. This file tracks known issues by severity.
> Use this to guide fixes before production deploy.
---
## CRITICAL (3) — Must fix before deploy
### 1. Terminal WebSocket session leak
- **File**: `src/lib/TerminalSessionContext.tsx`
- **Problem**: Old WebSocket reference overwritten before cleanup runs on reconnect, creating dangling connections that never close.
- **Fix**: Close existing WS before creating new one. Guard against double-open.
### 2. tmux session name injection
- **File**: `backend/src/routes/terminal.ts`
- **Problem**: Regex allows characters that don't fully prevent shell metacharacter injection when constructing the `tmux attach -t` command.
- **Fix**: Whitelist alphanumeric + dash + underscore only. Reject everything else.
### 3. Docker container ref validation insufficient
- **File**: `backend/src/ssh/docker.ts`
- **Problem**: `CONTAINER_REF_RE` may allow problematic characters through validation gaps.
- **Fix**: Tighten regex to `^[a-zA-Z0-9][a-zA-Z0-9_.-]*$` only.
---
## HIGH (10) — Fix before production
### 4. Unhandled promise in Sidebar
- **File**: `src/components/Sidebar.tsx`
- **Fix**: Add `.catch()` to `api.listIntegrations()` call.
### 5. JWT token in WebSocket URL
- **File**: `src/lib/TerminalSessionContext.tsx`
- **Problem**: Token in query string gets logged in server access logs.
- **Fix**: Send token as first WebSocket message or via subprotocol.
### 6. Path traversal in file operations
- **File**: `backend/src/routes/files.ts`
- **Problem**: No validation against `../` sequences or absolute paths.
- **Fix**: Normalize path, reject if it escapes the allowed root.
### 7. Agent token timing attack
- **File**: `backend/src/routes/agents.ts`
- **Problem**: Early return on length mismatch leaks timing info.
- **Fix**: Use `crypto.timingSafeEqual()` with Buffer padding.
### 8. No data import size limit
- **File**: `backend/src/routes/data.ts`
- **Problem**: Admin can import arbitrarily large JSON → DoS.
- **Fix**: Add `bodyLimit` to the route (e.g., 10MB max).
### 9. SSH connection leak in Docker-over-SSH
- **File**: `backend/src/ssh/docker.ts`
- **Problem**: Promise rejection paths may leave SSH connections open.
- **Fix**: Add `finally` block that calls `client.end()`.
### 10. CORS origin defaults to any
- **File**: `docker-compose.yml` / `server.ts`
- **Problem**: `ARCHNEST_CORS_ORIGIN` falls back to `true` (allow all) when unset.
- **Fix**: Require explicit origin in production. Fail-closed.
### 11. No React error boundary
- **File**: `src/App.tsx`
- **Problem**: Any component crash takes down the entire app.
- **Fix**: Add an ErrorBoundary wrapper around `<Dashboard />`.
### 12. WebSocket JSON.parse unhandled
- **File**: `backend/src/routes/docker.ts`
- **Problem**: No try-catch around JSON.parse in WS message handler.
- **Fix**: Wrap in try-catch, send error frame back to client.
### 13. Session log path traversal
- **File**: `backend/src/routes/terminal.ts`
- **Problem**: `integrationId` used directly in filesystem path without sanitization.
- **Fix**: Validate integrationId is numeric only.
---
## MEDIUM (13) — Should fix
### 14. Auth token in localStorage (XSS risk)
- `src/lib/AuthContext.tsx` — JWT in localStorage. If XSS hits, token is stolen.
- Long-term fix: HttpOnly cookie. Short-term: accept risk, add CSP headers.
### 15. No refresh token mechanism
- `backend/src/routes/auth.ts` — Users forced to re-login on token expiry.
- Fix: Add refresh token rotation endpoint.
### 16. No login rate limiting
- `backend/src/routes/auth.ts` — Brute-force attack possible.
- Fix: Add rate limiter plugin (e.g., `@fastify/rate-limit`, 5 attempts/min).
### 17. Weak binary file detection
- `backend/src/routes/files.ts` — Null byte check insufficient for UTF-8 binary files.
- Fix: Check first 512 bytes for control characters (excluding newline/tab).
### 18. Missing category uniqueness
- `backend/src/db/index.ts``bookmark_categories` allows duplicate names.
- Fix: Add UNIQUE constraint on `(name)` or `(name, tenant_id)` for SaaS.
### 19. IntegrationId validation weak
- `backend/src/routes/docker.ts``Number()` on string param doesn't reject `NaN`.
- Fix: Parse with parseInt, check `isNaN()`, return 400.
### 20. SSH shell without PTY in Docker exec
- `backend/src/routes/dockerSsh.ts` — Output buffering issues in non-interactive mode.
- Fix: Always request PTY for exec sessions.
### 21. No privileged port validation
- `backend/src/routes/tunnels.ts` — Ports <1024 allowed without OS privilege.
- Fix: Warn or reject ports <1024 unless running as root.
### 22. No size validation on data export
- `backend/src/routes/data.ts` — Can generate massive JSON response.
- Fix: Stream response or add a row limit.
### 23. Silent listResources failures
- `backend/src/routes/integrations.ts` — Errors caught but not surfaced.
- Fix: Return partial results + error flag per integration.
### 24. Missing admin action audit logging
- Multiple route files — No before/after state comparison in logs.
- Fix: Log old + new values on integration/user updates.
### 25. Terminal resize race condition
- `backend/src/routes/terminal.ts` — Resize between connect and channel-ready is lost.
- Fix: Queue resize events until channel is confirmed ready.
### 26. Missing database indexes
- `backend/src/db/index.ts` — No indexes on `events.created_at`, `sessions.user_id`.
- Fix: Add indexes for frequently-queried columns.
---
## LOW (14) — Nice to have
27. Missing CSP / X-Frame-Options headers
28. No avatar upload size validation
29. CPU stats edge case (single-core shows 0%)
30. Hardcoded tmux/docker format strings
31. State mutations outside React lifecycle
32. Weak CIDR validation error messages
33. File size limit hardcoded (should be configurable)
34. SSH key format not validated on upload
35. No import audit trail (who imported what, when)
36. Session logging file permissions not restricted
37. Missing Content-Security-Policy header
38. No health check for guacd sidecar
39. Hardcoded 5-second metrics polling interval
40. No graceful shutdown for SSH connections on SIGTERM
---
## Deploy Test Plan
### Phase A: Fix Critical + High
Fix issues 1-13 before any deploy. These represent real security and stability risks.
### Phase B: Test Deploy (AWS)
1. Deploy CloudFormation stack (t4g.small EC2, us-east-1)
2. SSH in, clone repo, create `.env`, `docker compose up -d --build`
3. Run through every page: Glance, Infrastructure, Terminal, Tunnels, Files, Containers, Remote Desktop, Host Metrics, BookNest, Settings, Help
4. Test: SSH connect, Docker list, file upload, bookmark CRUD, metrics polling
5. Budget alarm set at $30/month
### Phase C: Destroy
After successful test:
```bash
aws cloudformation delete-stack --stack-name archnest-test
```
This removes ALL resources (EC2, EIP, SG, budget) — clean slate.