- Spec for fixing 13 Critical + High code audit issues - requirements.md (14 requirements, EARS format) - design.md (component fixes + 8 correctness properties) - tasks.md (8 tasks, parallel execution waves) - CloudFormation test deploy template planned - code-audit.md steering file (all 40 issues documented) Co-authored-by: Samuel James <ssamjame@amazon.com> Co-authored-by: Kiro <noreply@kiro.dev>
21 KiB
Design Document: Code Audit Fixes
Overview
This design addresses 13 Critical + High severity issues from the ArchNest code audit plus a CloudFormation test deploy template. The fixes are surgical — each targets a specific file with a minimal, correct patch. No new dependencies are introduced; all fixes use Node.js built-ins, existing Fastify hooks, and standard React patterns already in the codebase.
Architecture
The fixes span three layers:
┌─────────────────────────────────────────────────────────┐
│ Frontend (React 19 + Vite 8 + TypeScript) │
│ ├─ TerminalSessionContext.tsx — WS lifecycle, auth │
│ ├─ Sidebar.tsx — promise error catch │
│ ├─ App.tsx — ErrorBoundary wrapper │
│ └─ ErrorBoundary.tsx — new component │
├─────────────────────────────────────────────────────────┤
│ Backend (Fastify 5 + TypeScript) │
│ ├─ routes/terminal.ts — tmux validation, WS auth, │
│ │ session log ID validation │
│ ├─ routes/files.ts — path traversal prevention │
│ ├─ routes/agents.ts — timing-safe token compare │
│ ├─ routes/data.ts — body size limit │
│ ├─ routes/docker.ts — JSON parse error handling │
│ ├─ ssh/docker.ts — container ref validation, │
│ │ SSH connection cleanup │
│ └─ server.ts — CORS fail-closed default │
├─────────────────────────────────────────────────────────┤
│ Infrastructure │
│ └─ infra/test-deploy.yml — CloudFormation template │
└─────────────────────────────────────────────────────────┘
Components and Interfaces
Component 1: WebSocket Session Leak Prevention (Frontend)
File: src/lib/TerminalSessionContext.tsx
Current problem: The connect() function calls s.ws?.close() but a race condition exists where the old WS onclose handler fires after reassignment and corrupts state belonging to the new connection.
Fix:
function connect(s: PaneSession, id: number, tmuxSession?: string) {
s.disposeListeners?.()
// Close existing WS if OPEN or CONNECTING — guard against leak
if (s.ws && (s.ws.readyState === WebSocket.OPEN || s.ws.readyState === WebSocket.CONNECTING)) {
s.ws.close()
}
s.ws = null
s.connected = false
bump()
const term = s.term
term.reset()
term.writeln('Connecting…')
const token = getToken()
const proto = window.location.protocol === 'https:' ? 'wss' : 'ws'
// Token sent as first message, NOT in URL query string
const ws = new WebSocket(`${proto}://${window.location.host}/api/terminal`)
const thisWs = ws // Capture reference to detect stale onclose
s.ws = ws
ws.onopen = () => {
// Send auth as first message
ws.send(JSON.stringify({ type: 'auth', token }))
ws.send(JSON.stringify({ type: 'connect', integrationId: id, cols: term.cols, rows: term.rows, tmuxSession }))
}
ws.onclose = () => {
// Guard: only update state if this WS is still the active one
if (s.ws !== thisWs) return
s.connected = false
bump()
}
// ... rest of message handling unchanged
}
The same pattern applies to fetchTmuxSessions() — remove ?token= from URL, send auth as first message.
Component 2: tmux Session Name Validation (Backend)
File: backend/src/routes/terminal.ts
Current state: The regex TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/ is already defined and used. The issue is that the validated name is interpolated directly into a tmux attach -t ${tmuxSession} command without shell quoting.
Fix: Apply shQuote() to the validated name in command construction, or use the existing pattern where invalid names fall through to null:
const TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/
// In the connect handler:
const tmuxSession = msg.tmuxSession && TMUX_NAME_RE.test(msg.tmuxSession)
? msg.tmuxSession
: null
if (tmuxSession) {
// Name is validated to contain only safe chars; quote for defense-in-depth
const safe = tmuxSession.replace(/'/g, "") // Impossible given regex, but belt+suspenders
client.exec(`tmux attach -t '${safe}' || tmux new-session -s '${safe}'`, {
pty: { cols, rows, term: 'xterm-256color' },
}, onChannel)
}
Component 3: Docker Container Reference Validation (Backend)
File: backend/src/ssh/docker.ts
Current state: Already has CONTAINER_REF_RE and shQuote(). The regex and quoting are correctly implemented. Verify the regex anchors and character set are tight:
const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/
This is already correct. The shQuote() function is already applied in containerLogs, containerAction, removeContainer, and buildExecShellCommand. No code change needed — the audit finding is already resolved in the current codebase.
Component 4: Sidebar Promise Error Handling (Frontend)
File: src/components/Sidebar.tsx
Fix: Add .catch() to the unhandled promise:
useEffect(() => {
api.listIntegrations()
.then(({ integrations }) => setIntegrations(integrations))
.catch(() => {}) // Leave state as null → shows "Checking…"
}, [])
Component 5: WebSocket Authentication via First Message (Backend)
Files: backend/src/routes/terminal.ts, backend/src/routes/docker.ts
Current state: Both routes verify req.query.token on connect/list_tmux messages. This needs to change to a first-message auth protocol.
Design:
// Shared auth state per WSocket connection
let authenticated = false
socket.on('message', async (raw: Buffer) => {
let msg: ClientMessage
try {
msg = JSON.parse(raw.toString())
} catch {
send(socket, { type: 'error', message: 'Invalid JSON' })
return
}
// Gate: first message must be auth
if (!authenticated) {
if (msg.type !== 'auth' || !msg.token) {
send(socket, { type: 'error', message: 'Authentication required' })
socket.close()
return
}
try {
await app.jwt.verify(msg.token)
authenticated = true
send(socket, { type: 'authenticated' })
} catch {
send(socket, { type: 'error', message: 'Unauthorized' })
socket.close()
}
return
}
// Normal message processing (connect, input, resize, etc.)
// ...
})
The ClientMessage type gets a new variant: type: 'auth' with token: string.
Component 6: File Path Traversal Prevention (Backend)
File: backend/src/routes/files.ts
Design: Add a validation function applied to all path inputs before any SFTP operation:
import { posix } from 'node:path'
function validatePath(path: string): string {
// Reject absolute paths
if (path.startsWith('/')) {
throw Object.assign(new Error('Absolute paths are not allowed'), { statusCode: 400 })
}
// Normalize and check for traversal
const normalized = posix.normalize(path)
if (normalized.startsWith('../') || normalized === '..' || normalized.includes('/../')) {
throw Object.assign(new Error('Path traversal is not allowed'), { statusCode: 400 })
}
return normalized
}
Applied at the top of every endpoint handler that accepts a user path: list, content (read), content (write), mkdir, rename, delete, chmod, download, upload.
Component 7: Agent Token Timing-Safe Comparison (Backend)
File: backend/src/routes/agents.ts
Current state: The agentTokenValid() function already uses timingSafeEqual but has an early return on length mismatch that leaks timing information.
Fix: Pad to equal length before comparison:
function agentTokenValid(req: FastifyRequest): { ok: boolean; configured: boolean } {
const expected = process.env.ARCHNEST_AGENT_TOKEN
if (!expected) return { ok: false, configured: false }
const header = req.headers.authorization ?? ''
const presented = header.startsWith('Bearer ') ? header.slice(7) : ''
const a = Buffer.from(presented)
const b = Buffer.from(expected)
// Pad shorter buffer to match longer — constant-time regardless of length diff
const maxLen = Math.max(a.length, b.length)
const aPadded = Buffer.alloc(maxLen)
const bPadded = Buffer.alloc(maxLen)
a.copy(aPadded)
b.copy(bPadded)
const match = timingSafeEqual(aPadded, bPadded)
// Both length AND content must match
return { ok: match && a.length === b.length, configured: true }
}
Component 8: Data Import Size Limit (Backend)
File: backend/src/routes/data.ts
Fix: Add bodyLimit to the route registration:
app.post('/api/data/import', {
onRequest: [app.adminOnly],
bodyLimit: 10 * 1024 * 1024, // 10 MB
}, async (req, reply) => {
// ... existing handler
})
Fastify natively returns 413 Payload Too Large when exceeded, before JSON parsing.
Component 9: SSH Connection Leak Prevention (Backend)
File: backend/src/ssh/docker.ts
Current state: The withSshClient() function already has a finally block that calls client.end() and jumpRef.current?.end(). This is correctly implemented in the current codebase. The audit noted older code; the fix is already present.
Verify the finally block:
} finally {
client.end()
jumpRef.current?.end()
}
This is already in place. No change needed.
Component 10: CORS Origin Fail-Closed Default (Backend)
File: backend/src/server.ts
Current state: origin: process.env.ARCHNEST_CORS_ORIGIN ?? true — falls back to true (allow all).
Fix:
const corsOrigin = process.env.ARCHNEST_CORS_ORIGIN || false
if (!process.env.ARCHNEST_CORS_ORIGIN) {
app.log.warn('ARCHNEST_CORS_ORIGIN is not set — all cross-origin requests will be blocked')
}
await app.register(cors, { origin: corsOrigin })
Component 11: React Error Boundary (Frontend)
File: src/components/ErrorBoundary.tsx (new)
import { Component, type ReactNode, type ErrorInfo } from 'react'
interface Props { children: ReactNode }
interface State { hasError: boolean }
export class ErrorBoundary extends Component<Props, State> {
state: State = { hasError: false }
static getDerivedStateFromError(): State {
return { hasError: true }
}
componentDidCatch(error: Error, info: ErrorInfo) {
console.error('[ErrorBoundary]', error, info.componentStack)
}
render() {
if (this.state.hasError) {
return (
<div style={{
display: 'flex', flexDirection: 'column', alignItems: 'center',
justifyContent: 'center', height: '100vh', backgroundColor: '#0D0E10',
color: '#E8E6E0', fontFamily: 'system-ui, sans-serif',
}}>
<p style={{ fontSize: '16px', marginBottom: '16px' }}>
Something went wrong.
</p>
<button
onClick={() => window.location.reload()}
style={{
padding: '8px 20px', backgroundColor: '#C8A434',
color: '#0D0E10', border: 'none', borderRadius: '6px',
cursor: 'pointer', fontWeight: 600,
}}
>
Reload Page
</button>
</div>
)
}
return this.props.children
}
}
Integration in App.tsx:
import { ErrorBoundary } from './components/ErrorBoundary'
// In App():
return <ErrorBoundary><Dashboard /></ErrorBoundary>
Component 12: WebSocket JSON Parse Error Handling (Backend)
File: backend/src/routes/docker.ts
Current state: The dockerExecRoutes handler already has a try/catch around JSON.parse that sends { type: 'error', message: 'Invalid JSON' } and does NOT close the connection. This is already correctly implemented.
Verify the existing code:
socket.on('message', async (raw: Buffer) => {
let msg: ExecMessage
try {
msg = JSON.parse(raw.toString())
} catch {
sendJson(socket, { type: 'error', message: 'Invalid JSON' })
return // Does not close — client can retry
}
// ...
})
Already correct. No change needed.
Component 13: Session Log Path Traversal Prevention (Backend)
File: backend/src/routes/terminal.ts
Fix: Validate integrationId is a positive integer before constructing the log path:
function sessionLogPath(integrationId: number): string | null {
// Validate: must be a positive integer (no NaN, no negative, no float)
if (!Number.isInteger(integrationId) || integrationId <= 0) return null
mkdirSync(SESSION_LOG_DIR, { recursive: true })
const stamp = new Date().toISOString().replace(/[:.]/g, '-')
return join(SESSION_LOG_DIR, `${integrationId}_${stamp}.log`)
}
Callers check for null return and skip logging.
Component 14: CloudFormation Test Deploy Template
File: infra/test-deploy.yml
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
ArchNest test deployment - t4g.small EC2 with Docker in us-east-1.
Budget alarm at $30/month. Destroy after testing.
Parameters:
KeyPairName:
Type: AWS::EC2::KeyPair::KeyName
Description: SSH key pair for instance access
NotificationEmail:
Type: String
Description: Email for budget alarm notifications
Resources:
SecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: ArchNest test - SSH + HTTP/HTTPS
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 22
ToPort: 22
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 80
ToPort: 80
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: 0.0.0.0/0
Instance:
Type: AWS::EC2::Instance
Properties:
InstanceType: t4g.small
KeyName: !Ref KeyPairName
ImageId: !Sub '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64}}'
SecurityGroupIds:
- !GetAtt SecurityGroup.GroupId
UserData:
Fn::Base64: |
#!/bin/bash -xe
dnf update -y
dnf install -y docker git
systemctl enable docker && systemctl start docker
usermod -aG docker ec2-user
# Install Docker Compose v2 plugin
mkdir -p /usr/local/lib/docker/cli-plugins
curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
-o /usr/local/lib/docker/cli-plugins/docker-compose
chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
BudgetAlarm:
Type: AWS::Budgets::Budget
Properties:
Budget:
BudgetName: archnest-test-budget
BudgetLimit:
Amount: 30
Unit: USD
TimeUnit: MONTHLY
BudgetType: COST
NotificationsWithSubscribers:
- Notification:
NotificationType: ACTUAL
ComparisonOperator: GREATER_THAN
Threshold: 80
Subscribers:
- SubscriptionType: EMAIL
Address: !Ref NotificationEmail
Outputs:
PublicIP:
Value: !GetAtt Instance.PublicIp
Description: Instance public IP for SSH access
InstanceId:
Value: !Ref Instance
Description: EC2 instance ID
Data Models
No new database tables or schema changes are required. All fixes operate on existing data structures.
New TypeScript interface (WebSocket auth message):
interface AuthMessage {
type: 'auth'
token: string
}
// ClientMessage union extended:
type ClientMessage =
| AuthMessage
| { type: 'connect'; integrationId?: number; cols?: number; rows?: number; tmuxSession?: string }
| { type: 'input'; data?: string }
| { type: 'resize'; cols?: number; rows?: number }
| { type: 'disconnect' }
| { type: 'list_tmux'; integrationId?: number }
Error Handling
| Component | Error Case | Behavior |
|---|---|---|
| Path validation | Traversal/absolute path | HTTP 400, descriptive message |
| WS auth gate | Missing/invalid token | Error frame + close connection |
| Agent token | Wrong token (any form) | HTTP 401, identical response |
| Data import | Body > 10MB | HTTP 413 (Fastify built-in) |
| JSON parse | Malformed WS message | Error frame, keep connection open |
| Session log | Invalid integrationId | Skip logging, no error to user |
| Error Boundary | Component crash | Fallback UI with reload button |
| Sidebar | API rejection | Swallow error, show "Checking…" |
Testing Strategy
Unit tests (example-based): WebSocket session lifecycle (Req 1), Sidebar error catch (Req 4), frontend token removal from URL (Req 5.1–5.2), Error Boundary rendering (Req 11), CORS default (Req 10), body size limit (Req 8).
Property tests (100+ iterations): Input validation functions (tmux names, container refs, path traversal, integration IDs), token comparison correctness, WS auth gate, SSH cleanup guarantee, JSON parse resilience.
Smoke tests: CloudFormation template structure validation (Req 14) — verify resource types, parameters, and outputs exist with correct values.
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: tmux Session Name Validation Prevents Injection
For any string s, if TMUX_NAME_RE.test(s) returns true, then s contains only characters from [A-Za-z0-9_-] and has length 1–64, and the resulting shell command tmux attach -t '${s}' is safe from injection. For any string s that contains characters outside this set or exceeds 64 chars, the validator SHALL reject it.
Validates: Requirements 2.1, 2.3
Property 2: Container Reference Validation and Safe Escaping
For any string ref, isValidContainerRef(ref) returns true if and only if ref matches ^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$. For any valid container reference, shQuote(ref) produces a string that, when embedded in a shell command, cannot break out of single quotes.
Validates: Requirements 3.1, 3.3
Property 3: Path Traversal Prevention
For any path string p, if p starts with / or if posix.normalize(p) contains a .. component that would escape the root (starts with ../, equals .., or contains /../), then validatePath(p) SHALL throw an error. For any relative path without traversal components, validatePath(p) SHALL return the normalized path.
Validates: Requirements 6.1, 6.2
Property 4: Agent Token Comparison Correctness
For any pair of token strings (presented, expected), the agentTokenValid function returns ok: true if and only if presented === expected. The function SHALL never throw regardless of input lengths, and SHALL return the same HTTP 401 response shape for all rejection cases (length mismatch or content mismatch).
Validates: Requirements 7.1, 7.2, 7.3
Property 5: WebSocket Auth Gate Rejects Unauthenticated Messages
For any WebSocket message received before a valid auth message has been processed, the backend SHALL respond with an error frame and close the connection. No connect, list_tmux, input, or resize message SHALL be processed in the unauthenticated state.
Validates: Requirements 5.3, 5.4
Property 6: SSH Connection Cleanup Guarantee
For any operation function fn passed to withSshClient, regardless of whether fn resolves or rejects, both the primary SSH client and the jump-host client (if any) SHALL have .end() called before the withSshClient promise settles.
Validates: Requirements 9.1, 9.2
Property 7: WebSocket Invalid JSON Resilience
For any byte sequence sent to the Docker exec WebSocket that is not valid JSON, the handler SHALL send { type: 'error', message: 'Invalid JSON' } and SHALL NOT close the WebSocket connection.
Validates: Requirements 12.1, 12.2
Property 8: Integration ID Numeric Validation for Session Logs
For any value v used as integrationId when session logging is enabled, if v is not a positive integer (Number.isInteger(v) && v > 0), then sessionLogPath SHALL return null and no filesystem write SHALL occur.
Validates: Requirements 13.1, 13.2