Compare commits

11 commits
v1.0 ... main

Author SHA1 Message Date
Samuel James
ad4687660c Document the Forgejo CI/CD + racknerd2 setup as the baseline
All checks were successful
Build & Push Images / build (push) Successful in 41s
CI / validate (push) Successful in 51s
Build & Push Images / deploy (push) Successful in 30s
Make the automated pipeline the documented "setup moving forward" and
finish scrubbing the last stale GitHub-Actions/racknerd1 references that
never reached main.

- HANDOFF.md: refresh the stale 2026-06-21 snapshot. New "CI/CD & deploy"
  section (push to main -> build + push to registry.snsnetlabs.com ->
  auto-deploy to racknerd2 over SSH, SHA-pinned, /api/health gate),
  racknerd2 validation-host + SSH-tunnel access notes, Forgejo workflow
  rule, and a current Deployment + orientation section.
- .kiro/steering/project-guide.md: Forgejo-only Git workflow (no gh),
  CI/CD row, registry host, racknerd2 + forgejo-runner SSH entries, and a
  CI/CD pipeline section.
- .kiro/hooks/tunnel-racknerd2-8080.kiro.hook: the "View ArchNest on
  racknerd2" hook (ssh -L 8080:localhost:8080 -N) to view the deployed
  site at http://localhost:8080 (racknerd2's edge only allows port 22).
- src/pages/Settings.tsx: About panel repo URL -> Forgejo.
- .dockerignore: .github -> .forgejo.
- TERMIX_MIGRATION.md / docs/OPEN-SOURCE-RELEASE.md: drop stale
  .github/workflows + "GitHub Actions deploy" references.

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
2026-06-25 13:37:39 -04:00
Samuel James
bddf891c0a Auto-deploy to racknerd2 after a successful build
All checks were successful
Build & Push Images / build (push) Successful in 38s
CI / validate (push) Successful in 1m11s
Build & Push Images / deploy (push) Successful in 23s
Add a `deploy` job to build.yml that needs `build`, so every push to main
builds + pushes the images and then deploys them to racknerd2 over the
mesh, pinned to the built commit's SHA, with an /api/health gate. Fully
hands-off.

The standalone deploy.yml stays as a manual workflow_dispatch for
deploying/rolling back to an arbitrary tag without rebuilding.

deploy/README.md updated to document the auto-deploy flow.

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
2026-06-25 11:34:27 -04:00
Samuel James
2b2a809352 Fix build job: install current docker-ce-cli, not stale docker.io
All checks were successful
Build & Push Images / build (push) Successful in 4m4s
CI / validate (push) Successful in 1m11s
The first CI build failed: the job container (node:22-bookworm) installed
Debian's `docker.io` (Docker 20.10.24, API 1.41), which the host daemon
(29.x, minimum API 1.44) rejects with "client version 1.41 is too old".

Install docker-ce-cli from Docker's official apt repo instead, which is
current and talks to the daemon fine. Verified on the runner: a
node:22-bookworm container with the mounted socket + docker-ce-cli
connects to the 29.1.3 daemon (API 1.52) successfully. This also confirms
the runner's docker_host=automount is working (the client reached the
daemon; only the version was the problem).

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
2026-06-25 11:13:01 -04:00
Samuel James
00fc3ceed3 Point registry at registry.snsnetlabs.com; record even=dev versioning
Some checks failed
Build & Push Images / build (push) Failing after 29s
CI / validate (push) Successful in 1m12s
The Forgejo container registry now lives on a dedicated unproxied
(DNS-only) host, registry.snsnetlabs.com, so large image layers bypass
Cloudflare's ~100 MB request-body cap (the backend image's 262 MB and
317 MB layers previously hit 413 Payload Too Large through the proxied
forgejo.snsnetlabs.com host). The web UI / packages list stays on
forgejo.snsnetlabs.com behind Cloudflare Access SSO.

- build.yml: REGISTRY -> registry.snsnetlabs.com
- deploy/docker-compose.yml: image refs -> registry.snsnetlabs.com
- deploy/README.md: push/pull/login host -> registry.snsnetlabs.com
  (packages web UI URL kept on forgejo.snsnetlabs.com)

Also record the versioning convention in HANDOFF + steering: development
happens on even major versions, releases on odd; currently developing v2
(prior released line is v1, see the v1.0 git tag). package.json and the
About panel are not yet bumped to v2.

Validated end to end: built both images on the runner host, pushed to
registry.snsnetlabs.com (backend included, no 413), pulled on racknerd2,
brought the stack up, /api/health returns {"ok":true} over the mesh IP.

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
2026-06-25 10:55:15 -04:00
Samuel James
066a4f97bc Add Forgejo Actions build + deploy pipeline (registry -> racknerd2)
Build the frontend and backend images in CI, push them to the Forgejo
container registry, and deploy to racknerd2 (validation host) over the
NetBird mesh. racknerd2 only pulls + runs (1.9 GiB RAM, never builds).

- .forgejo/workflows/build.yml: on push to main / manual, build both
  images and push :latest + :<sha> to forgejo.snsnetlabs.com/sam/...
  (installs the docker CLI in the job; relies on the runner's
  docker_host=automount to reach the host engine).
- .forgejo/workflows/deploy.yml: manual dispatch; SSH to racknerd2,
  docker compose pull + up -d, then /api/health check.
- deploy/docker-compose.yml: registry-image compose. Ports bound to the
  mesh IP only (Docker bypasses ufw), so the app is reachable over the
  mesh, not the public interface.
- deploy/.env.example + deploy/README.md: deploy host config + full
  pipeline/prereq docs.
- .gitignore: ignore real .env / deploy/.env.

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
2026-06-25 10:04:59 -04:00
3172104d29 Add code-audit-fixes spec (#5)
All checks were successful
CI / validate (push) Successful in 3m33s
2026-06-24 19:20:18 +00:00
4ab0b2fff6 Document theme palettes + organize assets (#4)
All checks were successful
CI / validate (push) Successful in 49s
2026-06-24 16:27:33 +00:00
04d491c277 System design, CloudFormation, theming assets (#3)
All checks were successful
CI / validate (push) Successful in 48s
2026-06-24 13:55:04 +00:00
320f816100 Add auto-start SSH tunnel hook (#2)
All checks were successful
CI / validate (push) Successful in 48s
2026-06-23 22:58:09 +00:00
d1697fc811 Add Forgejo Actions CI, remove GitHub Actions (#1)
All checks were successful
CI / validate (push) Successful in 47s
2026-06-23 22:52:35 +00:00
Samuel James
4422840dd3 the
Some checks failed
Deploy to racknerd1 / validate (push) Successful in 2m26s
Deploy to racknerd1 / deploy (push) Failing after 4s
2026-06-23 15:55:31 -04:00
43 changed files with 2799 additions and 404 deletions

@ -1,6 +1,6 @@
 node_modules
 dist
 .git
-.github
+.forgejo
 pics
 *.md

@ -0,0 +1,106 @@
name: Build & Push Images
# Builds the frontend + backend Docker images and pushes them to the Forgejo
# container registry (registry.snsnetlabs.com/sam/...). Runs on every push to
# main, and on-demand via the "Run workflow" button (workflow_dispatch).
#
# NOTE: registry.snsnetlabs.com is the unproxied (DNS-only) registry host so
# large layers bypass Cloudflare's body cap. The web UI / packages list stays
# on forgejo.snsnetlabs.com (Cloudflare Access SSO).
#
# Requirements (see deploy/README.md):
#   - Forgejo Actions secret FORGEJO_REGISTRY_TOKEN: a package-scoped token for
#     user `sam`.
#   - The runner must allow Docker builds: container.docker_host = "automount"
#     in the forgejo-runner config (mounts /var/run/docker.sock into the job).
on:
  push:
    branches: [main]
  workflow_dispatch:
env:
  REGISTRY: registry.snsnetlabs.com
  OWNER: sam
jobs:
  build:
    runs-on: docker
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Docker CLI
        # Debian bookworm's docker.io is too old (API 1.41) for the host daemon
        # (needs >= 1.44), so install the current docker-ce-cli from Docker's repo.
        run: |
          apt-get update
          apt-get install -y --no-install-recommends ca-certificates curl
          install -m 0755 -d /etc/apt/keyrings
          curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
          chmod a+r /etc/apt/keyrings/docker.asc
          echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian bookworm stable" > /etc/apt/sources.list.d/docker.list
          apt-get update
          apt-get install -y --no-install-recommends docker-ce-cli
          docker version
      - name: Log in to Forgejo registry
        run: |
          echo "${{ secrets.FORGEJO_REGISTRY_TOKEN }}" \
            | docker login "$REGISTRY" -u "$OWNER" --password-stdin
      - name: Build & push frontend image
        run: |
          docker build \
            -t "$REGISTRY/$OWNER/archnest:${{ github.sha }}" \
            -t "$REGISTRY/$OWNER/archnest:latest" \
            -f Dockerfile .
          docker push "$REGISTRY/$OWNER/archnest:${{ github.sha }}"
          docker push "$REGISTRY/$OWNER/archnest:latest"
      - name: Build & push backend image
        run: |
          docker build \
            -t "$REGISTRY/$OWNER/archnest-backend:${{ github.sha }}" \
            -t "$REGISTRY/$OWNER/archnest-backend:latest" \
            -f backend/Dockerfile backend
          docker push "$REGISTRY/$OWNER/archnest-backend:${{ github.sha }}"
          docker push "$REGISTRY/$OWNER/archnest-backend:latest"
      - name: Log out
        if: always()
        run: docker logout "$REGISTRY"
  deploy:
    # Auto-deploy to racknerd2 after a successful build. Deploys the exact
    # images just built (pinned to this commit's SHA). For manual/on-demand
    # deploys of an arbitrary tag (e.g. rollback), use the separate
    # "Deploy to racknerd2" workflow (deploy.yml).
    needs: build
    runs-on: docker
    env:
      DEPLOY_HOST: 100.96.217.250
      DEPLOY_DIR: /opt/archnest
    steps:
      - name: Install SSH client
        run: |
          apt-get update
          apt-get install -y --no-install-recommends openssh-client
      - name: Write deploy key
        run: |
          install -m 700 -d ~/.ssh
          printf '%s\n' "${{ secrets.RACKNERD2_SSH_KEY }}" > ~/.ssh/id_deploy
          chmod 600 ~/.ssh/id_deploy
      - name: Pull this build's images and restart stack
        run: |
          ssh -i ~/.ssh/id_deploy -o StrictHostKeyChecking=accept-new \
            root@"$DEPLOY_HOST" \
            "cd $DEPLOY_DIR && ARCHNEST_TAG='${{ github.sha }}' docker compose pull && ARCHNEST_TAG='${{ github.sha }}' docker compose up -d --remove-orphans"
      - name: Health check (backend /api/health via mesh)
        run: |
          ssh -i ~/.ssh/id_deploy -o StrictHostKeyChecking=accept-new \
            root@"$DEPLOY_HOST" \
            "for i in \$(seq 1 30); do curl -fsS http://$DEPLOY_HOST:8080/api/health && echo OK && exit 0; sleep 2; done; echo 'health check failed'; cd $DEPLOY_DIR && docker compose logs --tail=50; exit 1"

.forgejo/workflows/ci.yml Normal file
@ -0,0 +1,32 @@
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Install + type-check + build frontend
        run: |
          npm ci
          npx tsc --noEmit
          npm run build
      - name: Install + type-check + build backend
        working-directory: backend
        run: |
          npm ci
          npx tsc --noEmit
          npm run build

@ -0,0 +1,50 @@
name: Deploy to racknerd2
# Manual-only. Pulls the pre-built images from the registry onto racknerd2
# (validation host) over the NetBird mesh and restarts the stack. Build the
# images first with the "Build & Push Images" workflow.
#
# Requirements (see deploy/README.md):
#   - Forgejo Actions secret RACKNERD2_SSH_KEY: private key authorized for
#     root@racknerd2 (mesh IP 100.96.217.250).
#   - racknerd2 already prepared: Docker installed, logged in to the registry,
#     and /opt/archnest/{docker-compose.yml,.env} in place.
on:
  workflow_dispatch:
    inputs:
      tag:
        description: "Image tag to deploy (commit SHA or 'latest')"
        required: true
        default: latest
env:
  DEPLOY_HOST: 100.96.217.250
  DEPLOY_DIR: /opt/archnest
jobs:
  deploy:
    runs-on: docker
    steps:
      - name: Install SSH client
        run: |
          apt-get update
          apt-get install -y --no-install-recommends openssh-client
      - name: Write deploy key
        run: |
          install -m 700 -d ~/.ssh
          printf '%s\n' "${{ secrets.RACKNERD2_SSH_KEY }}" > ~/.ssh/id_deploy
          chmod 600 ~/.ssh/id_deploy
      - name: Pull images and restart stack
        run: |
          ssh -i ~/.ssh/id_deploy -o StrictHostKeyChecking=accept-new \
            root@"$DEPLOY_HOST" \
            "cd $DEPLOY_DIR && ARCHNEST_TAG='${{ inputs.tag }}' docker compose pull && ARCHNEST_TAG='${{ inputs.tag }}' docker compose up -d --remove-orphans"
      - name: Health check (backend /api/health via mesh)
        run: |
          ssh -i ~/.ssh/id_deploy -o StrictHostKeyChecking=accept-new \
            root@"$DEPLOY_HOST" \
            "for i in \$(seq 1 30); do curl -fsS http://$DEPLOY_HOST:8080/api/health && echo OK && exit 0; sleep 2; done; echo 'health check failed'; cd $DEPLOY_DIR && docker compose logs --tail=50; exit 1"

@ -1,140 +0,0 @@
name: Deploy to racknerd1
# Deploys ArchNest (frontend + backend + guacd) to racknerd1 via Docker Compose.
#
# Triggers:
#   - push to main (automatic)
#   - manual run from the Actions tab (workflow_dispatch)
#
# Required GitHub Actions repo secrets (Settings -> Secrets and variables -> Actions):
#   RACKNERD_HOST - racknerd1 hostname or IP the runner can SSH to
#   RACKNERD_USER - deploy SSH user (must be in the docker group)
#   RACKNERD_SSH_KEY - private SSH key (PEM) for that user
#   RACKNERD_PORT - SSH port (optional, defaults to 22)
#
# One-time host setup (NOT done by this workflow, see README Deployment section):
#   - Docker + Docker Compose installed, deploy user in the docker group
#   - mkdir -p /opt/archnest
#   - Create /opt/archnest/.env from .env.example with real generated secrets
#     (ARCHNEST_JWT_SECRET, ARCHNEST_SECRET_KEY, ARCHNEST_GUAC_CRYPT_KEY, ...).
#     This workflow refuses to deploy if that file is missing, and never
#     overwrites it, so live secrets/data are safe across deploys.
on:
  push:
    branches: [main]
  workflow_dispatch: {}
# Prevent overlapping deploys clobbering each other.
concurrency:
  group: deploy-racknerd1
  cancel-in-progress: false
env:
  DEPLOY_PATH: /opt/archnest
jobs:
  # Fail fast on build/type errors before touching the server.
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Install + type-check + build frontend
        run: |
          npm ci
          npx tsc --noEmit
          npm run build
      - name: Install + type-check + build backend
        working-directory: backend
        run: |
          npm ci
          npx tsc --noEmit
          npm run build
  deploy:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Pre-flight - confirm host .env exists (don't deploy without secrets)
        uses: appleboy/ssh-action@v1.2.0
        with:
          host: ${{ secrets.RACKNERD_HOST }}
          username: ${{ secrets.RACKNERD_USER }}
          key: ${{ secrets.RACKNERD_SSH_KEY }}
          port: ${{ secrets.RACKNERD_PORT || 22 }}
          script: |
            set -e
            mkdir -p ${{ env.DEPLOY_PATH }}
            if [ ! -f ${{ env.DEPLOY_PATH }}/.env ]; then
              echo "::error::Missing ${{ env.DEPLOY_PATH }}/.env on the host."
              echo "Create it from .env.example with real secrets before deploying."
              echo "It is intentionally never created/overwritten by this workflow."
              exit 1
            fi
            echo ".env present - proceeding."
      - name: Copy repo to racknerd1
        uses: appleboy/scp-action@v0.1.7
        with:
          host: ${{ secrets.RACKNERD_HOST }}
          username: ${{ secrets.RACKNERD_USER }}
          key: ${{ secrets.RACKNERD_SSH_KEY }}
          port: ${{ secrets.RACKNERD_PORT || 22 }}
          source: "."
          target: ${{ env.DEPLOY_PATH }}
          # Keep the host-only .env (and any other untracked host state) intact.
          rm: false
          overwrite: true
      - name: Build, restart, and clean up
        uses: appleboy/ssh-action@v1.2.0
        with:
          host: ${{ secrets.RACKNERD_HOST }}
          username: ${{ secrets.RACKNERD_USER }}
          key: ${{ secrets.RACKNERD_SSH_KEY }}
          port: ${{ secrets.RACKNERD_PORT || 22 }}
          command_timeout: 20m
          script: |
            set -e
            cd ${{ env.DEPLOY_PATH }}
            docker compose up -d --build --remove-orphans
            docker image prune -f
      - name: Health check (backend /api/health)
        uses: appleboy/ssh-action@v1.2.0
        with:
          host: ${{ secrets.RACKNERD_HOST }}
          username: ${{ secrets.RACKNERD_USER }}
          key: ${{ secrets.RACKNERD_SSH_KEY }}
          port: ${{ secrets.RACKNERD_PORT || 22 }}
          script: |
            set -e
            echo "Waiting for backend to become healthy..."
            for i in $(seq 1 30); do
              if curl -fsS http://127.0.0.1:4000/api/health >/dev/null 2>&1; then
                echo "Backend healthy."
                # Confirm the frontend container is serving too.
                if curl -fsS http://127.0.0.1:8080/ >/dev/null 2>&1; then
                  echo "Frontend healthy. Deploy succeeded."
                  exit 0
                fi
                echo "Frontend not ready yet..."
              fi
              sleep 5
            done
            echo "::error::Health check failed after ~150s. Dumping container status + logs."
            cd ${{ env.DEPLOY_PATH }}
            docker compose ps || true
            docker compose logs --tail=80 || true
            exit 1

.gitignore vendored
@ -15,6 +15,9 @@ dist-ssr
 # Backend data/secrets
 backend/data
 backend/.env
+# Env files (real secrets) — keep only the .example variants
+.env
+deploy/.env
 *.db
 *.db-journal
 *.db-wal

@ -0,0 +1,14 @@
{
"enabled": true,
"name": "Start Forgejo Tunnel",
"description": "Starts the SSH tunnel to Forgejo (localhost:3000 → 192.168.122.102:3000) when a prompt is submitted, ensuring the Forgejo extension and Git operations work.",
"version": "1",
"when": {
"type": "promptSubmit"
},
"then": {
"type": "runCommand",
"command": "powershell -Command \"if (-not (Test-NetConnection -ComputerName localhost -Port 3000 -InformationLevel Quiet -WarningAction SilentlyContinue)) { Start-Process ssh -ArgumentList '-N','forgejo-tunnel' -WindowStyle Hidden }\"",
"timeout": 10
}
}

@ -0,0 +1,14 @@
{
"enabled": true,
"name": "View ArchNest on racknerd2 (localhost:8080)",
"description": "Opens an SSH local port-forward (localhost:8080 -> racknerd2 8080) so the deployed ArchNest site can be viewed in a browser at http://localhost:8080. RackNerd's edge only allows port 22, so this tunnels the web app over SSH. Trigger it to start the tunnel; stop the hook's process to close it.",
"version": "1",
"when": {
"type": "userTriggered"
},
"then": {
"type": "runCommand",
"command": "ssh -o BatchMode=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -L 8080:localhost:8080 -N racknerd2",
"timeout": 0
}
}

.kiro/settings/mcp.json Normal file
@ -0,0 +1,10 @@
{
"mcpServers": {
"context7": {
"command": "npx",
"args": ["-y", "@upstash/context7-mcp@latest"],
"disabled": false,
"autoApprove": []
}
}
}

@ -0,0 +1 @@
{"specId": "044511cc-0b54-456f-9bbb-5769c4f7c380", "workflowType": "fast-task", "specType": "feature"}

@ -0,0 +1,561 @@
# Design Document: Code Audit Fixes
## Overview
This design addresses 13 Critical + High severity issues from the ArchNest code audit plus a CloudFormation test deploy template. The fixes are surgical — each targets a specific file with a minimal, correct patch. No new dependencies are introduced; all fixes use Node.js built-ins, existing Fastify hooks, and standard React patterns already in the codebase.
## Architecture
The fixes span three layers:
```
┌─────────────────────────────────────────────────────────┐
│ Frontend (React 19 + Vite 8 + TypeScript) │
│ ├─ TerminalSessionContext.tsx — WS lifecycle, auth │
│ ├─ Sidebar.tsx — promise error catch │
│ ├─ App.tsx — ErrorBoundary wrapper │
│ └─ ErrorBoundary.tsx — new component │
├─────────────────────────────────────────────────────────┤
│ Backend (Fastify 5 + TypeScript) │
│ ├─ routes/terminal.ts — tmux validation, WS auth, │
│ │ session log ID validation │
│ ├─ routes/files.ts — path traversal prevention │
│ ├─ routes/agents.ts — timing-safe token compare │
│ ├─ routes/data.ts — body size limit │
│ ├─ routes/docker.ts — JSON parse error handling │
│ ├─ ssh/docker.ts — container ref validation, │
│ │ SSH connection cleanup │
│ └─ server.ts — CORS fail-closed default │
├─────────────────────────────────────────────────────────┤
│ Infrastructure │
│ └─ infra/test-deploy.yml — CloudFormation template │
└─────────────────────────────────────────────────────────┘
```
## Components and Interfaces
### Component 1: WebSocket Session Leak Prevention (Frontend)
**File:** `src/lib/TerminalSessionContext.tsx`
**Current problem:** The `connect()` function calls `s.ws?.close()` but a race condition exists where the old WS `onclose` handler fires after reassignment and corrupts state belonging to the new connection.
**Fix:**
```typescript
function connect(s: PaneSession, id: number, tmuxSession?: string) {
s.disposeListeners?.()
// Close existing WS if OPEN or CONNECTING — guard against leak
if (s.ws && (s.ws.readyState === WebSocket.OPEN || s.ws.readyState === WebSocket.CONNECTING)) {
s.ws.close()
}
s.ws = null
s.connected = false
bump()
const term = s.term
term.reset()
term.writeln('Connecting…')
const token = getToken()
const proto = window.location.protocol === 'https:' ? 'wss' : 'ws'
// Token sent as first message, NOT in URL query string
const ws = new WebSocket(`${proto}://${window.location.host}/api/terminal`)
const thisWs = ws // Capture reference to detect stale onclose
s.ws = ws
ws.onopen = () => {
// Send auth as first message
ws.send(JSON.stringify({ type: 'auth', token }))
ws.send(JSON.stringify({ type: 'connect', integrationId: id, cols: term.cols, rows: term.rows, tmuxSession }))
}
ws.onclose = () => {
// Guard: only update state if this WS is still the active one
if (s.ws !== thisWs) return
s.connected = false
bump()
}
// ... rest of message handling unchanged
}
```
The same pattern applies to `fetchTmuxSessions()` — remove `?token=` from URL, send auth as first message.
### Component 2: tmux Session Name Validation (Backend)
**File:** `backend/src/routes/terminal.ts`
**Current state:** The regex `TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/` is already defined and used. The issue is that the validated name is interpolated directly into a `tmux attach -t ${tmuxSession}` command without shell quoting.
**Fix:** Apply `shQuote()` to the validated name in command construction, or use the existing pattern where invalid names fall through to `null`:
```typescript
const TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/
// In the connect handler:
const tmuxSession = msg.tmuxSession && TMUX_NAME_RE.test(msg.tmuxSession)
? msg.tmuxSession
: null
if (tmuxSession) {
// Name is validated to contain only safe chars; quote for defense-in-depth
const safe = tmuxSession.replace(/'/g, "") // Impossible given regex, but belt+suspenders
client.exec(`tmux attach -t '${safe}' || tmux new-session -s '${safe}'`, {
pty: { cols, rows, term: 'xterm-256color' },
}, onChannel)
}
```
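The `shQuote()` helper is named above but its body is not shown in this document. As a minimal sketch, assuming it performs POSIX single-quote escaping, the validate-then-quote step could look like the following. The body of `shQuote` and the `buildTmuxCommand` name are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical helper: POSIX single-quote escaping. The real shQuote() is
// not shown in this document, so this body is an assumption.
function shQuote(value: string): string {
  // Close the quote, emit an escaped quote, reopen: ' becomes '\''
  return `'${value.replace(/'/g, `'\\''`)}'`
}

const TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/

// Returns the attach-or-create command, or null when the name is rejected.
// (buildTmuxCommand is an illustrative name, not from the codebase.)
function buildTmuxCommand(name: unknown): string | null {
  if (typeof name !== 'string' || !TMUX_NAME_RE.test(name)) return null
  const quoted = shQuote(name)
  return `tmux attach -t ${quoted} || tmux new-session -s ${quoted}`
}
```

Validation is the primary defense here; quoting only matters if the regex is later loosened, which is why both are applied.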
### Component 3: Docker Container Reference Validation (Backend)
**File:** `backend/src/ssh/docker.ts`
**Current state:** Already has `CONTAINER_REF_RE` and `shQuote()`. The regex and quoting are correctly implemented. Verify the regex anchors and character set are tight:
```typescript
const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/
```
This is already correct. The `shQuote()` function is already applied in `containerLogs`, `containerAction`, `removeContainer`, and `buildExecShellCommand`. No code change needed — the audit finding is already resolved in the current codebase.
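To make "tight" concrete, the sketch below runs the regex against a few inputs. The `isValidContainerRef` wrapper and the sample values are illustrative assumptions; only the regex itself comes from the text above.

```typescript
const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/

// Illustrative wrapper; the name is not taken from the codebase.
function isValidContainerRef(ref: string): boolean {
  return CONTAINER_REF_RE.test(ref)
}

// Sample inputs: names and short IDs pass, flag-like or shell-like values fail.
const containerRefSamples: Array<[string, boolean]> = [
  ['web-1', true],
  ['a1b2c3d4e5f6', true],
  ['-rm', false],     // a leading hyphen could be parsed as a CLI option
  ['web;id', false],  // shell metacharacter
  ['', false],        // empty reference
]

for (const [ref, expected] of containerRefSamples) {
  const verdict = isValidContainerRef(ref) === expected ? 'ok' : 'MISMATCH'
  console.log(JSON.stringify(ref), verdict)
}
```

The separate first-character class is what rules out a reference that begins with `-`, and the `{0,127}` quantifier caps the total length at 128 characters.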
### Component 4: Sidebar Promise Error Handling (Frontend)
**File:** `src/components/Sidebar.tsx`
**Fix:** Add `.catch()` to the unhandled promise:
```typescript
useEffect(() => {
api.listIntegrations()
.then(({ integrations }) => setIntegrations(integrations))
.catch(() => {}) // Leave state as null → shows "Checking…"
}, [])
```
### Component 5: WebSocket Authentication via First Message (Backend)
**Files:** `backend/src/routes/terminal.ts`, `backend/src/routes/docker.ts`
**Current state:** Both routes verify `req.query.token` on `connect`/`list_tmux` messages. This needs to change to a first-message auth protocol.
**Design:**
```typescript
// Auth state, tracked per WebSocket connection
let authenticated = false
socket.on('message', async (raw: Buffer) => {
let msg: ClientMessage
try {
msg = JSON.parse(raw.toString())
} catch {
send(socket, { type: 'error', message: 'Invalid JSON' })
return
}
// Gate: first message must be auth
if (!authenticated) {
if (msg.type !== 'auth' || !msg.token) {
send(socket, { type: 'error', message: 'Authentication required' })
socket.close()
return
}
try {
await app.jwt.verify(msg.token)
authenticated = true
send(socket, { type: 'authenticated' })
} catch {
send(socket, { type: 'error', message: 'Unauthorized' })
socket.close()
}
return
}
// Normal message processing (connect, input, resize, etc.)
// ...
})
```
The `ClientMessage` type gets a new variant: `type: 'auth'` with `token: string`.
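The gate's decision logic can also be pulled out into a pure function so it is testable without a socket. This is a sketch under assumptions: `authGate`, `GateDecision`, and the injected `verify` callback are hypothetical names, and `verify` is taken to throw on an invalid token.

```typescript
type Incoming = { type: string; token?: string }

type GateDecision =
  | { action: 'forward' }                  // already authenticated: handle normally
  | { action: 'accept' }                   // token verified: mark the socket authenticated
  | { action: 'reject'; message: string }  // send an error frame, then close

// `verify` is injected (an assumption of this sketch) and must throw on a bad token.
function authGate(
  authenticated: boolean,
  msg: Incoming,
  verify: (token: string) => unknown,
): GateDecision {
  if (authenticated) return { action: 'forward' }
  if (msg.type !== 'auth' || !msg.token) {
    return { action: 'reject', message: 'Authentication required' }
  }
  try {
    verify(msg.token)
    return { action: 'accept' }
  } catch {
    return { action: 'reject', message: 'Unauthorized' }
  }
}
```

The socket handler would then map `accept` to setting `authenticated = true` and sending the `authenticated` frame, and `reject` to sending the error frame and closing.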
### Component 6: File Path Traversal Prevention (Backend)
**File:** `backend/src/routes/files.ts`
**Design:** Add a validation function applied to all path inputs before any SFTP operation:
```typescript
import { posix } from 'node:path'
function validatePath(path: string): string {
// Reject absolute paths
if (path.startsWith('/')) {
throw Object.assign(new Error('Absolute paths are not allowed'), { statusCode: 400 })
}
// Normalize and check for traversal
const normalized = posix.normalize(path)
if (normalized.startsWith('../') || normalized === '..' || normalized.includes('/../')) {
throw Object.assign(new Error('Path traversal is not allowed'), { statusCode: 400 })
}
return normalized
}
```
Applied at the top of every endpoint handler that accepts a user path: `list`, `content` (read), `content` (write), `mkdir`, `rename`, `delete`, `chmod`, `download`, `upload`.
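To show which inputs the validator accepts and rejects, the sketch below repeats `validatePath` so the block is self-contained and exercises it on a few paths. The `check` helper and the sample paths are illustrative assumptions.

```typescript
import { posix } from 'node:path'

function validatePath(path: string): string {
  // Reject absolute paths
  if (path.startsWith('/')) {
    throw Object.assign(new Error('Absolute paths are not allowed'), { statusCode: 400 })
  }
  // Normalize, then reject anything that still climbs above the base directory
  const normalized = posix.normalize(path)
  if (normalized.startsWith('../') || normalized === '..' || normalized.includes('/../')) {
    throw Object.assign(new Error('Path traversal is not allowed'), { statusCode: 400 })
  }
  return normalized
}

// Illustrative helper: returns the normalized path, or the rejection message.
function check(input: string): string {
  try {
    return validatePath(input)
  } catch (err) {
    return `rejected: ${(err as Error).message}`
  }
}

console.log(check('docs/readme.md'))      // stays as-is
console.log(check('docs/../notes.txt'))   // inner ".." collapses, still inside the base
console.log(check('a/../../etc/passwd'))  // escapes the base after normalization
console.log(check('/etc/passwd'))         // absolute path
```

Normalizing before the check is what catches `a/../../etc/passwd`: the raw string does not start with `../`, but its normalized form does.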
### Component 7: Agent Token Timing-Safe Comparison (Backend)
**File:** `backend/src/routes/agents.ts`
**Current state:** The `agentTokenValid()` function already uses `timingSafeEqual` but has an early return on length mismatch that leaks timing information.
**Fix:** Pad to equal length before comparison:
```typescript
function agentTokenValid(req: FastifyRequest): { ok: boolean; configured: boolean } {
const expected = process.env.ARCHNEST_AGENT_TOKEN
if (!expected) return { ok: false, configured: false }
const header = req.headers.authorization ?? ''
const presented = header.startsWith('Bearer ') ? header.slice(7) : ''
const a = Buffer.from(presented)
const b = Buffer.from(expected)
// Pad shorter buffer to match longer — constant-time regardless of length diff
const maxLen = Math.max(a.length, b.length)
const aPadded = Buffer.alloc(maxLen)
const bPadded = Buffer.alloc(maxLen)
a.copy(aPadded)
b.copy(bPadded)
const match = timingSafeEqual(aPadded, bPadded)
// Both length AND content must match
return { ok: match && a.length === b.length, configured: true }
}
```
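An alternative to padding, shown here only as a sketch and not as the project's code, is to hash both values first. SHA-256 digests are always 32 bytes, so `timingSafeEqual` never receives buffers of different lengths (it throws when the lengths differ). The `tokensEqual` name is hypothetical.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto'

// Alternative sketch: compare fixed-length digests instead of padded buffers.
// Equal digests imply equal inputs, barring a SHA-256 collision.
function tokensEqual(presented: string, expected: string): boolean {
  const a = createHash('sha256').update(presented).digest()
  const b = createHash('sha256').update(expected).digest()
  return timingSafeEqual(a, b)
}
```

This variant needs no separate length check, at the cost of two hash computations per request.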
### Component 8: Data Import Size Limit (Backend)
**File:** `backend/src/routes/data.ts`
**Fix:** Add `bodyLimit` to the route registration:
```typescript
app.post('/api/data/import', {
onRequest: [app.adminOnly],
bodyLimit: 10 * 1024 * 1024, // 10 MB
}, async (req, reply) => {
// ... existing handler
})
```
Fastify natively returns 413 Payload Too Large when exceeded, before JSON parsing.
### Component 9: SSH Connection Leak Prevention (Backend)
**File:** `backend/src/ssh/docker.ts`
**Current state:** The `withSshClient()` function already has a `finally` block that calls `client.end()` and `jumpRef.current?.end()`. This is correctly implemented in the current codebase. The audit noted older code; the fix is already present.
Verify the `finally` block:
```typescript
} finally {
client.end()
jumpRef.current?.end()
}
```
This is already in place. No change needed.
### Component 10: CORS Origin Fail-Closed Default (Backend)
**File:** `backend/src/server.ts`
**Current state:** `origin: process.env.ARCHNEST_CORS_ORIGIN ?? true` — falls back to `true` (allow all).
**Fix:**
```typescript
const corsOrigin = process.env.ARCHNEST_CORS_ORIGIN || false
if (!process.env.ARCHNEST_CORS_ORIGIN) {
app.log.warn('ARCHNEST_CORS_ORIGIN is not set — all cross-origin requests will be blocked')
}
await app.register(cors, { origin: corsOrigin })
```
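The switch from `??` to `||` matters for one edge case: an environment variable set to the empty string. A small sketch, with `resolveCorsOrigin` as a hypothetical helper name (the document inlines this logic in `server.ts`):

```typescript
// Hypothetical helper mirroring the inline expression above.
function resolveCorsOrigin(env: Record<string, string | undefined>): string | false {
  const value = env.ARCHNEST_CORS_ORIGIN
  // Unset and empty string both fail closed.
  return value ? value : false
}

console.log(resolveCorsOrigin({}))                                                  // unset
console.log(resolveCorsOrigin({ ARCHNEST_CORS_ORIGIN: '' }))                        // set but empty
console.log(resolveCorsOrigin({ ARCHNEST_CORS_ORIGIN: 'https://archnest.example' })) // configured
```

With `??`, only `undefined` and `null` trigger the fallback, so an empty `ARCHNEST_CORS_ORIGIN=` line in a `.env` file would be passed through as `''` rather than treated as unset. The example origin is a placeholder.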
### Component 11: React Error Boundary (Frontend)
**File:** `src/components/ErrorBoundary.tsx` (new)
```typescript
import { Component, type ReactNode, type ErrorInfo } from 'react'
interface Props { children: ReactNode }
interface State { hasError: boolean }
export class ErrorBoundary extends Component<Props, State> {
state: State = { hasError: false }
static getDerivedStateFromError(): State {
return { hasError: true }
}
componentDidCatch(error: Error, info: ErrorInfo) {
console.error('[ErrorBoundary]', error, info.componentStack)
}
render() {
if (this.state.hasError) {
return (
<div style={{
display: 'flex', flexDirection: 'column', alignItems: 'center',
justifyContent: 'center', height: '100vh', backgroundColor: '#0D0E10',
color: '#E8E6E0', fontFamily: 'system-ui, sans-serif',
}}>
<p style={{ fontSize: '16px', marginBottom: '16px' }}>
Something went wrong.
</p>
<button
onClick={() => window.location.reload()}
style={{
padding: '8px 20px', backgroundColor: '#C8A434',
color: '#0D0E10', border: 'none', borderRadius: '6px',
cursor: 'pointer', fontWeight: 600,
}}
>
Reload Page
</button>
</div>
)
}
return this.props.children
}
}
```
**Integration in `App.tsx`:**
```typescript
import { ErrorBoundary } from './components/ErrorBoundary'
// In App():
return <ErrorBoundary><Dashboard /></ErrorBoundary>
```
### Component 12: WebSocket JSON Parse Error Handling (Backend)
**File:** `backend/src/routes/docker.ts`
**Current state:** The `dockerExecRoutes` handler already has a try/catch around `JSON.parse` that sends `{ type: 'error', message: 'Invalid JSON' }` and does NOT close the connection. This is already correctly implemented.
Verify the existing code:
```typescript
socket.on('message', async (raw: Buffer) => {
let msg: ExecMessage
try {
msg = JSON.parse(raw.toString())
} catch {
sendJson(socket, { type: 'error', message: 'Invalid JSON' })
return // Does not close — client can retry
}
// ...
})
```
Already correct. No change needed.
### Component 13: Session Log Path Traversal Prevention (Backend)
**File:** `backend/src/routes/terminal.ts`
**Fix:** Validate `integrationId` is a positive integer before constructing the log path:
```typescript
function sessionLogPath(integrationId: number): string | null {
// Validate: must be a positive integer (no NaN, no negative, no float)
if (!Number.isInteger(integrationId) || integrationId <= 0) return null
mkdirSync(SESSION_LOG_DIR, { recursive: true })
const stamp = new Date().toISOString().replace(/[:.]/g, '-')
return join(SESSION_LOG_DIR, `${integrationId}_${stamp}.log`)
}
```
Callers check for `null` return and skip logging.
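A caller following that rule might look like the sketch below. To stay free of filesystem side effects it uses `sessionLogName`, a stand-in with the same validation as `sessionLogPath` but no `mkdirSync`. Both `sessionLogName` and `startSessionLog` are hypothetical names.

```typescript
// Stand-in for sessionLogPath(): same validation, no filesystem access.
function sessionLogName(integrationId: number): string | null {
  if (!Number.isInteger(integrationId) || integrationId <= 0) return null
  const stamp = new Date().toISOString().replace(/[:.]/g, '-')
  return `${integrationId}_${stamp}.log`
}

// Hypothetical caller: logging is best-effort, so a rejected ID is skipped
// without surfacing an error to the terminal user.
function startSessionLog(integrationId: number, open: (name: string) => void): boolean {
  const name = sessionLogName(integrationId)
  if (name === null) return false
  open(name)
  return true
}
```

Because `Number.isInteger` returns `false` for any non-number, a string smuggled in through untyped JSON (for example `"../x"`) is rejected by the same check.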
### Component 14: CloudFormation Test Deploy Template
**File:** `infra/test-deploy.yml`
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
  ArchNest test deployment - t4g.small EC2 with Docker in us-east-1.
  Budget alarm at $30/month. Destroy after testing.
Parameters:
  KeyPairName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: SSH key pair for instance access
  NotificationEmail:
    Type: String
    Description: Email for budget alarm notifications
Resources:
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ArchNest test - SSH + HTTP/HTTPS
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t4g.small
      KeyName: !Ref KeyPairName
      ImageId: !Sub '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64}}'
      SecurityGroupIds:
        - !GetAtt SecurityGroup.GroupId
      UserData:
        Fn::Base64: |
          #!/bin/bash -xe
          dnf update -y
          dnf install -y docker git
          systemctl enable docker && systemctl start docker
          usermod -aG docker ec2-user
          # Install Docker Compose v2 plugin
          mkdir -p /usr/local/lib/docker/cli-plugins
          curl -fsSL "https://github.com/docker/compose/releases/latest/download/docker-compose-linux-$(uname -m)" \
            -o /usr/local/lib/docker/cli-plugins/docker-compose
          chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
  BudgetAlarm:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: archnest-test-budget
        BudgetLimit:
          Amount: 30
          Unit: USD
        TimeUnit: MONTHLY
        BudgetType: COST
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
          Subscribers:
            - SubscriptionType: EMAIL
              Address: !Ref NotificationEmail
Outputs:
  PublicIP:
    Value: !GetAtt Instance.PublicIp
    Description: Instance public IP for SSH access
  InstanceId:
    Value: !Ref Instance
    Description: EC2 instance ID
```
## Data Models
No new database tables or schema changes are required. All fixes operate on existing data structures.
**New TypeScript interface (WebSocket auth message):**
```typescript
interface AuthMessage {
  type: 'auth'
  token: string
}
// ClientMessage union extended:
type ClientMessage =
  | AuthMessage
  | { type: 'connect'; integrationId?: number; cols?: number; rows?: number; tmuxSession?: string }
  | { type: 'input'; data?: string }
  | { type: 'resize'; cols?: number; rows?: number }
  | { type: 'disconnect' }
  | { type: 'list_tmux'; integrationId?: number }
```
## Error Handling
| Component | Error Case | Behavior |
|-----------|-----------|----------|
| Path validation | Traversal/absolute path | HTTP 400, descriptive message |
| WS auth gate | Missing/invalid token | Error frame + close connection |
| Agent token | Wrong token (any form) | HTTP 401, identical response |
| Data import | Body > 10MB | HTTP 413 (Fastify built-in) |
| JSON parse | Malformed WS message | Error frame, keep connection open |
| Session log | Invalid integrationId | Skip logging, no error to user |
| Error Boundary | Component crash | Fallback UI with reload button |
| Sidebar | API rejection | Swallow error, show "Checking…" |
## Testing Strategy
**Unit tests** (example-based): WebSocket session lifecycle (Req 1), Sidebar error catch (Req 4), frontend token removal from URL (Req 5.1, 5.2), Error Boundary rendering (Req 11), CORS default (Req 10), body size limit (Req 8).
**Property tests** (100+ iterations): Input validation functions (tmux names, container refs, path traversal, integration IDs), token comparison correctness, WS auth gate, SSH cleanup guarantee, JSON parse resilience.
**Smoke tests**: CloudFormation template structure validation (Req 14) — verify resource types, parameters, and outputs exist with correct values.
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: tmux Session Name Validation Prevents Injection
*For any* string `s`, if `TMUX_NAME_RE.test(s)` returns true, then `s` contains only characters from `[A-Za-z0-9_-]` and has a length of 1 to 64, so the resulting shell command `tmux attach -t '${s}'` is safe from injection. *For any* string `s` that is empty, contains characters outside this set, or exceeds 64 characters, the validator SHALL reject it.
**Validates: Requirements 2.1, 2.3**
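A minimal sketch of how this property could be exercised. `TMUX_NAME_RE` is named in the design; `sanitizeTmuxSession` and `tmuxAttachCommand` are hypothetical helper names introduced only for illustration, not the project's actual functions.

```typescript
// Pattern from Requirement 2.1: 1 to 64 characters of [A-Za-z0-9_-].
const TMUX_NAME_RE = /^[A-Za-z0-9_-]{1,64}$/

// Hypothetical helper: returns the name when it is safe, otherwise null so
// the caller can fall back to a plain shell (Requirement 2.2).
function sanitizeTmuxSession(name: unknown): string | null {
  return typeof name === 'string' && TMUX_NAME_RE.test(name) ? name : null
}

// Hypothetical helper: builds the attach command from a validated name only.
// Re-checking here keeps the function safe even if a caller skips sanitizing.
function tmuxAttachCommand(name: string): string {
  if (!TMUX_NAME_RE.test(name)) throw new Error('invalid tmux session name')
  return `tmux attach -t '${name}'`
}
```

Because the character class excludes quotes, whitespace, and shell metacharacters, a name that passes the regex cannot terminate the surrounding single quotes.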
### Property 2: Container Reference Validation and Safe Escaping
*For any* string `ref`, `isValidContainerRef(ref)` returns true if and only if `ref` matches `^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$`. *For any* valid container reference, `shQuote(ref)` produces a string that, when embedded in a shell command, cannot break out of single quotes.
**Validates: Requirements 3.1, 3.3**
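The two halves of this property could look roughly like the sketch below. `isValidContainerRef` and `shQuote` are named in the design, but their bodies here are assumptions; `dockerInspectCommand` is a hypothetical caller added for illustration.

```typescript
// Pattern from Requirement 3.1: leading alphanumeric, up to 128 chars total.
const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/

function isValidContainerRef(ref: string): boolean {
  return CONTAINER_REF_RE.test(ref)
}

// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
// Assumed implementation; the project's shQuote may differ in detail.
function shQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'"
}

// Hypothetical caller: validate first, then quote, then interpolate.
function dockerInspectCommand(ref: string): string {
  if (!isValidContainerRef(ref)) throw new Error('invalid container reference')
  return `docker inspect ${shQuote(ref)}`
}
```

Validation and quoting are deliberately layered: the regex already excludes `'`, so the escaping is defense in depth rather than the primary control.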
### Property 3: Path Traversal Prevention
*For any* path string `p`, if `p` starts with `/` or if `posix.normalize(p)` contains a `..` component that would escape the root (starts with `../`, equals `..`, or contains `/../`), then `validatePath(p)` SHALL throw an error. *For any* relative path without traversal components, `validatePath(p)` SHALL return the normalized path.
**Validates: Requirements 6.1, 6.2**
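As an illustration, a `validatePath` that satisfies this property might be sketched as follows. The error messages are assumptions, and the sketch does not cover concerns outside the property, such as backslashes or NUL bytes.

```typescript
import { posix } from 'node:path'

// Rejects absolute paths and any path whose normalized form climbs above
// the root; otherwise returns the normalized relative path.
function validatePath(p: string): string {
  if (p.startsWith('/')) throw new Error('absolute paths are not allowed')
  const normalized = posix.normalize(p)
  if (
    normalized === '..' ||
    normalized.startsWith('../') ||
    normalized.includes('/../')
  ) {
    throw new Error('path escapes the allowed root')
  }
  return normalized
}
```

Checking the normalized form matters because `a/../../x` contains no leading `../` as written, but normalizes to `../x`. A path such as `a/../b` stays inside the root and normalizes to `b`.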
### Property 4: Agent Token Comparison Correctness
*For any* pair of token strings `(presented, expected)`, the `agentTokenValid` function returns `ok: true` if and only if `presented === expected`. The function SHALL never throw regardless of input lengths, and SHALL return the same HTTP 401 response shape for all rejection cases (length mismatch or content mismatch).
**Validates: Requirements 7.1, 7.2, 7.3**
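A sketch of the padded comparison, assuming `agentTokenValid` takes the two tokens as strings and returns `{ ok: boolean }`; the real signature may differ.

```typescript
import { timingSafeEqual } from 'node:crypto'

// Assumed shape: compares two tokens without an early return on length.
function agentTokenValid(presented: string, expected: string): { ok: boolean } {
  const a = Buffer.from(presented, 'utf8')
  const b = Buffer.from(expected, 'utf8')
  // Pad both sides to the same length so timingSafeEqual never throws.
  const len = Math.max(a.length, b.length, 1)
  const paddedA = Buffer.alloc(len)
  const paddedB = Buffer.alloc(len)
  a.copy(paddedA)
  b.copy(paddedB)
  // Both checks are computed before combining, so neither short-circuits.
  const sameContent = timingSafeEqual(paddedA, paddedB)
  const sameLength = a.length === b.length
  return { ok: sameContent && sameLength }
}
```

The separate length check is what keeps `abc` from matching `abc\0`: zero padding alone would make those buffers compare equal.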
### Property 5: WebSocket Auth Gate Rejects Unauthenticated Messages
*For any* WebSocket message received before a valid `auth` message has been processed, the backend SHALL respond with an error frame and close the connection. No `connect`, `list_tmux`, `input`, or `resize` message SHALL be processed in the unauthenticated state.
**Validates: Requirements 5.3, 5.4**
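The gate can be pictured as a small per-connection state machine. The sketch below is an assumption-laden illustration: `createAuthGate`, `SocketLike`, the injected `verifyToken` callback, and the error text are all hypothetical, and the message union is trimmed to its `type` fields.

```typescript
type GateMessage =
  | { type: 'auth'; token: string }
  | { type: 'connect' | 'list_tmux' | 'input' | 'resize' | 'disconnect' }

interface SocketLike {
  send(data: string): void
  close(): void
}

// Returns a message handler that forwards nothing until auth succeeds.
function createAuthGate(
  socket: SocketLike,
  verifyToken: (token: string) => boolean,
  handle: (msg: GateMessage) => void,
): (msg: GateMessage) => void {
  let authenticated = false
  return (msg) => {
    if (!authenticated) {
      if (msg.type === 'auth' && verifyToken(msg.token)) {
        authenticated = true
        return
      }
      // Any other first message, or a bad token: error frame, then close.
      socket.send(JSON.stringify({ type: 'error', message: 'Authentication required' }))
      socket.close()
      return
    }
    if (msg.type === 'auth') return // already authenticated; ignore repeats
    handle(msg)
  }
}
```

Keeping the flag inside the closure means each connection gets its own authentication state, which matches the per-connection `authenticated` flag described in the implementation plan.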
### Property 6: SSH Connection Cleanup Guarantee
*For any* operation function `fn` passed to `withSshClient`, regardless of whether `fn` resolves or rejects, both the primary SSH client and the jump-host client (if any) SHALL have `.end()` called before the `withSshClient` promise settles.
**Validates: Requirements 9.1, 9.2**
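The cleanup guarantee reduces to a `try`/`finally` around the operation. In this sketch the `open` factory and the `Endable` interface are hypothetical stand-ins for however the real `withSshClient` obtains its primary and jump-host clients.

```typescript
interface Endable {
  end(): void
}

// Runs fn with the primary client and always ends both connections,
// whether fn resolves or rejects.
async function withSshClient<T>(
  open: () => Promise<{ client: Endable; jump?: Endable }>,
  fn: (client: Endable) => Promise<T>,
): Promise<T> {
  const { client, jump } = await open()
  try {
    return await fn(client)
  } finally {
    client.end()
    jump?.end()
  }
}
```

The `await` inside `try` is what makes the `finally` block run before the returned promise settles; returning the bare promise would end the clients while the operation is still in flight.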
### Property 7: WebSocket Invalid JSON Resilience
*For any* byte sequence sent to the Docker exec WebSocket that is not valid JSON, the handler SHALL send `{ type: 'error', message: 'Invalid JSON' }` and SHALL NOT close the WebSocket connection.
**Validates: Requirements 12.1, 12.2**
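One way to express this behavior is a parse helper that reports the error frame and leaves the socket open. `parseClientMessage` and `FrameSink` are hypothetical names; only the error frame shape comes from the property above.

```typescript
interface FrameSink {
  send(data: string): void
}

// Returns the parsed value, or undefined after sending the error frame.
// JSON.parse never yields undefined, so undefined is an unambiguous signal.
function parseClientMessage(socket: FrameSink, raw: string): unknown {
  try {
    return JSON.parse(raw)
  } catch {
    socket.send(JSON.stringify({ type: 'error', message: 'Invalid JSON' }))
    return undefined
  }
}
```

The helper never calls `close()`, so a client that sends a malformed frame can retry on the same connection.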
### Property 8: Integration ID Numeric Validation for Session Logs
*For any* value `v` used as `integrationId` when session logging is enabled, if `v` is not a positive integer (`Number.isInteger(v) && v > 0`), then `sessionLogPath` SHALL return `null` and no filesystem write SHALL occur.
**Validates: Requirements 13.1, 13.2**

@ -0,0 +1,164 @@
# Requirements Document
## Introduction
This feature addresses 13 Critical and High severity issues identified during a code audit of the ArchNest self-hosted ops dashboard. The fixes target security vulnerabilities (injection, traversal, timing attacks, token exposure), resource leaks (WebSocket, SSH connections), and stability gaps (missing error boundaries, unhandled exceptions). A CloudFormation template for test deployment is also included to verify fixes in an isolated environment.
## Glossary
- **Backend**: The Fastify 5 + TypeScript server application in `backend/src/`
- **Frontend**: The React 19 + Vite 8 + TypeScript client application in `src/`
- **WebSocket_Session**: A browser-to-server WebSocket connection used for terminal, Docker exec, or tmux-list operations
- **Terminal_Route**: The backend WebSocket endpoint at `/api/terminal` handling SSH terminal sessions
- **Docker_SSH_Module**: The `backend/src/ssh/docker.ts` module that runs Docker CLI commands over SSH
- **Files_Route**: The backend REST endpoint group at `/api/files/:integrationId/*` for SFTP file operations
- **Agents_Route**: The backend endpoint at `/api/agents/docker/report` for agent token-gated ingest
- **Data_Route**: The backend endpoint group at `/api/data/import` and `/api/data/export` for backup/restore
- **Docker_Exec_Route**: The backend WebSocket endpoint at `/api/docker/exec` for Docker container exec sessions
- **Error_Boundary**: A React class component that catches JavaScript errors in its child component tree and renders a fallback UI
- **CloudFormation_Template**: An AWS CloudFormation YAML file that provisions test infrastructure (EC2, security group, budget alarm)
- **Container_Ref**: A Docker container name or ID string validated before interpolation into shell commands
- **Integration_ID**: A numeric identifier for an SSH/Docker integration stored in the SQLite database
## Requirements
### Requirement 1: WebSocket Session Leak Prevention
**User Story:** As an operator, I want terminal reconnections to properly close prior WebSocket connections, so that dangling sessions do not accumulate and exhaust server resources.
#### Acceptance Criteria
1. WHEN a new terminal WebSocket connection is initiated for a pane, THE Frontend SHALL close the existing WebSocket (if any) before creating a new WebSocket instance.
2. WHEN the existing WebSocket readyState is OPEN or CONNECTING, THE Frontend SHALL call `ws.close()` on the existing reference prior to reassignment.
3. IF the WebSocket `onclose` event fires after a new connection has replaced the reference, THEN THE Frontend SHALL not modify state belonging to the new connection.
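The three criteria above can be sketched against a minimal socket interface so the logic runs outside a browser. `WsLike`, `PaneSession`, and `replaceSocket` are hypothetical names, and marking the pane `connected` immediately is a simplification.

```typescript
// readyState values follow the WebSocket API: 0 = CONNECTING, 1 = OPEN.
interface WsLike {
  readyState: number
  onclose: (() => void) | null
  close(): void
}

interface PaneSession {
  ws: WsLike | null
  status: 'connected' | 'disconnected'
}

function replaceSocket(session: PaneSession, create: () => WsLike): WsLike {
  const old = session.ws
  // Criteria 1 and 2: close the previous socket before reassigning.
  if (old && (old.readyState === 0 || old.readyState === 1)) old.close()
  const thisWs = create()
  session.ws = thisWs
  session.status = 'connected'
  thisWs.onclose = () => {
    // Criterion 3: a late close from a replaced socket must not touch
    // state that now belongs to the newer connection.
    if (session.ws !== thisWs) return
    session.ws = null
    session.status = 'disconnected'
  }
  return thisWs
}
```

Capturing `thisWs` in the closure is the key step: the handler compares identity rather than trusting that it still owns the session.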
### Requirement 2: tmux Session Name Injection Prevention
**User Story:** As a security engineer, I want tmux session names to be strictly validated, so that shell metacharacter injection through the terminal WebSocket is impossible.
#### Acceptance Criteria
1. THE Terminal_Route SHALL validate tmux session names against the pattern `^[A-Za-z0-9_-]{1,64}$`.
2. WHEN a `connect` message includes a `tmuxSession` value that does not match the allowed pattern, THE Terminal_Route SHALL treat it as null and open a plain shell instead.
3. WHEN a valid tmux session name is used in a shell command, THE Terminal_Route SHALL pass it only within a validated context where no additional characters can be appended by the client.
### Requirement 3: Docker Container Reference Validation
**User Story:** As a security engineer, I want container name/ID references to be tightly validated, so that shell command injection through crafted container identifiers is prevented.
#### Acceptance Criteria
1. THE Docker_SSH_Module SHALL validate container references against the pattern `^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$`.
2. WHEN a container reference fails validation, THE Docker_SSH_Module SHALL throw an error before any shell command is constructed.
3. THE Docker_SSH_Module SHALL pass validated container references through single-quote shell escaping before interpolation into commands.
### Requirement 4: Sidebar Promise Error Handling
**User Story:** As a user, I want the sidebar to handle API failures gracefully, so that an unhandled promise rejection does not crash the application or produce console errors.
#### Acceptance Criteria
1. WHEN the `api.listIntegrations()` call in the Sidebar component rejects, THE Frontend SHALL catch the error and leave the integrations state as null.
2. IF the integrations API call fails, THEN THE Frontend SHALL display the "Checking…" status label rather than an error state.
### Requirement 5: WebSocket Authentication via First Message
**User Story:** As a security engineer, I want JWT tokens removed from WebSocket URL query strings, so that tokens are not logged in server access logs or proxy logs.
#### Acceptance Criteria
1. THE Frontend SHALL not include the JWT token as a URL query parameter when opening terminal or Docker exec WebSocket connections.
2. WHEN a terminal WebSocket connection opens, THE Frontend SHALL send an authentication message containing the JWT token as the first message before any other message type.
3. WHEN the Backend receives a WebSocket connection on the terminal or Docker exec endpoints, THE Backend SHALL require a valid JWT token in the first message before processing `connect` or `list_tmux` messages.
4. IF the first message does not contain a valid JWT token, THEN THE Backend SHALL send an error frame and close the WebSocket connection.
### Requirement 6: File Path Traversal Prevention
**User Story:** As a security engineer, I want file operation paths to be validated against directory traversal, so that attackers cannot access files outside the intended directory scope.
#### Acceptance Criteria
1. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL reject paths containing `../` sequences after normalization.
2. WHEN a path parameter is received by the Files_Route, THE Files_Route SHALL reject absolute paths (paths starting with `/`).
3. IF a path fails traversal validation, THEN THE Files_Route SHALL return HTTP 400 with a descriptive error message.
4. THE Files_Route SHALL apply path validation to all endpoints that accept a user-supplied path: list, content read, content write, mkdir, rename, delete, chmod, download, and upload.
### Requirement 7: Agent Token Timing-Safe Comparison
**User Story:** As a security engineer, I want agent token comparison to use constant-time equality, so that timing side-channel attacks cannot be used to guess the token byte-by-byte.
#### Acceptance Criteria
1. THE Agents_Route SHALL compare presented tokens using `crypto.timingSafeEqual()`.
2. WHEN the presented token length differs from the expected token length, THE Agents_Route SHALL pad both buffers to equal length before performing the constant-time comparison.
3. THE Agents_Route SHALL return the same HTTP response (401 Unauthorized) regardless of whether the token length or content mismatched.
### Requirement 8: Data Import Size Limit
**User Story:** As an operator, I want data import requests to have a body size limit, so that an admin cannot accidentally or maliciously cause a denial-of-service by importing an extremely large JSON payload.
#### Acceptance Criteria
1. THE Data_Route import endpoint SHALL enforce a maximum request body size of 10 MB.
2. IF a request body exceeds 10 MB, THEN THE Data_Route SHALL reject the request with HTTP 413 before parsing the JSON payload.
### Requirement 9: SSH Connection Leak Prevention
**User Story:** As an operator, I want SSH connections used for Docker-over-SSH operations to be closed reliably, so that connection leaks do not exhaust SSH connection limits on remote hosts.
#### Acceptance Criteria
1. THE Docker_SSH_Module `withSshClient` function SHALL close both the primary SSH client and any jump-host client in a `finally` block after the operation completes.
2. IF the operation function throws an error, THEN THE Docker_SSH_Module SHALL still close both SSH connections before returning the error result.
### Requirement 10: CORS Origin Fail-Closed Default
**User Story:** As a security engineer, I want CORS to reject all cross-origin requests when no explicit origin is configured, so that a misconfigured deployment does not silently allow any origin.
#### Acceptance Criteria
1. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is not set, THE Backend SHALL default CORS origin to `false` (reject all cross-origin requests).
2. WHEN the `ARCHNEST_CORS_ORIGIN` environment variable is set to a valid origin string, THE Backend SHALL use that value as the allowed CORS origin.
3. THE Backend SHALL log a warning at startup when `ARCHNEST_CORS_ORIGIN` is not configured, indicating that cross-origin requests are blocked.
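A small sketch of the fail-closed resolution, written as a pure function so it can be tested without starting the server. `resolveCorsOrigin` is a hypothetical name, the warning text is an assumption, and treating an empty or whitespace-only value as unset is also an assumption.

```typescript
// Returns the configured origin, or false (reject all cross-origin
// requests) when ARCHNEST_CORS_ORIGIN is missing.
function resolveCorsOrigin(
  env: Record<string, string | undefined>,
  warn: (msg: string) => void,
): string | false {
  const origin = env.ARCHNEST_CORS_ORIGIN?.trim()
  if (!origin) {
    warn('ARCHNEST_CORS_ORIGIN is not set; cross-origin requests are blocked')
    return false
  }
  return origin
}
```

In the server this would presumably be called once at startup, for example as `resolveCorsOrigin(process.env, (m) => app.log.warn(m))`, with the result passed as the CORS plugin's `origin` option.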
### Requirement 11: React Error Boundary
**User Story:** As a user, I want the application to catch rendering errors gracefully, so that a single component crash does not take down the entire dashboard.
#### Acceptance Criteria
1. THE Frontend SHALL wrap the Dashboard component tree in an Error_Boundary component.
2. WHEN a child component throws during rendering, THE Error_Boundary SHALL catch the error and render a fallback UI instead of a blank screen.
3. THE Error_Boundary fallback UI SHALL display a message indicating an error occurred and provide a way to reload the page.
4. THE Error_Boundary SHALL log the caught error to the browser console for debugging.
### Requirement 12: WebSocket JSON Parse Error Handling
**User Story:** As an operator, I want malformed WebSocket messages to be handled gracefully, so that invalid JSON from a client does not crash the connection handler.
#### Acceptance Criteria
1. WHEN the Docker_Exec_Route receives a WebSocket message that is not valid JSON, THE Docker_Exec_Route SHALL send an error frame with message "Invalid JSON" to the client.
2. WHEN the Docker_Exec_Route receives invalid JSON, THE Docker_Exec_Route SHALL not close the WebSocket connection (allowing the client to retry).
### Requirement 13: Session Log Path Traversal Prevention
**User Story:** As a security engineer, I want the integrationId used in session log file paths to be strictly validated as numeric, so that path traversal through crafted identifiers is impossible.
#### Acceptance Criteria
1. WHEN session logging is enabled, THE Terminal_Route SHALL validate that the integrationId is a positive integer before constructing the log file path.
2. IF the integrationId is not a valid positive integer, THEN THE Terminal_Route SHALL skip session logging for that connection rather than writing to an unvalidated path.
### Requirement 14: CloudFormation Test Deploy Template
**User Story:** As a developer, I want a CloudFormation template that provisions a t4g.small EC2 instance with Docker in us-east-1, so that I can verify all fixes in an isolated environment within a $30/month budget.
#### Acceptance Criteria
1. THE CloudFormation_Template SHALL provision a t4g.small EC2 instance in us-east-1 running Amazon Linux 2023.
2. THE CloudFormation_Template SHALL create a security group allowing inbound SSH (port 22) and HTTP/HTTPS (ports 80, 443) from any source.
3. THE CloudFormation_Template SHALL install Docker and Docker Compose on the EC2 instance via UserData.
4. THE CloudFormation_Template SHALL create an AWS Budget alarm at $30/month threshold with email notification.
5. THE CloudFormation_Template SHALL output the instance public IP and instance ID for SSH access.
6. THE CloudFormation_Template SHALL accept parameters for the SSH key pair name and notification email address.

@ -0,0 +1,153 @@
# Implementation Plan: Code Audit Fixes
## Overview
Surgical fixes for 13 Critical + High audit issues across the ArchNest backend (Fastify 5 + TypeScript) and frontend (React 19 + TypeScript), plus a CloudFormation test deploy template. Each task targets a specific file with a minimal, correct patch. No new dependencies introduced.
## Tasks
- [ ] 1. Backend security hardening — input validation and auth
- [ ] 1.1 Add path traversal prevention to `backend/src/routes/files.ts`
- Add `validatePath()` function using `posix.normalize()` to reject absolute paths and `..` traversal
- Apply validation at the top of every handler that accepts a user-supplied path (list, content read, content write, mkdir, rename, delete, chmod, download, upload)
- Return HTTP 400 with descriptive error on rejection
- _Requirements: 6.1, 6.2, 6.3, 6.4_
- [ ] 1.2 Fix agent token timing-safe comparison in `backend/src/routes/agents.ts`
- Replace early-return on length mismatch with padded constant-time comparison
- Pad both buffers to `Math.max(a.length, b.length)` before `timingSafeEqual`
- Ensure `ok: true` only when both length AND content match
- Same 401 response for all rejection cases
- _Requirements: 7.1, 7.2, 7.3_
- [ ] 1.3 Add session log integrationId validation in `backend/src/routes/terminal.ts`
- Update `sessionLogPath()` to return `null` when integrationId is not a positive integer
- Callers skip logging on `null` return
- _Requirements: 13.1, 13.2_
- [ ] 1.4 Add body size limit to data import in `backend/src/routes/data.ts`
- Add `bodyLimit: 10 * 1024 * 1024` to the POST `/api/data/import` route registration
- Fastify returns 413 automatically when exceeded
- _Requirements: 8.1, 8.2_
- [ ] 1.5 Set CORS origin to fail-closed default in `backend/src/server.ts`
- Change fallback from `true` to `false` when `ARCHNEST_CORS_ORIGIN` is not set
- Log a warning at startup when env var is missing
- _Requirements: 10.1, 10.2, 10.3_
- [ ]* 1.6 Write property tests for path traversal, agent token, and integrationId validation
- **Property 3: Path Traversal Prevention** — verify `validatePath` rejects all `..` escape patterns and accepts valid relative paths
- **Property 4: Agent Token Comparison Correctness** — verify `ok: true` iff `presented === expected`, never throws
- **Property 8: Integration ID Numeric Validation** — verify `sessionLogPath` returns null for non-positive-integer values
- **Validates: Requirements 6.1, 6.2, 7.1, 7.2, 7.3, 13.1, 13.2**
- [ ] 2. Checkpoint — Backend security
- Ensure all tests pass, ask the user if questions arise.
- [ ] 3. WebSocket authentication refactor
- [ ] 3.1 Implement first-message auth gate in `backend/src/routes/terminal.ts`
- Add `authenticated` flag per connection
- Require `{ type: 'auth', token }` as first message; verify JWT
- Reject all other message types before auth with error frame + close
- Remove `req.query.token` usage from connect/list_tmux handlers
- _Requirements: 5.3, 5.4_
- [ ] 3.2 Implement first-message auth gate in `backend/src/routes/docker.ts`
- Same pattern as terminal: auth-first protocol for Docker exec WebSocket
- Verify existing JSON parse error handling remains intact (already correct per design)
- _Requirements: 5.3, 5.4, 12.1, 12.2_
- [ ] 3.3 Update frontend WebSocket connections in `src/lib/TerminalSessionContext.tsx`
- Remove `?token=` from WebSocket URL query string
- Send `{ type: 'auth', token }` as first message on `ws.onopen`
- Apply same change to `fetchTmuxSessions()` WebSocket
- _Requirements: 5.1, 5.2_
- [ ] 3.4 Fix WebSocket session leak in `src/lib/TerminalSessionContext.tsx`
- Guard `ws.close()` with readyState check (OPEN or CONNECTING)
- Capture `thisWs` reference; in `onclose`, bail if `s.ws !== thisWs`
- _Requirements: 1.1, 1.2, 1.3_
- [ ]* 3.5 Write property test for WebSocket auth gate
- **Property 5: WebSocket Auth Gate Rejects Unauthenticated Messages**
- Verify no message type other than `auth` is processed before authentication
- **Validates: Requirements 5.3, 5.4**
- [ ] 4. Checkpoint — WebSocket auth
- Ensure all tests pass, ask the user if questions arise.
- [ ] 5. Backend — tmux validation and SSH cleanup verification
- [ ] 5.1 Harden tmux session name usage in `backend/src/routes/terminal.ts`
- Ensure validated name is single-quoted in shell command construction
- Defense-in-depth: strip any `'` (impossible given regex, but belt+suspenders)
- _Requirements: 2.1, 2.2, 2.3_
- [ ] 5.2 Verify container ref validation in `backend/src/ssh/docker.ts`
- Confirm `CONTAINER_REF_RE` regex is `^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$`
- Confirm `shQuote()` is applied to all container ref interpolations
- Confirm SSH cleanup in `withSshClient` finally block is present
- If already correct (per design), add a code comment noting audit verification
- _Requirements: 3.1, 3.2, 3.3, 9.1, 9.2_
- [ ]* 5.3 Write property tests for tmux name validation and container ref validation
- **Property 1: tmux Session Name Validation Prevents Injection** — verify only `[A-Za-z0-9_-]{1,64}` passes
- **Property 2: Container Reference Validation and Safe Escaping** — verify regex and `shQuote` safety
- **Property 6: SSH Connection Cleanup Guarantee** — verify `withSshClient` always calls `.end()`
- **Validates: Requirements 2.1, 2.3, 3.1, 3.3, 9.1, 9.2**
- [ ] 6. Frontend stability fixes
- [ ] 6.1 Add `.catch()` to Sidebar promise in `src/components/Sidebar.tsx`
- Append `.catch(() => {})` to the `api.listIntegrations()` call
- Ensure integrations state remains null on failure (shows "Checking…")
- _Requirements: 4.1, 4.2_
- [ ] 6.2 Create Error Boundary component at `src/components/ErrorBoundary.tsx`
- React class component with `getDerivedStateFromError` + `componentDidCatch`
- Fallback UI: centered message + gold "Reload Page" button on dark background
- Log error to console
- _Requirements: 11.2, 11.3, 11.4_
- [ ] 6.3 Wrap Dashboard in ErrorBoundary in `src/App.tsx`
- Import and wrap the top-level Dashboard component tree
- _Requirements: 11.1_
- [ ]* 6.4 Write unit tests for ErrorBoundary and Sidebar error handling
- Verify ErrorBoundary renders fallback on child throw
- Verify Sidebar swallows rejected promise without crashing
- **Validates: Requirements 4.1, 4.2, 11.1, 11.2, 11.3, 11.4**
- [ ] 7. CloudFormation test deploy template
- [ ] 7.1 Create `infra/test-deploy.yml` CloudFormation template
- Parameters: KeyPairName (KeyPair type), NotificationEmail (String)
- Resources: SecurityGroup (SSH + HTTP/HTTPS), EC2 Instance (t4g.small, AL2023 ARM64, Docker + Compose via UserData), Budget alarm ($30/month, 80% threshold)
- Outputs: PublicIP, InstanceId
- _Requirements: 14.1, 14.2, 14.3, 14.4, 14.5, 14.6_
- [ ]* 7.2 Write smoke test validating CloudFormation template structure
- Verify required resource types, parameters, and outputs exist
- Validate YAML parses correctly
- **Validates: Requirements 14.1-14.6**
- [ ] 8. Final checkpoint
- Ensure all tests pass, ask the user if questions arise.
## Notes
- Tasks marked with `*` are optional and can be skipped for faster MVP
- Each task references specific requirements for traceability
- Components 3 (container ref), 9 (SSH cleanup), and 12 (JSON parse) are verified-already-correct per design — task 5.2 confirms with a code comment
- The project uses TypeScript throughout (Fastify 5 backend, React 19 frontend)
- No new dependencies are introduced; all fixes use Node.js built-ins and existing patterns
## Task Dependency Graph
```json
{
  "waves": [
    { "id": 0, "tasks": ["1.1", "1.2", "1.3", "1.4", "1.5", "6.1", "6.2", "7.1"] },
    { "id": 1, "tasks": ["1.6", "6.3", "6.4", "7.2"] },
    { "id": 2, "tasks": ["3.1", "3.2", "5.1", "5.2"] },
    { "id": 3, "tasks": ["3.3", "3.4", "3.5", "5.3"] }
  ]
}
```

@ -0,0 +1,176 @@
---
inclusion: manual
---
# ArchNest Code Audit — Known Issues & Fix Plan
> Last audited: 2026-06-24. This file tracks known issues by severity.
> Use this to guide fixes before production deploy.
---
## CRITICAL (3) — Must fix before deploy
### 1. Terminal WebSocket session leak
- **File**: `src/lib/TerminalSessionContext.tsx`
- **Problem**: Old WebSocket reference overwritten before cleanup runs on reconnect, creating dangling connections that never close.
- **Fix**: Close existing WS before creating new one. Guard against double-open.
### 2. tmux session name injection
- **File**: `backend/src/routes/terminal.ts`
- **Problem**: Regex allows characters that don't fully prevent shell metacharacter injection when constructing the `tmux attach -t` command.
- **Fix**: Whitelist alphanumeric + dash + underscore only. Reject everything else.
### 3. Docker container ref validation insufficient
- **File**: `backend/src/ssh/docker.ts`
- **Problem**: `CONTAINER_REF_RE` may allow problematic characters through validation gaps.
- **Fix**: Tighten regex to `^[a-zA-Z0-9][a-zA-Z0-9_.-]*$` only.
---
## HIGH (10) — Fix before production
### 4. Unhandled promise in Sidebar
- **File**: `src/components/Sidebar.tsx`
- **Fix**: Add `.catch()` to `api.listIntegrations()` call.
### 5. JWT token in WebSocket URL
- **File**: `src/lib/TerminalSessionContext.tsx`
- **Problem**: Token in query string gets logged in server access logs.
- **Fix**: Send token as first WebSocket message or via subprotocol.
### 6. Path traversal in file operations
- **File**: `backend/src/routes/files.ts`
- **Problem**: No validation against `../` sequences or absolute paths.
- **Fix**: Normalize path, reject if it escapes the allowed root.
### 7. Agent token timing attack
- **File**: `backend/src/routes/agents.ts`
- **Problem**: Early return on length mismatch leaks timing info.
- **Fix**: Use `crypto.timingSafeEqual()` with Buffer padding.
### 8. No data import size limit
- **File**: `backend/src/routes/data.ts`
- **Problem**: Admin can import arbitrarily large JSON → DoS.
- **Fix**: Add `bodyLimit` to the route (e.g., 10MB max).
### 9. SSH connection leak in Docker-over-SSH
- **File**: `backend/src/ssh/docker.ts`
- **Problem**: Promise rejection paths may leave SSH connections open.
- **Fix**: Add `finally` block that calls `client.end()`.
### 10. CORS origin defaults to any
- **File**: `docker-compose.yml` / `server.ts`
- **Problem**: `ARCHNEST_CORS_ORIGIN` falls back to `true` (allow all) when unset.
- **Fix**: Require explicit origin in production. Fail-closed.
### 11. No React error boundary
- **File**: `src/App.tsx`
- **Problem**: Any component crash takes down the entire app.
- **Fix**: Add an ErrorBoundary wrapper around `<Dashboard />`.
### 12. WebSocket JSON.parse unhandled
- **File**: `backend/src/routes/docker.ts`
- **Problem**: No try-catch around JSON.parse in WS message handler.
- **Fix**: Wrap in try-catch, send error frame back to client.
### 13. Session log path traversal
- **File**: `backend/src/routes/terminal.ts`
- **Problem**: `integrationId` used directly in filesystem path without sanitization.
- **Fix**: Validate integrationId is numeric only.
---
## MEDIUM (13) — Should fix
### 14. Auth token in localStorage (XSS risk)
- `src/lib/AuthContext.tsx` — JWT in localStorage. If XSS hits, token is stolen.
- Long-term fix: HttpOnly cookie. Short-term: accept risk, add CSP headers.
### 15. No refresh token mechanism
- `backend/src/routes/auth.ts` — Users forced to re-login on token expiry.
- Fix: Add refresh token rotation endpoint.
### 16. No login rate limiting
- `backend/src/routes/auth.ts` — Brute-force attack possible.
- Fix: Add rate limiter plugin (e.g., `@fastify/rate-limit`, 5 attempts/min).
### 17. Weak binary file detection
- `backend/src/routes/files.ts` — Null byte check insufficient for UTF-8 binary files.
- Fix: Check first 512 bytes for control characters (excluding newline/tab).
### 18. Missing category uniqueness
- `backend/src/db/index.ts` — `bookmark_categories` allows duplicate names.
- Fix: Add UNIQUE constraint on `(name)` or `(name, tenant_id)` for SaaS.
### 19. IntegrationId validation weak
- `backend/src/routes/docker.ts` — `Number()` on string param doesn't reject `NaN`.
- Fix: Parse with parseInt, check `isNaN()`, return 400.
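A possible shape for this fix, offered as a sketch rather than the planned patch. `parseIntegrationId` is a hypothetical helper. It uses a digit-only regex instead of `parseInt` alone because `parseInt('12abc', 10)` returns `12`, which would let a malformed parameter through.

```typescript
// Returns a positive integer ID, or null so the route can respond with 400.
function parseIntegrationId(raw: string): number | null {
  // Digits only, no leading zero, so '0', '-1', '1.5' and '12abc' all fail.
  if (!/^[1-9][0-9]*$/.test(raw)) return null
  const id = Number(raw)
  // Guard against values too large to represent exactly.
  return Number.isSafeInteger(id) ? id : null
}
```

A route handler would call this on the string parameter and return HTTP 400 when the result is `null`.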
### 20. SSH shell without PTY in Docker exec
- `backend/src/routes/dockerSsh.ts` — Output buffering issues in non-interactive mode.
- Fix: Always request PTY for exec sessions.
### 21. No privileged port validation
- `backend/src/routes/tunnels.ts` — Ports <1024 allowed without OS privilege.
- Fix: Warn or reject ports <1024 unless running as root.
### 22. No size validation on data export
- `backend/src/routes/data.ts` — Can generate massive JSON response.
- Fix: Stream response or add a row limit.
### 23. Silent listResources failures
- `backend/src/routes/integrations.ts` — Errors caught but not surfaced.
- Fix: Return partial results + error flag per integration.
### 24. Missing admin action audit logging
- Multiple route files — No before/after state comparison in logs.
- Fix: Log old + new values on integration/user updates.
### 25. Terminal resize race condition
- `backend/src/routes/terminal.ts` — Resize between connect and channel-ready is lost.
- Fix: Queue resize events until channel is confirmed ready.
### 26. Missing database indexes
- `backend/src/db/index.ts` — No indexes on `events.created_at`, `sessions.user_id`.
- Fix: Add indexes for frequently-queried columns.
---
## LOW (14) — Nice to have
27. Missing CSP / X-Frame-Options headers
28. No avatar upload size validation
29. CPU stats edge case (single-core shows 0%)
30. Hardcoded tmux/docker format strings
31. State mutations outside React lifecycle
32. Weak CIDR validation error messages
33. File size limit hardcoded (should be configurable)
34. SSH key format not validated on upload
35. No import audit trail (who imported what, when)
36. Session logging file permissions not restricted
37. Missing Content-Security-Policy header
38. No health check for guacd sidecar
39. Hardcoded 5-second metrics polling interval
40. No graceful shutdown for SSH connections on SIGTERM
---
## Deploy Test Plan
### Phase A: Fix Critical + High
Fix issues 1-13 before any deploy. These represent real security and stability risks.
### Phase B: Test Deploy (AWS)
1. Deploy CloudFormation stack (t4g.small EC2, us-east-1)
2. SSH in, clone repo, create `.env`, `docker compose up -d --build`
3. Run through every page: Glance, Infrastructure, Terminal, Tunnels, Files, Containers, Remote Desktop, Host Metrics, BookNest, Settings, Help
4. Test: SSH connect, Docker list, file upload, bookmark CRUD, metrics polling
5. Budget alarm set at $30/month
### Phase C: Destroy
After successful test:
```bash
aws cloudformation delete-stack --stack-name archnest-test
```
This removes all stack resources (EC2 instance, security group, budget), leaving a clean slate.

@ -55,6 +55,8 @@
## Colors
### Dark Mode (Default — Shipped)
| Role | Value |
|------|-------|
| Page background | `#0D0E10` |
@ -68,6 +70,21 @@
| Text primary | `#E8E6E0` |
| Text secondary | `#7A7D85` |
### Light Mode (Planned)
| Role | Value |
|------|-------|
| Page background | `#F7F6F3` |
| Card background | `#FFFFFF` |
| Sidebar background | `#FAF9F7` |
| Border | `#E8E5DF` |
| Gold accent | `#C8A434` |
| Success | `#2ECC71` |
| Warning | `#E67E22` |
| Danger | `#E74C3C` |
| Text primary | `#1A1A1A` |
| Text secondary | `#6B6560` |
## State Management
- **No Zustand or other global state library is used.** State is plain React


@ -0,0 +1,44 @@
# Ponytail — Lazy Senior Dev Mode
> Source: [DietrichGebert/ponytail](https://github.com/DietrichGebert/ponytail)
> Content was rephrased for compliance with licensing restrictions.
You are a lazy senior developer. Lazy means efficient, not careless. The best code is the code never written.
Before writing any code, stop at the first rung that holds:
1. Does this need to be built at all? (YAGNI)
2. Does it already exist in this codebase? Reuse the helper, util, or pattern that's already here — don't rewrite it.
3. Does the standard library already do this? Use it.
4. Does a native platform feature cover it? Use it.
5. Does an already-installed dependency solve it? Use it.
6. Can this be one line? Make it one line.
7. Only then: write the minimum code that works.
The ladder runs after you understand the problem, not instead of it: read the task and the code it touches, trace the real flow end to end, then climb.
**Bug fix = root cause, not symptom.** A report names a symptom. Grep every caller of the function you touch and fix the shared function once — one guard there is a smaller diff than one per caller, and patching only the path the ticket names leaves a sibling caller still broken.
## Rules
- No abstractions that weren't explicitly requested.
- No new dependency if it can be avoided.
- No boilerplate nobody asked for.
- Deletion over addition. Boring over clever. Fewest files possible.
- Shortest working diff wins, but only once you understand the problem.
- Question complex requests: "Do you actually need X, or does Y cover it?"
- Pick the edge-case-correct option when two stdlib approaches are the same size — lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.
## Not Lazy About
- Understanding the problem (read it fully and trace the real flow before picking a rung)
- Input validation at trust boundaries
- Error handling that prevents data loss
- Security
- Accessibility
- Anything explicitly requested
## Verification
Non-trivial logic leaves ONE runnable check behind — the smallest thing that fails if the logic breaks (an assert-based self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
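As an illustration of the `ponytail:` comment rule and the single runnable check, here is a small sketch. The `dedupeByUrl` helper and its bookmark-like input are hypothetical and exist only for this example.

```typescript
// ponytail: O(n²) scan (findIndex inside filter). Ceiling: fine for a few
// hundred items; upgrade path is a Set of seen URLs if lists grow large.
function dedupeByUrl<T extends { url: string }>(items: T[]): T[] {
  return items.filter((item, i) => items.findIndex((x) => x.url === item.url) === i);
}

// The one runnable check: fails loudly if the dedupe logic breaks.
function selfCheck(): void {
  const input = [
    { url: "a", title: "first" },
    { url: "b", title: "second" },
    { url: "a", title: "duplicate" },
  ];
  const out = dedupeByUrl(input);
  if (out.length !== 2) throw new Error("dedupeByUrl: wrong length");
  if (out[0].title !== "first") throw new Error("dedupeByUrl: should keep first occurrence");
}

selfCheck();
```

The comment names both the ceiling and the upgrade path, and the check uses no framework or fixtures, which is the shape the rules above ask for.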


@ -0,0 +1,116 @@
# ArchNest — Project Guide for Kiro
> Steering file for AI sessions working on this repo. Covers architecture
> decisions, workflow rules, and patterns to follow. Read alongside
> `design-rules.md` (visual conventions) which is injected separately.
---
## Quick Context
ArchNest is a **self-hosted ops dashboard** — live infrastructure monitoring,
SSH terminal/tunnels/files, Docker container management, remote desktop, and
bookmarks. **Private Forgejo repo (never public) — no GitHub.** CI/CD is
Forgejo Actions: push to `main` builds images, pushes to
`registry.snsnetlabs.com`, and auto-deploys to **racknerd2** (validation/preview
host) over SSH. See `HANDOFF.md` → "CI/CD & deploy" and `deploy/README.md`.
## Tech Stack (exact versions matter)
| Layer | Tech |
|-------|------|
| Frontend | React 19, Vite 8, TypeScript 6, Tailwind CSS v4, React Router 7 |
| Charts | Recharts 3 |
| Icons | Lucide React (verify exports exist at runtime, not just TS types) |
| Terminal | xterm.js 6 (`@xterm/xterm` + `@xterm/addon-fit`) |
| Backend | Fastify 5, TypeScript 5.7, ESM (`tsx` dev, `tsc -b` build) |
| DB | better-sqlite3 (SQLite) |
| Auth | `@fastify/jwt` + bcryptjs + server-tracked sessions |
| Validation | zod |
| SSH | ssh2 library |
| AWS | `@aws-sdk/client-ec2`, `@aws-sdk/client-sts` |
| Deploy | Docker Compose (Alpine images) |
| CI/CD | Forgejo Actions (`.forgejo/workflows/`): `ci.yml` validate; `build.yml` build+push to `registry.snsnetlabs.com` then auto-deploy to racknerd2. No GitHub. |
## Git Workflow
- **Remote**: `origin` → private Forgejo `forgejo.archnest.local:3000/sam/dev_arc_aws` (SSH via ProxyJump). **Forgejo-only — no GitHub, no `gh` CLI.**
- **Container registry**: `registry.snsnetlabs.com` (user `sam`, package token). Unproxied host so large layers bypass Cloudflare's body cap; web UI/packages stay on `forgejo.snsnetlabs.com`.
- **Never commit on `main`**. Always create `kiro/<feature>` branches.
- **Commit style**: imperative title + body explaining why, with trailers:
```
Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>
```
- **Before committing**: `npm run build` (frontend) + `cd backend && npx tsc --noEmit` (backend). Forgejo CI runs the same.
- **Stage specific files** — never `git add -A` blindly
- **PR flow**: `git push -u origin <branch>` → open a PR on Forgejo (web UI/API) → merge to `main`. **Merging to `main` auto-builds + auto-deploys to racknerd2** (build.yml). `deploy.yml` is a manual dispatch for deploying/rolling back any tag.
## Code Patterns to Follow
### Frontend
- One page component per route in `src/pages/`
- All backend calls go through `src/lib/api.ts` (typed `apiFetch` wrapper)
- No global state library — plain React state + localStorage for prefs
- Auth via `src/lib/AuthContext.tsx` (JWT in localStorage)
- New pages need: route in `App.tsx`, entry in `api.ts`, sidebar link
### Backend
- One route file per feature in `backend/src/routes/`
- Integration adapters in `backend/src/integrations/` (must implement `testConnection()`)
- SSH-based features use `backend/src/ssh/connect.ts` shared transport
- Request validation with zod schemas
- Audit logging via `logEvent()` from `db/index.ts`
- Secrets encrypted at rest (AES-256-GCM via `db/crypto.ts`)
- Never expose secret values to frontend — only `secretKeys: string[]`
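The at-rest encryption bullet above can be sketched with Node's built-in `node:crypto` module. This is an assumption-heavy illustration, not the contents of `db/crypto.ts`: the function signatures, the SHA-256 key derivation, and the `iv | tag | ciphertext` layout are all hypothetical.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Assumption: derive a 32-byte key by hashing the configured secret.
// The real module may derive its key differently (a proper KDF is preferable
// if the secret could be low-entropy).
function keyFrom(secret: string): Buffer {
  return createHash("sha256").update(secret).digest();
}

// Hypothetical output layout: base64(iv | authTag | ciphertext).
function encryptSecret(plaintext: string, secret: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", keyFrom(secret), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptSecret(encoded: string, secret: string): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28); // GCM auth tag is 16 bytes by default
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", keyFrom(secret), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decryption with the wrong key or altered ciphertext fails with an exception rather than returning garbage, which suits stored integration credentials.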
### Adding a New Integration
1. Create adapter in `backend/src/integrations/<name>.ts`
2. Register in `backend/src/integrations/registry.ts`
3. Add type to `IntegrationType` union
4. Add route if needed in `backend/src/routes/`
5. Add `api.ts` functions + TS interfaces on frontend
6. Add card in Settings integrations section
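Steps 1 to 3 above might look roughly like the following. Every shape here is hypothetical (the real `registry.ts`, `IntegrationType` union, and adapter config are not shown in this guide); only the `testConnection()` requirement comes from the text.

```typescript
// Hypothetical shapes; the real registry.ts and IntegrationType may differ.
type IntegrationType = "proxmox" | "docker" | "uptime-kuma";

interface TestResult {
  ok: boolean;
  message: string;
}

interface IntegrationAdapter {
  type: IntegrationType;
  testConnection(config: { baseUrl: string }): Promise<TestResult>;
}

const registry = new Map<IntegrationType, IntegrationAdapter>();

function register(adapter: IntegrationAdapter): void {
  registry.set(adapter.type, adapter);
}

// Step 1: the adapter. A real one would call the remote API here;
// this placeholder only checks the URL shape.
const uptimeKumaAdapter: IntegrationAdapter = {
  type: "uptime-kuma",
  async testConnection(config) {
    const valid = /^https?:\/\//.test(config.baseUrl);
    return { ok: valid, message: valid ? "URL looks valid" : "baseUrl must be http(s)" };
  },
};

// Step 2: registration, keyed by the type added to the union in step 3.
register(uptimeKumaAdapter);
```

Keying the registry by the union type means a route handler can look up the adapter from a stored integration row and call `testConnection()` without a switch statement.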
## Policies
- **Versioning**: development happens on **even** major versions; **odd** majors
are released/stable lines. We are currently developing **v2** (the prior
released line is v1, see the `v1.0` git tag). Image/version tags should
reflect this — dev builds carry the even (v2) version.
- **Zero mock data** — every number comes from a live API/SSH/DB call
- **Design-first for big features** — write a `docs/<feature>.md` before coding
- **No footer** on any page
- **Primary target**: 1920px+ viewport, should feel spacious
- **Mesh gate** defaults OFF — never lock the live instance
- **OpenSSL legacy provider** in backend Dockerfile — don't remove (needed for old PEM keys)
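The even/odd versioning policy at the top of this list reduces to a parity check on the major version. A small sketch, assuming tags look like `v2.3.1` or `v1.0`; the `releaseLine` helper is hypothetical:

```typescript
// Hypothetical helper; assumes tags look like "v2.3.1", "2.3.1", or "v1.0".
function releaseLine(tag: string): "development" | "stable" {
  const match = /^v?(\d+)\./.exec(tag);
  if (!match) throw new Error(`Unrecognized version tag: ${tag}`);
  // Even majors are development lines, odd majors are released/stable lines.
  return Number(match[1]) % 2 === 0 ? "development" : "stable";
}

console.log(releaseLine("v1.0"), releaseLine("v2.0.0"));
```

A check like this could guard image tagging so a dev build is never published under an odd major, though whether the pipeline enforces that is not stated here.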
## Environment
- Required env vars: `ARCHNEST_SECRET_KEY`, `ARCHNEST_JWT_SECRET`
- Optional: `ARCHNEST_DB_PATH`, `PORT`, `ARCHNEST_GUAC_CRYPT_KEY`,
`ARCHNEST_CORS_ORIGIN`, `ARCHNEST_AGENT_TOKEN`, `ARCHNEST_AGENT_STALE_MS`
- Frontend dev proxies `/api` → `http://localhost:4000`
## Key Files to Read First
1. `README.md` — architecture overview
2. `HANDOFF.md` — current state + standing rules
3. `design-decisions.md` — visual conventions + per-page implementation notes
4. `ROADMAP.md` — deferred/tiered work
5. `docs/` — subsystem design documents
## SSH Config (for reference)
- `ssh forgejo` → Git operations (User: forgejo, via ProxyJump linode)
- `ssh forgejo-admin` → root shell on Forgejo host (for admin tasks)
- `ssh forgejo-runner` → host running the Forgejo Actions runner (has Docker; builds images). Runner config `/opt/config.yaml` sets `container.docker_host: automount`.
- `ssh racknerd2` → validation/preview host (root). Runs the deployed stack from `/opt/archnest/`. Mesh IP `100.96.217.250`. Edge only allows port 22 — view the site via the SSH tunnel hook (`-L 8080:localhost:8080`) at `http://localhost:8080`.
- `ssh linode` → jump host at 172.238.163.85
## CI/CD pipeline (full detail in `deploy/README.md`)
- Push to `main` → `build.yml`: job `build` (build + push `:latest` and `:<sha>` to the registry) → job `deploy` (needs build; SSH to racknerd2, `docker compose pull && up -d` pinned to `<sha>`, `/api/health` gate).
- Required Forgejo Actions secrets: `FORGEJO_REGISTRY_TOKEN`, `RACKNERD2_SSH_KEY`.
- The build job installs **`docker-ce-cli` from Docker's apt repo** (Debian's `docker.io` is too old for the host daemon). Don't switch it back to `docker.io`.
- racknerd2 `/opt/archnest/docker-compose.yml` PULLS registry images; the repo-root `docker-compose.yml` BUILDS locally (dev/manual).
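The `/api/health` gate in the deploy job amounts to polling until the backend answers successfully or a retry budget runs out. The sketch below shows that logic only; the real workflow step may well be a shell loop around `curl`, and `waitForHealthy`, `Probe`, and the retry defaults are assumptions.

```typescript
// Hypothetical sketch of a health gate; not the actual build.yml step.
type Probe = (url: string) => Promise<{ ok: boolean }>;

async function waitForHealthy(
  url: string,
  probe: Probe,
  attempts = 10,
  delayMs = 3000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if ((await probe(url)).ok) return true;
    } catch {
      // Connection refused while the container is still starting: retry.
    }
    // Wait between attempts, but not after the last one.
    if (i < attempts - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

In Node 18+ the probe could simply be `(u) => fetch(u)`, for example against `http://localhost:8080/api/health` through the tunnel. Passing the probe in keeps the retry logic testable without a running server.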


@ -1,33 +1,43 @@
# ArchNest — Handoff Notes
Status snapshot as of **2026-06-21**. Written so a fresh AI session (or human) can pick this up with zero prior context. Branch names rotate every session — always run `git branch --show-current` and work on a fresh feature branch off `main` (recent branches have used a `kiro/<feature>` or `claude/<feature>` naming pattern). Status snapshot as of **2026-06-25**. Written so a fresh AI session (or human) can pick this up with zero prior context. Always run `git branch --show-current` and work on a fresh feature branch off `main` (convention: `kiro/<feature>`).
> **Repo is on Forgejo — no GitHub.** `origin` = `forgejo.archnest.local:3000/sam/dev_arc_aws` (push via SSH). The container registry is `registry.snsnetlabs.com` (separate unproxied host). There is no `gh` CLI / GitHub Actions here.
## TL;DR
ArchNest is **live and deployed** at `archnest.snsnetlabs.com`, auto-deploying via GitHub Actions (`.github/workflows/deploy.yml`) on every merge to `main` — push triggers a build + SCP + `docker compose up -d --build` on `racknerd1`, with a health-check gate (`/api/health`). Deployment is no longer the open task; it's working infrastructure now. ArchNest is **feature-complete and stable** as a self-hosted ops dashboard. The runtime stack is **better-sqlite3 + `@fastify/jwt`/bcrypt sessions + Docker Compose** (the Postgres/Redis/Cognito/Akamai stack in `README.md` + `docs/aws-architecture/` is the *planned paid AWS scale-up target*, not what runs today). All major subsystems are built and merged. **Auth Phases 1-3 done** (Phase 4 SSO is a deferred paid AWS add-on — see `ROADMAP.md`); **Mesh Prerequisite Gate** shipped (Settings → Mesh, defaults OFF).
**Auth is feature-complete for self-hosted** (Phases 1-3: user menu, password/sessions/login-log, multi-user roles; Phase 4 SSO deferred to a paid AWS add-on — see `ROADMAP.md`). ## CI/CD & deploy — THE SETUP MOVING FORWARD
Since then, **Docker container visibility/management was expanded** (shipped, deployed): Fully automated. **Every push to `main`** runs Forgejo Actions on the `forgejo-runner` host:
- **Persistent SSH terminal sessions** (PR #30) — terminals stay connected across in-app page navigation.
- **Docker-over-SSH management** + **Docker push-agent monitoring** (PR #31) — see the "Docker: three ways" section below.
**The Mesh Prerequisite Gate is now built and shipped** (no longer the open task): NetBird-mesh-required-before-config, with universal CIDR-based verification (not NetBird-specific), a routed-mesh/VPC-peering reachability fallback, and a dedicated "Mesh" section in Settings to configure/test it. Defaults OFF, so it does not lock the live instance. Commits: `46d95fc` (gate), `0409159` (universal CIDR check), `800072f` (routed-mesh fallback), `4a4a5a0` (Settings UI) — all merged to `main`. ```
push main ─► .forgejo/workflows/ci.yml → validate (tsc + build, frontend & backend)
─► .forgejo/workflows/build.yml
job build → build + push images → registry.snsnetlabs.com/sam/{archnest,archnest-backend} (:latest + :<sha>)
job deploy → (needs build) ssh racknerd2 → docker compose pull + up -d @ this <sha> → /api/health gate
```
Most recently (this session, real user dogfooding rather than a planned feature): walked the user through replacing a broken/insecure Docker-TCP-API integration attempt with a working **SSH Host** integration to a real VM ("Portainer VM," running Portainer + a test container), confirmed Docker-over-SSH container management works end to end, and added supporting UX: - **Registry**: `registry.snsnetlabs.com` (user `sam`). It is a **dedicated unproxied (DNS-only) Cloudflare host** so large image layers bypass Cloudflare's ~100 MB body cap (the backend has 260 MB+ layers). The Forgejo **web UI / packages list** stays on `forgejo.snsnetlabs.com` (Cloudflare Access SSO).
- **Docker setup-script hint in Settings** (commit `628187b`, branch `claude/youthful-cerf-ibvxfb`, **pushed but NOT YET merged to `main`** — user explicitly deferred merging once already; revisit with the user before merging) — when editing a Docker (`type: 'docker'`) integration's `baseUrl`, Settings now renders a copyable systemd-override + `curl` verification script scoped to that exact host/port, so users don't have to hand-derive the remote-API-enablement steps themselves. - **Runner**: `forgejo-runner` host (ssh alias `forgejo-runner`), forgejo-runner v6.3.1, runs jobs in `node:22-bookworm` containers. Its config `/opt/config.yaml` sets `container.docker_host: automount` (mounts the host docker.sock into jobs so they can build images); systemd drop-in points the service at that config. The build job installs **`docker-ce-cli` from Docker's official apt repo** (NOT Debian's `docker.io`, which is too old — API 1.41 vs the daemon's required 1.44+).
- **Help page expansion** (commit `36a79ab`, same branch, pushed) — every page entry in `src/pages/Help.tsx` now has at least one real-world example callout (icon + optional label + scenario text), plus a "New here? Start in this order" quick-start card above the grid, aimed at first-time users who don't yet know which page does what. - **Required Forgejo Actions secrets**: `FORGEJO_REGISTRY_TOKEN` (package-scoped token for `sam`, used for registry login/push), `RACKNERD2_SSH_KEY` (private key for `root@racknerd2`, used by the deploy job).
- **`deploy.yml`** is a manual `workflow_dispatch` (deploy/rollback to any tag without rebuilding); the auto-deploy lives in `build.yml`'s `deploy` job.
### racknerd2 — validation / preview host (NOT permanent)
racknerd2 (ssh alias `racknerd2`) is where the deployed build can be **viewed for accuracy**. It only pulls + runs the images (1.9 GiB RAM — never builds). Mesh IP **100.96.217.250**; `/opt/archnest/{docker-compose.yml,.env}` drive a registry-image compose (frontend 8080, backend internal, guacd sidecar). Ports are bound to the mesh IP by default (Docker bypasses ufw, so binding to a specific IP is what keeps it off the public interface).
**Access for review**: RackNerd's edge only allows **inbound port 22** on racknerd2 (80/443/8080 are dropped upstream), so the site is **not directly reachable on its public IP**. View it via the **SSH local-forward tunnel** — Kiro hook **"View ArchNest on racknerd2 (localhost:8080)"** (`.kiro/hooks/tunnel-racknerd2-8080.kiro.hook`) runs `ssh -L 8080:localhost:8080 -N racknerd2`; trigger it, then open **http://localhost:8080**. A real public URL (later) goes through the NPM reverse proxy on linode (TLS), not racknerd2's raw IP.
### → NEXT TASK for the picking-up agent
No new feature is queued. Pick up from here: **Nothing is queued; the pipeline above is the baseline.** Push to `main` → it auto-builds and auto-deploys to racknerd2; view via the tunnel hook. Pick the next priority with the user (the `ROADMAP.md` tiered/paid add-ons are the menu). Optional small follow-ups noted but not requested: bump `package.json`/About panel to **v2** (convention recorded below); add a one-click "stop tunnel" hook.
1. **Decide with the user whether to merge `claude/youthful-cerf-ibvxfb` into `main`.** It contains the Docker setup-script hint (`628187b`) and the Help page expansion (`36a79ab`), both already build-clean (`npm run build` passes). Nothing else is blocking it.
2. **Ask the user if removing the unused Docker API integration (the one superseded by the SSH Host setup) is done** — this was a live-instance UI action on their end, not something done via this repo's code.
3. Otherwise, check with the user for the next priority — there is no pending design doc or half-built feature waiting right now (mesh gate and Docker UX work above are both fully shipped or ready-to-merge).
## Standing rules (read before doing anything)
- **Versioning convention**: development happens on **even** major versions, releases on **odd**. We are currently developing **v2** (prior released line is v1 — see the `v1.0` git tag). Dev image/version tags carry the even (v2) number. `package.json` (root + backend) still reads `0.0.0` and the Settings → About panel is hardcoded `v1.0.0`; neither has been bumped to v2 yet.
- **Branch**: never commit on `main`. Create a fresh feature branch off `main` (recent convention: `kiro/<short-feature>`). Confirm with `git branch --show-current` before starting.
- **Workflow per change**: type-check (`npx tsc --noEmit -p .` in repo root AND in `backend/`) — and for frontend changes prefer a full `npm run build` (which runs `tsc -b && vite build`; the stricter `tsc -b` has caught errors a plain `tsc --noEmit` missed via stale incremental cache) → commit → `git fetch origin main && git rebase origin/main` → `git push -u origin <branch>` → open a PR with `gh pr create` → squash-merge (`gh pr merge <n> --squash --delete-branch`) → poll the resulting run (`gh run list --branch main`, then `gh run watch <id> --exit-status`) until `validate` and `deploy` both succeed (deploy's last step is "Health check (backend /api/health)"). - **Workflow per change**: type-check (`npx tsc --noEmit -p .` in repo root AND in `backend/`) — for frontend changes prefer a full `npm run build` (`tsc -b && vite build`; stricter than plain `tsc --noEmit`) → commit → `git fetch origin main && git rebase origin/main` → `git push -u origin <branch>` → open a PR on Forgejo (web UI/API) and merge to `main`. **Merging to `main` auto-triggers CI: validate + build + push + auto-deploy to racknerd2** (`.forgejo/workflows/`). There is no `gh` CLI here. Watch a run via the runner: `ssh forgejo-runner 'docker ps'` (job containers) / `journalctl -u forgejo-runner`, and confirm the result by checking the SHA-tagged image in `registry.snsnetlabs.com` and `/api/health` on racknerd2 (via the tunnel hook).
- **`git add -A` caution**: this has twice swept up unrelated untracked files (e.g. a bookmark-import JSON the user asked to be generated, not committed) into unrelated PRs. Prefer `git add <specific files>` and always check `git diff --cached --stat` before committing.
- **Never open a PR unless the user's intent is clearly "ship this."** For exploratory/planning asks, use `AskUserQuestion` to confirm scope first — see how the Phase 2/3/4 plan below was scoped before any code was written.
- **Mock data policy**: zero mock/fabricated data. Verify with `grep -ri "mock\|fake\|placeholder" src/ backend/src/` if continuing feature work and unsure.
@ -127,14 +137,14 @@ Moved out of the core build. Planned as a **paid add-on shipped when ArchNest is
Moved to **`ROADMAP.md`** ("Known non-blocking stubs"). Summary: the Infrastructure "Network" sub-tab is intentionally disabled, and the Settings Appearance and Notifications sections are non-functional placeholders. None are flagged as work to do unless explicitly asked — check the latest conversation/commits before assuming a direction.
## Deployment (already working — reference only) ## Deployment (current — Forgejo Actions, automated)
`docker-compose.yml` (3 services: `archnest` frontend, `archnest-backend`, `guacd`) + `.github/workflows/deploy.yml` (push-to-`main` → SCP + `docker compose up -d --build` on `racknerd1`, gated on an `/api/health` check) are live and require no further setup. If a deploy fails, check the GitHub Actions run's `deploy` job steps in order — `Pre-flight` (host `.env` exists), `Copy repo to racknerd1`, `Build, restart, and clean up`, `Health check`. Full pipeline is documented in **"CI/CD & deploy — THE SETUP MOVING FORWARD"** near the top of this file and in **`deploy/README.md`**. Summary: push to `main` → Forgejo Actions builds + pushes images to `registry.snsnetlabs.com` and auto-deploys to **racknerd2** (validation host) over SSH, SHA-pinned, `/api/health` gated. View racknerd2 via the SSH tunnel hook → `http://localhost:8080` (its public IP only allows port 22). The old GitHub-Actions→racknerd1 SCP pipeline is gone (migrated to Forgejo). `docker-compose.yml` at the repo root still BUILDS locally (dev/manual); `deploy/docker-compose.yml` PULLS from the registry (what racknerd2 runs).
## Quick orientation for a new session
1. Read this file, then `ROADMAP.md` (deferred/tiered work), then `docs/` (subsystem design docs — `docker-agent-monitoring.md`, `mesh-prerequisite-gate.md`), then `TERMIX_MIGRATION.md` for feature-level history, then skim `git log --oneline -30`. 1. Read this file, then `deploy/README.md` (build/deploy pipeline), then `ROADMAP.md` (deferred/tiered work), then `docs/` (subsystem design docs — `docker-agent-monitoring.md`, `mesh-prerequisite-gate.md`, `rdp-debug-handoff.md`, `aws-architecture/system-design.md`), then `TERMIX_MIGRATION.md` for feature history, then skim `git log --oneline -30`.
2. Frontend: prefer `npm run build` (`tsc -b && vite build`) over a plain `tsc --noEmit` (stricter, catches more). Backend: `npx tsc --noEmit -p .` from `backend/`. Both must pass before any commit. 2. Frontend: prefer `npm run build` (`tsc -b && vite build`) over plain `tsc --noEmit`. Backend: `npx tsc --noEmit -p .` from `backend/`. Both must pass before any commit (Forgejo CI runs exactly this).
3. **The Mesh Prerequisite Gate is built and shipped** (Settings → Mesh; defaults OFF). **There is no other planned feature queued right now** — check the "→ NEXT TASK" section above first (merge decision on `claude/youthful-cerf-ibvxfb`), then ask the user for the next priority. Auth Phases 1-3 are done; Phase 4 SSO is a deferred paid AWS add-on (`ROADMAP.md`). 3. **Nothing is queued and nothing is half-built.** All major subsystems are merged; CI/CD auto-builds + auto-deploys to racknerd2 on every push to `main`. Check the "→ NEXT TASK" section above, then ask the user for the next priority (`ROADMAP.md` lists deferred/paid add-ons).
4. If asked to add a feature, follow existing patterns: integration adapters in `backend/src/integrations/`, SSH-backed engines in `backend/src/ssh/`, one route file per feature in `backend/src/routes/`, one `api.ts` entry + page component per frontend feature. Subsystem-level work gets a `docs/` design doc first.
5. For anything ambiguous in scope, use `AskUserQuestion` rather than guessing — that's how the auth phases, the Docker agent tiering, and the mesh-gate decisions were all scoped. 5. For anything ambiguous in scope, ask the user rather than guessing — that's how the auth phases, Docker agent tiering, and mesh-gate decisions were all scoped.

README.md

@ -1,256 +1,94 @@
# ArchNest
A self-hosted ops dashboard for a homelab/cloud setup: live infrastructure A multi-tenant SaaS platform for infrastructure management — SSH terminal,
monitoring across 9 real integration types, a categorized bookmark hub, a Docker management, remote desktop, host metrics, file management, and 9
full SSH suite (terminal, tunnels, file manager, host-to-host transfer, live real integration adapters from a single browser interface. Developer-first
host metrics), Docker container management, and RDP/VNC/Telnet remote desktop alternative to enterprise RMM tools, starting at $2.50/month.
— all in one app, with zero mock data anywhere.
**This repo is private and will never be public.** This README is written for ## Pricing
the owner and for any AI session picking up the project cold — it should be
detailed enough that neither needs to re-derive context from scratch.
## What this is, in one paragraph | | Starter | Pro | Team |
|---|---|---|---|
| Monthly | $2.50/mo | $4.25/mo | $12/mo |
| Annual | $25/yr | $45/yr | $95/yr |
| Hosts | 50 | 125 | Unlimited |
| Users | 5 | 50 | 200 |
| Remote Desktop | — | ✓ | ✓ |
| SSO | — | — | ✓ |
ArchNest replaced a Homarr-style bookmark dashboard plus a handful of ## Features
disconnected admin tools (Proxmox UI, Portainer, separate SSH terminals,
WinSCP-equivalents) with one app that talks directly to the underlying
systems. It started as a 6-page mockup/portfolio piece and has since grown
into an 11-page real tool with a real Fastify backend, real SSH/Docker/cloud
integrations, and no synthetic data — every number on every page comes from
a live API call, a SQLite-backed table, or an SSH command run against a
managed host.
## Current state & direction **SSH Suite** — Terminal (multi-tab, split panes, persistent sessions), tunnels
(local/remote/SOCKS5), SFTP file manager, host-to-host transfer, host metrics
(5s polling), jump-host chaining, tmux, certificate auth (OPKSSH).
**Live and deployed** at `archnest.snsnetlabs.com`, auto-deploying on every **Docker** — Management via TCP API, CLI over SSH, or push agent. Container
merge to `main` via `.github/workflows/deploy.yml`. All 11 pages and their actions, logs, interactive exec, detail views.
backend routes are built and working — there is no pending/on-hold page.
Auth is feature-complete for self-hosted (Phases 1-3: user menu wiring, **Remote Desktop** — RDP/VNC/Telnet via Guacamole (Pro+).
password/sessions/login-log, multi-user roles with a 10-seat cap); Phase 4
(Authentik SSO) is **deferred to a paid AWS add-on** — see `ROADMAP.md`.
Recently shipped: persistent terminal sessions across navigation, Docker
container visibility/management three ways (Engine TCP API, `docker` CLI over
SSH, and a read-only push agent — see `docs/docker-agent-monitoring.md`), and
the **Mesh Prerequisite Gate** — a universal CIDR-based mesh-verification
requirement (with a routed-mesh/VPC-peering fallback, not NetBird-specific),
configurable from Settings → Mesh and defaulting OFF so it can't lock the live
instance.
There is no feature currently in progress. See `HANDOFF.md` for the latest **Integrations** — Proxmox, Docker, AWS, Cloudflare, NetBird, Uptime Kuma,
status and next steps. Weather, SSH, Remote Desktop. All real, no mocks.
If you're a fresh AI session: read this file, then `HANDOFF.md` (current **Bookmarks** — Categorized hub with favorites, link health, full CRUD.
task state + standing workflow rules), then `design-decisions.md` (visual
conventions + accurate per-page implementation notes), then `ROADMAP.md`
(deferred/tiered work) and the `docs/` design docs (`docker-agent-monitoring.md`,
`mesh-prerequisite-gate.md`), then `TERMIX_MIGRATION.md`
(history of how the SSH/Docker/Guacamole feature set was built) if you need
that context.
## Pages **Auth** — Cognito (OIDC/SAML SSO for Team), MFA, multi-user roles, audit log.
| Page | Route | What it does | **4 Themes** — ArchNest Dark, Midnight Blue, Forest, Light.
|------|-------|---------------|
| Glance | `/` | Home dashboard — system/integration health, resource overview, recent activity, shortcuts |
| Infrastructure | `/infrastructure` | Resource inventory across all integrations — distribution donut, per-resource status grid, integration health, activity |
| BookNest | `/booknest` | Categorized bookmark hub — quick access, favorites, link health, full CRUD |
| Terminal | `/terminal` | Web SSH terminal — multi-tab, split panes, tmux attach, cert auth (OPKSSH); **sessions stay connected across page navigation** |
| Tunnels | `/tunnels` | SSH tunnel manager — local/remote/dynamic (SOCKS5) forwarding, auto-start, live status |
| Files | `/files` | SFTP file browser/editor over managed SSH hosts, with host-to-host transfer |
| Containers | `/containers` | Docker containers across **three sources** (Engine TCP API, `docker` CLI over SSH, or a read-only push agent) — list/start/stop/restart/pause/remove, logs, interactive exec; tabbed with a clickable per-container detail view |
| Remote Desktop | `/remote-desktop` | RDP/VNC/Telnet sessions via a Guacamole sidecar |
| Host Metrics | `/host-metrics` | Live CPU/memory/disk/network/processes/ports/firewall/login-activity per SSH host, polled every 5s |
| Settings | `/settings` | Profile, Appearance, Security, Integrations, Notifications, Data & Backup, About — deep-linkable via `?tab=` |
| Help | `/help` | Static guided tour of every page above |
| Login / Enrollment | `/login`, `/enrollment` | Auth entry points — not in the sidebar nav |
See `design-decisions.md`'s "Page Notes" section for a detailed, per-page
breakdown of layout, real data sources, and known quirks — it's kept in sync
with the actual code, not a spec written before the page existed.
## Architecture
### Frontend (`/src`) Hybrid: Akamai Cloud for compute, AWS for managed services.
- React 19 + Vite + TypeScript, Tailwind CSS v4, Recharts (donuts/area
charts), Lucide React icons, React Router.
- `src/lib/api.ts` — typed fetch wrapper (`apiFetch`) + one function per
backend endpoint + matching TS interfaces. This is the contract between
frontend and backend; any new backend route needs a matching entry here.
- `src/lib/AuthContext.tsx` — auth state backed by `localStorage` (JWT
carrying a server-tracked session id; signing out revokes the session
server-side).
- `src/lib/TerminalSessionContext.tsx` — keeps SSH terminal sessions
(xterm + WebSocket + DOM node) alive above the router so they survive
in-app navigation; shared constants in `src/lib/terminalPrefs.ts`.
- `src/pages/` — one file per route (see table above), plus `Login.tsx` /
`Enrollment.tsx` for the unauthenticated/first-run flows.
- `src/components/`: `TopBar.tsx` (title, global search across pages/
integrations/bookmarks, user dropdown), `Sidebar.tsx` (nav + system-health
rollup widget).
- `App.tsx` — route table, plus per-route hero-banner config (`showHero`,
`heroPaddingTop`, `heroObjectPosition` lookup maps) and `topBarHeight`
lookup for pages with a subtitle (currently only BookNest).
### Backend (`/backend`) | Layer | Provider | Service |
- Fastify 5, TypeScript, ESM (`tsx` for dev, `tsc -b` for build), entrypoint |-------|----------|---------|
`src/server.ts`. | Compute | Akamai | G7 Dedicated (4GB, ARM) |
- `backend/src/db/index.ts` — SQLite schema + `logEvent()` audit log, | Load Balancer | Akamai | NodeBalancer |
plus `sessions`/`login_events` tables and a multi-user `users` schema | Frontend | Akamai | Object Storage |
(`role` admin/member + `active` columns). | Database | Self-managed | PostgreSQL (RLS) |
- `backend/src/db/crypto.ts` — AES-256-GCM `encryptSecret`/`decryptSecret`, | Cache | Self-managed | Redis |
keyed by `ARCHNEST_SECRET_KEY`. | Auth | AWS | Cognito |
- `backend/src/routes/` — one file per feature area: | Secrets | AWS | Secrets Manager |
- `auth.ts` — setup, login, profile, password change, sessions, | Storage | AWS | S3 |
login audit log, and admin-only user management (`/api/setup`, | DNS | AWS | Route 53 |
`/api/auth/*`, `/api/users`) | Email | AWS | SES |
- `integrations.ts` — integration CRUD + connection testing
- `bookmarks.ts` — bookmarks + categories CRUD
- `events.ts` — activity log retrieval
- `terminal.ts` — SSH terminal WebSocket (`connect`/`input`/`resize`/
`list_tmux`/`disconnect`)
- `tunnels.ts` — SSH tunnel CRUD + connect/disconnect
- `files.ts` — SFTP list/read/write/mkdir/rename/delete/chmod/download/upload
- `docker.ts` — Docker Engine TCP API: container list/stats/logs/actions + exec WebSocket
- `dockerSsh.ts` — Docker over SSH: runs the `docker` CLI on a remote SSH host (list/logs/actions + exec WebSocket); no dockerd socket exposed
- `agents.ts` — Docker monitoring agents: token-gated push ingest (`POST /api/agents/docker/report`) + read-only host/container views
- `guacamole.ts` — Guacamole WebSocket proxy for remote desktop
- `metrics.ts` — live host metrics endpoint
- `transfer.ts` — host-to-host file transfer orchestration (start/poll/cancel)
- `data.ts` — full backup export/import (integrations + secrets + bookmarks + tunnels)
- `backend/src/integrations/` — one adapter per type, all real (none are
stubs): `proxmox.ts`, `docker.ts`, `netbird.ts`, `cloudflare.ts`, `aws.ts`,
`uptimeKuma.ts`, `weather.ts`, `ssh.ts`, `remoteDesktop.ts`. Each implements
`testConnection()` (required) and `listResources()` (optional);
`registry.ts` maps `IntegrationType` → adapter.
- `backend/src/ssh/` — the shared SSH transport layer used by Terminal,
Files, Tunnels, Transfers, and Host Metrics:
- `connect.ts` — jump-host chaining, host-key verification, certificate auth
- `sftp.ts` — ephemeral SFTP connections for file ops
- `transfer.ts` — streamed host-to-host copy/move with progress + cancel
- `docker.ts` — runs the `docker` CLI over SSH for the Containers page's
"Docker over SSH" source (list/logs/actions + interactive exec)
- `metrics/` — 10 sequential collectors (cpu, memory, disk, uptime,
network, system, processes, ports, firewall, login-stats) — sequential
on purpose, to stay under OpenSSH's `MaxSessions` limit per host.
- Docker images run on Alpine; **OpenSSL legacy provider is enabled** in
`backend/Dockerfile` (`OPENSSL_CONF=/etc/ssl/openssl-legacy.cnf`) so
old-format encrypted PEM keys (`BEGIN RSA PRIVATE KEY` + `DEK-Info`) still
decrypt under OpenSSL 3 — don't remove this without understanding why.
- **Required env vars, no defaults**: `ARCHNEST_SECRET_KEY`,
`ARCHNEST_JWT_SECRET`. The server refuses to start without both. Optional:
`ARCHNEST_DB_PATH`, `PORT`, `ARCHNEST_GUAC_CRYPT_KEY` /
`ARCHNEST_GUACD_HOST` / `ARCHNEST_GUACD_PORT`, `ARCHNEST_CORS_ORIGIN`,
`ARCHNEST_SESSION_LOG_DIR` (optional terminal session logging),
`ARCHNEST_AGENT_TOKEN` (shared token enabling the Docker monitoring-agent
ingest endpoint — ingest is disabled / returns 503 when unset),
`ARCHNEST_AGENT_STALE_MS` (default 90000; age in ms after which an agent report is shown as stale).
- `backend/src/docker/` — Docker Engine TCP API client used by `docker.ts`.
- `agent/` — the standalone Docker monitoring agent (`archnest-docker-agent.sh`
+ install/README). Runs on each Docker VM and pushes reports to ArchNest.
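
The `encryptSecret`/`decryptSecret` pair described above for `backend/src/db/crypto.ts` is not shown in this diff. As a rough illustration of AES-256-GCM with Node's built-in `crypto`, a minimal sketch might look like the following. The function names, the hex key format, and the `iv | tag | ciphertext` base64 layout are assumptions made for this example, not the project's actual storage format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers; the real encryptSecret/decryptSecret in
// backend/src/db/crypto.ts may use a different key format and encoding.
const IV_BYTES = 12; // 96-bit nonce, the size commonly used with GCM
const TAG_BYTES = 16; // GCM authentication tag length

function parseKey(hexKey: string): Buffer {
  const key = Buffer.from(hexKey, "hex");
  if (key.length !== 32) throw new Error("key must be 32 bytes (64 hex chars)");
  return key;
}

function encryptSecretSketch(plaintext: string, hexKey: string): string {
  const iv = randomBytes(IV_BYTES); // fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", parseKey(hexKey), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Assumed layout: iv | tag | ciphertext, base64-encoded as one string.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptSecretSketch(payload: string, hexKey: string): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", parseKey(hexKey), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the payload was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decrypting with a different key fails loudly instead of returning garbage, which is why rotating `ARCHNEST_SECRET_KEY` without re-encrypting stored secrets would leave them unreadable.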
## Development **Infrastructure cost:** ~$66.50/month at 50 users. Scales to full AWS
(Fargate + Aurora) at 100+ users / $500+ MRR.
Frontend: See [`docs/aws-architecture/system-design.md`](docs/aws-architecture/system-design.md)
```bash for the full system design with diagrams, cost analysis, tier enforcement,
npm install and scale-up path.
npm run dev
```
Backend:
```bash
cd backend
npm install
ARCHNEST_SECRET_KEY=$(openssl rand -hex 32) ARCHNEST_JWT_SECRET=$(openssl rand -hex 32) npm run dev
```
`ARCHNEST_DB_PATH` optionally overrides the SQLite file location (defaults to
a local path under `backend/`). `PORT` overrides the listen port (check
`server.ts` for the default).
Type-check both before committing — this is the minimum bar, not a substitute
for testing in a browser:
```bash
npx tsc --noEmit # from repo root, frontend
cd backend && npx tsc --noEmit # backend
```
Vite/the browser surface some runtime errors (e.g. missing icon exports —
see the lucide-react gotcha in `design-decisions.md`) that the type-checker
won't catch.
## Tech Stack ## Tech Stack
**Frontend** **Frontend**: React 19, Vite 8, TypeScript, Tailwind CSS v4, React Router,
- React 19 + Vite + TypeScript, React Router, Tailwind CSS v4 Recharts, Lucide React, xterm.js
- Recharts (donuts, line/area charts), Lucide React (icons)
- xterm.js (Terminal page terminal rendering)
**Backend** **Backend**: Fastify 5, TypeScript, PostgreSQL, Redis, zod, ssh2
- Fastify 5 + TypeScript, `tsx` for dev, `tsc -b` for build
- `better-sqlite3` for storage
- `@fastify/jwt` for auth tokens, `bcryptjs` for password hashing
- `zod` for request validation
- AES-256-GCM (Node `crypto`) for encrypting integration secrets at rest
- SSH client library powering the SSH transport layer (`backend/src/ssh/`)
- Guacamole Lite protocol for RDP/VNC/Telnet, proxied to a `guacd` sidecar
**Integrations**: Proxmox, Docker, NetBird, Cloudflare, AWS, Uptime Kuma, **Auth**: AWS Cognito (OIDC/SAML SSO, MFA, PKCE)
Weather (wttr.in), SSH, Remote Desktop (RDP/VNC/Telnet via Guacamole) — see
`backend/src/integrations/` for adapter implementations.
**Deploy target:** Docker on `racknerd1` → Nginx Proxy Manager at **CI/CD**: Forgejo Actions → Docker → Akamai VM deploy
`archnest.snsnetlabs.com`.
## Deployment ## Development
**Live and deployed.** `.github/workflows/deploy.yml` triggers on every push ```bash
to `main`: builds, SCPs the repo to `racknerd1`, and runs npm install && npm run dev # frontend
`docker compose up -d --build` there, gated on an `/api/health` health check. cd backend && npm install && npm run dev # backend
No further setup is needed — merging a PR to `main` redeploys automatically. ```
`docker-compose.yml` runs 3 services: `archnest` (frontend), `archnest-backend`, Type-check before committing:
and `guacd` (remote desktop sidecar). ```bash
npm run build # frontend
cd backend && npx tsc --noEmit # backend
```
If a deploy fails, check the workflow run's `deploy` job steps in order: ## Documentation
`Pre-flight` (confirms host `.env` exists) → `Copy repo to racknerd1` →
`Build, restart, and clean up` → `Health check (backend /api/health)`.
One-time setup already done (reference only, shouldn't need repeating): host | File | Content |
provisioning (Docker/Compose on `racknerd1`, deploy SSH user, `/opt/archnest` |------|---------|
directory), `/opt/archnest/.env` populated from `.env.example` with real | [`docs/aws-architecture/system-design.md`](docs/aws-architecture/system-design.md) | Full architecture, costs, tier enforcement |
secrets, `RACKNERD_HOST`/`RACKNERD_USER`/`RACKNERD_SSH_KEY` added as GitHub | [`design-decisions.md`](design-decisions.md) | Visual conventions + per-page notes |
Actions secrets, DNS/Nginx Proxy Manager pointed at the host. | [`HANDOFF.md`](HANDOFF.md) | Current state, workflow rules |
| [`ROADMAP.md`](ROADMAP.md) | Deferred/tiered work |
## Documentation map
- **`README.md`** (this file) — architecture, tech stack, deployment, page list.
- **`HANDOFF.md`** — current task state, standing workflow rules (git workflow,
mock-data policy, secrets discipline), and the auth/SSO roadmap. Read this
before starting any new work session.
- **`design-decisions.md`** — visual/UX conventions (colors, typography, card
style, animations) plus a detailed, accurate-as-of-now "Page Notes" section
per page — what's actually rendered and where its data comes from. This is
the file to update whenever a page's layout or data source changes.
- **`TERMIX_MIGRATION.md`** — phase-by-phase history of how the SSH/Tunnels/
Files/Containers/Remote Desktop/Host Metrics/Transfer/Data-export feature
set was built (originally scoped as a migration from a forked Termix
project, hence the name). Useful for historical "why was it built this
way" context on those specific features.
- **`.kiro/steering/design-rules.md`** — a condensed duplicate of
`design-decisions.md`'s Global Rules, auto-injected into every Kiro IDE
session (the Kiro extension reads `.kiro/steering/*` automatically). If you
update a global design rule, update both files in the same change —
`design-decisions.md` is canonical, this one just needs to stay in sync so
Kiro doesn't steer on stale info.
Three older docs were deleted as part of a documentation cleanup:
`archnest-blueprint.md` and `glance.md` (the original 6-page mockup pitch and
an early Glance-only spec, both describing fictional config files and
placeholder numbers that never matched the real build), and
`.kiro/specs/archnest-dashboard/` (an abandoned Kiro spec — requirements-only,
no `design.md`/`tasks.md` ever followed — describing the same stale 6-page/
80px-sidebar/Zustand-based vision). Their still-accurate content (color
palette, dropdown menu shape, card styling) was folded into
`design-decisions.md` and `.kiro/steering/design-rules.md`; everything else
was superseded by the real, deployed implementation described above.


@ -2,7 +2,7 @@
Status doc for porting Termix's full feature set into ArchNest as a single app, single backend, single auth, single database — reskinned to match ArchNest's design. Written so any session (human or AI) can see exactly what's done, what's next, and why decisions were made. Status doc for porting Termix's full feature set into ArchNest as a single app, single backend, single auth, single database — reskinned to match ArchNest's design. Written so any session (human or AI) can see exactly what's done, what's next, and why decisions were made.
**Migration status: COMPLETE.** All 8 phases below are DONE and verified. No further feature work is queued on this branch. If you're picking this project up, the only remaining task is the GitHub Actions deploy setup — see `HANDOFF.md` and the Deployment section of `README.md`. Do not start new feature work here without explicit instruction. **Migration status: COMPLETE.** All 8 phases below are DONE and verified. No further feature work is queued from this migration. CI/CD has since moved to **Forgejo Actions** (build → `registry.snsnetlabs.com` → auto-deploy to racknerd2) — see `HANDOFF.md` and `deploy/README.md`. Do not start new feature work here without explicit instruction.
Source: `https://github.com/SamuelSJames/Termix` (user's fork), cloned for reference at the time of writing. Upstream is `Termix-SSH/Termix`, an Electron + Express + Drizzle ORM self-hosted SSH/RDP/VNC management app — **not** a small terminal widget. It ships as its own Docker image with a `guacd` sidecar for RDP/VNC. Source: `https://github.com/SamuelSJames/Termix` (user's fork), cloned for reference at the time of writing. Upstream is `Termix-SSH/Termix`, an Electron + Express + Drizzle ORM self-hosted SSH/RDP/VNC management app — **not** a small terminal widget. It ships as its own Docker image with a `guacd` sidecar for RDP/VNC.


@ -28,10 +28,10 @@
}, },
"devDependencies": { "devDependencies": {
"@types/bcryptjs": "^2.4.6", "@types/bcryptjs": "^2.4.6",
"@types/better-sqlite3": "^7.6.12", "@types/better-sqlite3": "^7.6.13",
"@types/node": "^22.10.5", "@types/node": "^22.20.0",
"tsx": "^4.19.2", "tsx": "^4.19.2",
"typescript": "^5.7.3" "typescript": "^5.9.3"
} }
}, },
"node_modules/@aws-crypto/crc32": { "node_modules/@aws-crypto/crc32": {
@ -1259,9 +1259,9 @@
} }
}, },
"node_modules/@types/node": { "node_modules/@types/node": {
"version": "22.19.21", "version": "22.20.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-22.19.21.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-22.20.0.tgz",
"integrity": "sha512-VMeFBSCKQKmm2swI2kW51SFusDqekC6q9trBCvJ/JliDchFSuoYYKN7yVNjPthP1HKZcx3U1gI/wTcEBjEFKTA==", "integrity": "sha512-QWlFW2wf3nTjC13/DqRnBpR4ZO36VJH/JVBkA/vcnmbTBNQIlnObqyqZE1tUR7+Ni23Lda8R1BxMfbXRpCUx5g==",
"dev": true, "dev": true,
"license": "MIT", "license": "MIT",
"dependencies": { "dependencies": {


@ -29,9 +29,9 @@
}, },
"devDependencies": { "devDependencies": {
"@types/bcryptjs": "^2.4.6", "@types/bcryptjs": "^2.4.6",
"@types/better-sqlite3": "^7.6.12", "@types/better-sqlite3": "^7.6.13",
"@types/node": "^22.10.5", "@types/node": "^22.20.0",
"tsx": "^4.19.2", "tsx": "^4.19.2",
"typescript": "^5.7.3" "typescript": "^5.9.3"
} }
} }

24
deploy/.env.example Normal file

@ -0,0 +1,24 @@
# Copy to `.env` next to deploy/docker-compose.yml ON racknerd2 (never commit the real .env).
# Compose loads it automatically.
# Image tag to deploy. The build workflow pushes both :latest and the commit
# SHA; use :latest for rolling validation or pin a SHA for a specific build.
ARCHNEST_TAG=latest
# Interface the app is published on. Mesh IP only — do NOT bind 0.0.0.0.
ARCHNEST_BIND_IP=100.96.217.250
# Origin the frontend is served from (used for CORS). Mesh URL for validation.
ARCHNEST_CORS_ORIGIN=http://100.96.217.250:8080
# 32-byte hex. Signs auth JWTs. Generate: openssl rand -hex 32
ARCHNEST_JWT_SECRET=
# 32-byte hex. Encrypts integration secrets at rest (AES-256-GCM).
# Changing this after data exists makes existing secrets undecryptable.
# Generate: openssl rand -hex 32
ARCHNEST_SECRET_KEY=
# Exactly 32 ASCII chars (used literally as an AES-256-CBC key for Guacamole).
# Generate: openssl rand -base64 24 | cut -c1-32
ARCHNEST_GUAC_CRYPT_KEY=

89
deploy/README.md Normal file

@ -0,0 +1,89 @@
# ArchNest — Build & Deploy (Forgejo Actions → registry → racknerd2)
This pipeline builds the Docker images in Forgejo Actions, pushes them to the
Forgejo container registry, and deploys them to **racknerd2** (validation host)
over the NetBird mesh. racknerd2 only pulls and runs — it never builds (1.9 GiB
RAM).
```
push to main ─► [build.yml]
job: build ─► build + push images ─► registry.snsnetlabs.com/sam/{archnest,archnest-backend}
job: deploy ─► (needs build) ssh racknerd2 ─► compose pull + up -d (this build's SHA) ─► /api/health
manual dispatch (any tag / rollback) ─► [deploy.yml] ssh racknerd2 ─► compose pull && up -d
```
Every push to `main` auto-builds and auto-deploys to racknerd2. `deploy.yml`
stays as a manual `workflow_dispatch` for deploying/rolling back to an arbitrary
tag without rebuilding.
## Images
| Image | From | Tags |
|-------|------|------|
| `registry.snsnetlabs.com/sam/archnest` | root `Dockerfile` (React build → nginx) | `latest`, `<commit-sha>` |
| `registry.snsnetlabs.com/sam/archnest-backend` | `backend/Dockerfile` (Fastify) | `latest`, `<commit-sha>` |
`registry.snsnetlabs.com` is the **unproxied (DNS-only)** registry host, so large
layers bypass Cloudflare's ~100 MB request-body cap. Pushed images appear at
`https://forgejo.snsnetlabs.com/sam/-/packages` (web UI, Cloudflare Access SSO).
## One-time setup
### 1. Forgejo Actions secrets (repo or org settings → Actions → Secrets)
- `FORGEJO_REGISTRY_TOKEN` — Forgejo personal access token for `sam` with
**package** scope (NOT the account password). Used by `build.yml` to log in
and push.
- `RACKNERD2_SSH_KEY` — private SSH key authorized for `root@racknerd2`
(mesh IP `100.96.217.250`). Used by `deploy.yml`.
### 2. Runner (forgejo-runner host) — allow Docker builds
The runner runs jobs inside containers and by default has **no Docker access**.
Enable socket auto-mounting so the `build` job can build images. Create
`/opt/config.yaml` (or edit the existing runner config) with at least:
```yaml
container:
  docker_host: "automount" # mounts /var/run/docker.sock into job containers
```
Generate a full example with `forgejo-runner generate-config > /opt/config.yaml`,
set `docker_host: "automount"`, point the service at it
(`ExecStart=/usr/local/bin/forgejo-runner daemon -c /opt/config.yaml`), then
`systemctl daemon-reload && systemctl restart forgejo-runner`.
### 3. racknerd2 — prepare the deploy host
Docker Engine + compose plugin are already installed. Then:
```bash
mkdir -p /opt/archnest
# copy deploy/docker-compose.yml from this repo to /opt/archnest/docker-compose.yml
# create /opt/archnest/.env from deploy/.env.example and fill in the secrets:
# ARCHNEST_JWT_SECRET = openssl rand -hex 32
# ARCHNEST_SECRET_KEY = openssl rand -hex 32
# ARCHNEST_GUAC_CRYPT_KEY = openssl rand -base64 24 | cut -c1-32
docker login registry.snsnetlabs.com # user: sam, password: the package token
```
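
The secret shapes listed in the comments above (two 32-byte hex values and one key of exactly 32 ASCII characters) can be checked before the stack is started. The sketch below is a hypothetical pre-start check, not part of the ArchNest codebase; `checkDeployEnv` and its result type are names invented for this example, and the backend itself may accept other formats.

```typescript
// Hypothetical pre-start check; not part of the ArchNest codebase.
type EnvCheck = { name: string; ok: boolean; reason: string };

const HEX_32_BYTES = /^[0-9a-fA-F]{64}$/; // output shape of `openssl rand -hex 32`
const ASCII_32_CHARS = /^[\x20-\x7E]{32}$/; // exactly 32 printable ASCII characters

function checkDeployEnv(env: Record<string, string | undefined>): EnvCheck[] {
  const rules: Array<[string, RegExp, string]> = [
    ["ARCHNEST_JWT_SECRET", HEX_32_BYTES, "expected 64 hex characters"],
    ["ARCHNEST_SECRET_KEY", HEX_32_BYTES, "expected 64 hex characters"],
    ["ARCHNEST_GUAC_CRYPT_KEY", ASCII_32_CHARS, "expected exactly 32 ASCII characters"],
  ];
  return rules.map(([name, pattern, reason]) => {
    // A missing variable is treated like an empty string and fails the pattern.
    const ok = pattern.test(env[name] ?? "");
    return { name, ok, reason: ok ? "ok" : reason };
  });
}
```

Calling `checkDeployEnv(process.env).filter((c) => !c.ok)` would list any variables that are missing or have the wrong shape, without printing the secret values themselves.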
Ports are bound to the **mesh IP only** (`100.96.217.250`) — Docker bypasses
ufw, so this is what keeps the app off the public interface. Validate at
`http://100.96.217.250:8080`.
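
Since the mesh-only binding is what keeps the app off the public interface, it may be worth guarding against a port entry that silently drops the IP. The helper below is a hypothetical lint-style check for short-form compose port strings (IPv4 only); `bindsOnlyTo` is a name invented for this sketch.

```typescript
// Hypothetical lint helper for compose "ports" entries such as
// "100.96.217.250:8080:8080"; not part of the ArchNest codebase.
function bindsOnlyTo(portSpec: string, allowedIp: string): boolean {
  const parts = portSpec.split(":");
  // "HOST_IP:HOST_PORT:CONTAINER_PORT" has three parts. With fewer parts
  // (for example "8080:8080") Docker publishes on all interfaces.
  if (parts.length !== 3) return false;
  return parts[0] === allowedIp;
}
```

For example, `bindsOnlyTo("8080:8080", "100.96.217.250")` is `false`, which flags exactly the plain mapping that would expose the port publicly.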
## Running it
- **Automatic**: push to `main` → `build.yml` builds + pushes both images, then
its `deploy` job (needs `build`) pulls this commit's SHA onto racknerd2,
restarts the stack, and health-checks `/api/health`. Fully hands-off.
- **Manual build**: run **Build & Push Images** from the Actions tab (also
triggers the auto-deploy job).
- **Manual deploy / rollback**: run **Deploy to racknerd2**, entering any tag
(`latest` or a specific commit SHA) to deploy without rebuilding.
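
The `/api/health` gate mentioned above is implemented in the workflow files, which are not reproduced here. As a rough sketch of the idea, a deploy step could poll a health probe a bounded number of times and fail the job if it never reports healthy. The names `waitForHealthy` and `httpProbe`, the retry count, and the delay are all assumptions for illustration.

```typescript
// Hypothetical health gate; the real workflow's retry counts and commands
// are not shown in this document.
type Probe = () => Promise<boolean>;

async function waitForHealthy(probe: Probe, attempts = 10, delayMs = 3000): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    // A probe that rejects (e.g. connection refused while the container
    // is still starting) counts as "not healthy yet".
    const healthy = await probe().catch(() => false);
    if (healthy) return true;
    // Sleep between attempts, but not after the last one.
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// Example probe factory using the global fetch available in Node 18+.
const httpProbe = (url: string): Probe => async () => (await fetch(url)).ok;
```

A deploy script could then call `waitForHealthy(httpProbe("http://100.96.217.250:8080/api/health"))` and exit non-zero when it resolves to `false`; the exact URL the real job checks is an assumption here.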
## Notes / ceilings
- Auto-deploy targets racknerd2 (the validation host) on every push to `main`,
pinned to the built commit's SHA. If you later add a prod host, gate
prod deploys behind a manual approval or a tag/release trigger rather than
every push.
- Single-arch (amd64) only — both the runner host and racknerd2 are amd64, so
no buildx/multi-platform is needed.

54
deploy/docker-compose.yml Normal file

@ -0,0 +1,54 @@
# Deploy compose for racknerd2 (validation host).
#
# Unlike the root docker-compose.yml (which BUILDS images locally), this file
# PULLS pre-built images from the Forgejo container registry
# (registry.snsnetlabs.com/sam/...) that the Forgejo Actions `build` workflow
# pushes. racknerd2 only has ~1.9 GiB RAM, so we never build here.
#
# Usage on racknerd2 (in this file's directory, with a sibling .env):
# docker login registry.snsnetlabs.com # once, as user `sam`
# docker compose pull && docker compose up -d
#
# IMPORTANT: published ports are bound to the NetBird mesh IP only. Docker
# manipulates iptables directly and BYPASSES ufw, so a plain "8080:8080" would
# expose the port on the host's public interface regardless of the firewall.
# Binding to ${ARCHNEST_BIND_IP} keeps the app reachable only over the mesh.
services:
  archnest:
    image: registry.snsnetlabs.com/sam/archnest:${ARCHNEST_TAG:-latest}
    container_name: archnest
    restart: unless-stopped
    ports:
      - "${ARCHNEST_BIND_IP:-100.96.217.250}:8080:8080"
    depends_on:
      - archnest-backend
  archnest-backend:
    image: registry.snsnetlabs.com/sam/archnest-backend:${ARCHNEST_TAG:-latest}
    container_name: archnest-backend
    restart: unless-stopped
    environment:
      - PORT=4000
      - ARCHNEST_DB_PATH=/data/archnest.db
      - ARCHNEST_JWT_SECRET=${ARCHNEST_JWT_SECRET}
      - ARCHNEST_SECRET_KEY=${ARCHNEST_SECRET_KEY}
      - ARCHNEST_CORS_ORIGIN=${ARCHNEST_CORS_ORIGIN:-http://100.96.217.250:8080}
      - ARCHNEST_GUAC_CRYPT_KEY=${ARCHNEST_GUAC_CRYPT_KEY}
      - ARCHNEST_GUACD_HOST=guacd
      - ARCHNEST_GUACD_PORT=4822
    volumes:
      - archnest-data:/data
    # No host port published: the frontend container reaches the backend over
    # the compose network as "archnest-backend:4000" (nginx proxies /api).
    depends_on:
      - guacd
  guacd:
    image: guacamole/guacd:1.5.5
    container_name: archnest-guacd
    restart: unless-stopped
    # Internal only; reachable as "guacd:4822" on the compose network.
volumes:
  archnest-data:


@ -35,6 +35,8 @@
which must be kept in sync or the layout clips/gaps. which must be kept in sync or the layout clips/gaps.
### Colors ### Colors
#### Dark Mode (Default — Shipped)
| Role | Value | | Role | Value |
|------|-------| |------|-------|
| Background (page) | `#0D0E10` | | Background (page) | `#0D0E10` |
@ -48,6 +50,72 @@
| Text (primary) | `#E8E6E0` | | Text (primary) | `#E8E6E0` |
| Text (secondary) | `#7A7D85` | | Text (secondary) | `#7A7D85` |
#### Light Mode (Planned — Palette Documented)
| Role | Value |
|------|-------|
| Background (page) | `#F7F6F3` (warm off-white) |
| Background (cards) | `#FFFFFF` (pure white) |
| Background (sidebar) | `#FAF9F7` (very light warm gray) |
| Border (cards) | `#E8E5DF` (soft warm border) |
| Border/accent (hover/active) | `#C8A434` (gold — same as dark) |
| Success | `#2ECC71` |
| Warning | `#E67E22` |
| Danger | `#E74C3C` |
| Text (primary) | `#1A1A1A` (near-black) |
| Text (secondary) | `#6B6560` (warm gray) |
Hero banners per mode:
- Dark: `assets/themes/archnest-default/archnest-default-dark.png` — dark sci-fi cityscape with neon arches
- Light: `assets/themes/archnest-default/archnest-default-light.png` — luminous white/gold cityscape, bright sky
Geometric/card backgrounds per mode:
- Dark: textured black slate with angular gold-lit geometric cuts (top-left and bottom-right diagonal slashes with warm gold edge lighting). Dominant colors: near-black slate `#1A1815`, charcoal `#0F0E0C`, gold edge glow `#C8A434` → `#8B6914`.
- Light: cream marble with matching diagonal geometric cuts and warm gold edge lighting. Dominant colors: warm cream `#F0EDE6`, gold highlights `#C8A434`.
---
### Forest Theme
A second theme with dark and light modes. Same structural layout as the default
but with a different visual identity — mountainous alien landscapes with amber/gold
point lights and a massive planet in the sky.
#### Forest — Dark Mode
| Role | Value | Notes |
|------|-------|-------|
| Page background | `#080806` | Near-black with warm brown undertone |
| Card background | `#121210` | Dark charcoal-brown |
| Sidebar background | `#0A0A08` | Deepest surface |
| Border | `#1E1C18` | Warm dark border |
| Accent | `#D4A850` | Warm amber/gold (slightly warmer than default) |
| Success | `#2ECC71` | |
| Warning | `#E67E22` | |
| Danger | `#E74C3C` | |
| Text primary | `#E8E4DC` | Warm off-white |
| Text secondary | `#7A7568` | Warm gray-brown |
Hero banner: dark alien mountain landscape with massive planet, amber point lights on a grid floor, warm gold highlights on peaks. Deep blacks with scattered amber/gold sparks.
Asset: `assets/themes/forest/forest-dark.png`
#### Forest — Light Mode
| Role | Value | Notes |
|------|-------|-------|
| Page background | `#F5F2ED` | Warm ivory |
| Card background | `#FFFFFF` | Pure white |
| Sidebar background | `#FAF8F4` | Lightest warm tone |
| Border | `#E5E0D8` | Soft warm border |
| Accent | `#D4A850` | Same amber/gold as dark mode |
| Success | `#2ECC71` | |
| Warning | `#E67E22` | |
| Danger | `#E74C3C` | |
| Text primary | `#1A1810` | Warm near-black |
| Text secondary | `#6B6558` | Warm brown-gray |
Hero banner: luminous white/ivory mountain landscape with massive planet, golden sparkle points on a marble-like floor, peaks dusted in white with gold vein highlights. Ethereal, bright, airy.
Asset: `assets/themes/forest/forest-light.png`
Tailwind v4 `@theme` custom colors (`text-gold`, `bg-card`, etc.) don't always Tailwind v4 `@theme` custom colors (`text-gold`, `bg-card`, etc.) don't always
apply reliably — fall back to inline `style={{ color: '#C8A434' }}` when a apply reliably — fall back to inline `style={{ color: '#C8A434' }}` when a
color isn't rendering, and verify visually after changes. color isn't rendering, and verify visually after changes.
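
One way to keep the inline-style fallback consistent is to read hex values from a single typed map rather than retyping them per component. The sketch below copies a few values from the default theme's dark and planned light palettes; the `palette` object and `inlineColor` helper are hypothetical names, not existing code in the repo.

```typescript
// Values copied from the default theme's dark and (planned) light palettes;
// the object and helper names are hypothetical.
const palette = {
  dark: { pageBg: "#0D0E10", accent: "#C8A434", textPrimary: "#E8E6E0", textSecondary: "#7A7D85" },
  light: { pageBg: "#F7F6F3", accent: "#C8A434", textPrimary: "#1A1A1A", textSecondary: "#6B6560" },
} as const;

type Mode = keyof typeof palette;
type Role = keyof (typeof palette)["dark"];

// Returns an object suitable for a React `style` prop, e.g.
// <span style={inlineColor("dark", "accent")}>...</span>
function inlineColor(mode: Mode, role: Role): { color: string } {
  return { color: palette[mode][role] };
}
```

A misspelled role such as `inlineColor("dark", "gold")` then fails type-checking instead of rendering an unstyled element.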


@ -63,8 +63,8 @@ internal working notes that don't belong in a public project:
| `docs/rdp-debug-handoff.md` | Contains lab creds (`sam` / `happy2026`) + private VM IP `192.168.122.55` + personal host names | **Exclude** (or heavily genericize into a "Remote Desktop setup" guide with no creds/IPs) | | `docs/rdp-debug-handoff.md` | Contains lab creds (`sam` / `happy2026`) + private VM IP `192.168.122.55` + personal host names | **Exclude** (or heavily genericize into a "Remote Desktop setup" guide with no creds/IPs) |
| `HANDOFF.md` | Internal session-to-session working notes | **Exclude** | | `HANDOFF.md` | Internal session-to-session working notes | **Exclude** |
| `docs/OPEN-SOURCE-RELEASE.md` (this file) | Internal release plan | **Exclude** | | `docs/OPEN-SOURCE-RELEASE.md` (this file) | Internal release plan | **Exclude** |
| `archnest.snsnetlabs.com` references in `.env.example`, `docker-compose.yml`, `.github/workflows/deploy.yml` | Personal domain/deploy target | **Genericize** to `example.com` / `localhost`; the deploy workflow should be removed or replaced with a generic CI (build + lint only, no SCP-to-my-server) | | `archnest.snsnetlabs.com` references in `.env.example`, `docker-compose.yml` | Personal domain/deploy target | **Genericize** to `example.com` / `localhost` |
| `.github/workflows/deploy.yml` | SSHes/SCPs to the personal `racknerd1` server | **Remove**; replace with a generic build/test CI workflow | | Forgejo CI (`.forgejo/workflows/`) | Already build/validate only (no SCP/personal server). The build workflow pushes to a private registry + deploys to a private host | **Keep but genericize** the registry host + deploy job, or strip the deploy job for a public build-only CI |
| `agent/` deploy specifics | Fine to include the agent script, but scrub any host-specific URLs/tokens in its README | **Review + genericize** | | `agent/` deploy specifics | Fine to include the agent script, but scrub any host-specific URLs/tokens in its README | **Review + genericize** |
| `assets/` personal background images | Large PNGs; keep the ones the UI needs (hero banner, logo, KPI backgrounds), drop unused experiments (`opt1.bg`, `settings-custom-bg`, `pics/`) | **Trim to what's referenced** | | `assets/` personal background images | Large PNGs; keep the ones the UI needs (hero banner, logo, KPI backgrounds), drop unused experiments (`opt1.bg`, `settings-custom-bg`, `pics/`) | **Trim to what's referenced** |
| Test/scratch files | `backend/data/`, any `*.db`, session logs | Already gitignored — confirm none are force-added | | Test/scratch files | `backend/data/`, any `*.db`, session logs | Already gitignored — confirm none are force-added |
@ -115,7 +115,7 @@ LICENSE, README.md, CONTRIBUTING.md, screenshots/ # new, written for OSS
HANDOFF.md HANDOFF.md
docs/rdp-debug-handoff.md docs/rdp-debug-handoff.md
docs/OPEN-SOURCE-RELEASE.md (this file) docs/OPEN-SOURCE-RELEASE.md (this file)
.github/workflows/deploy.yml (replace with generic CI) .forgejo/workflows/ (genericize: strip registry host + deploy job, or build-only CI)
backend/data/, *.db, session logs, *.tsbuildinfo backend/data/, *.db, session logs, *.tsbuildinfo
unused assets/ experiments + pics/ unused assets/ experiments + pics/
``` ```
@ -212,7 +212,7 @@ Notes for credibility:
- [ ] Create fresh public repo, copy INCLUDE list, exclude EXCLUDE list. - [ ] Create fresh public repo, copy INCLUDE list, exclude EXCLUDE list.
- [ ] Genericize personal domain → `example.com`/`localhost` in - [ ] Genericize personal domain → `example.com`/`localhost` in
`.env.example`, `docker-compose.yml`. `.env.example`, `docker-compose.yml`.
- [ ] Replace `.github/workflows/deploy.yml` with a generic build/lint CI (no SCP). - [ ] Genericize `.forgejo/workflows/` for public use (strip the private registry host + the racknerd2 deploy job, or ship build-only CI).
- [ ] Add `LICENSE` (MIT), public `README.md`, `CONTRIBUTING.md`. - [ ] Add `LICENSE` (MIT), public `README.md`, `CONTRIBUTING.md`.
- [ ] Capture + add screenshots (sanitized data, dark theme). - [ ] Capture + add screenshots (sanitized data, dark theme).
- [ ] Re-run a secret scan on the NEW repo before first push - [ ] Re-run a secret scan on the NEW repo before first push


@ -0,0 +1,73 @@
from diagrams import Diagram, Cluster, Edge
from diagrams.aws.security import Cognito, SecretsManager
from diagrams.aws.storage import S3
from diagrams.aws.network import Route53
from diagrams.aws.compute import Lambda
from diagrams.aws.engagement import SES
from diagrams.onprem.container import Docker
from diagrams.onprem.compute import Server
from diagrams.onprem.database import PostgreSQL
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.network import Nginx
from diagrams.onprem.client import User
from diagrams.generic.storage import Storage

with Diagram("ArchNest SaaS - Hybrid Architecture", show=False, filename="/tmp/archnest-hybrid", direction="TB", outformat="png"):
    users = User("Tenants")

    with Cluster("Akamai Cloud"):
        lb = Nginx("NodeBalancer\nHTTPS/WSS")

        with Cluster("G7 Dedicated (4GB, 2 vCPU, ARM)"):
            backend = Server("Fastify\nBackend API")
            websocket = Server("Fastify\nWebSocket Service")
            guacd = Docker("guacd\n(RDP/VNC)")

        with Cluster("Data (Self-Managed)"):
            postgres = PostgreSQL("PostgreSQL\n(RLS Enabled)")
            redis = Redis("Redis\n(Sessions/Cache)")

        static = Storage("Object Storage\n(React SPA)")

    with Cluster("AWS (Managed Services Only)"):
        cognito = Cognito("Cognito\nUser Pools + SSO")
        pre_token = Lambda("Pre-Token\nLambda")
        secrets = SecretsManager("Secrets Manager\nSSH Keys")
        s3 = S3("S3\nBackups + Logs")
        route53 = Route53("Route 53")
        ses = SES("SES\nEmail")
        stripe_lambda = Lambda("Stripe\nWebhook Lambda")

    with Cluster("Tenant Infrastructure"):
        host1 = Server("SSH Host A")
        host2 = Server("SSH Host B")
        docker_host = Docker("Docker Host")

    # User flow
    users >> route53 >> lb
    lb >> static
    lb >> backend
    lb >> websocket

    # Backend connections
    backend >> postgres
    backend >> redis
    backend >> secrets
    backend >> s3
    websocket >> redis
    websocket >> guacd

    # Auth
    cognito >> pre_token
    backend >> cognito
    stripe_lambda >> cognito

    # Outbound to tenant hosts (direct, no NAT needed)
    backend >> host1
    backend >> host2
    websocket >> host1
    websocket >> docker_host

    # Email
    backend >> ses


@ -0,0 +1,419 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ArchNest — Product Design Review</title>
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; background: #0D0E10; color: #E8E6E0; line-height: 1.6; }
.container { max-width: 1400px; margin: 0 auto; padding: 40px 60px 120px; }
h1 { font-size: 32px; color: #C8A434; font-weight: 700; margin-bottom: 8px; letter-spacing: 1px; text-transform: uppercase; }
h2 { font-size: 22px; color: #C8A434; font-weight: 600; margin: 48px 0 16px; padding-bottom: 8px; border-bottom: 1px solid #1E2025; }
h3 { font-size: 16px; color: #E8E6E0; font-weight: 600; margin: 24px 0 12px; }
p { margin: 12px 0; color: #E8E6E0; font-size: 14px; }
.subtitle { color: #7A7D85; font-size: 14px; margin-bottom: 32px; }
.card { background: #141518; border: 1px solid #1E2025; border-radius: 12px; padding: 24px; margin: 16px 0; }
.card:hover { border-color: #C8A434; transition: border-color 0.2s ease; }
.card-title { font-size: 11px; text-transform: uppercase; letter-spacing: 1.5px; color: #7A7D85; margin-bottom: 12px; font-weight: 500; }
table { width: 100%; border-collapse: collapse; margin: 16px 0; font-size: 13px; }
th { background: #141518; color: #C8A434; text-align: left; padding: 12px 16px; border: 1px solid #1E2025; font-size: 11px; text-transform: uppercase; letter-spacing: 1px; }
td { padding: 10px 16px; border: 1px solid #1E2025; color: #E8E6E0; }
tr:hover td { background: #1a1b1f; }
code { background: #1a1b1f; color: #C8A434; padding: 2px 6px; border-radius: 4px; font-size: 13px; font-family: 'JetBrains Mono', monospace; }
.grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 16px; margin: 16px 0; }
.badge { display: inline-block; padding: 3px 10px; border-radius: 12px; font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px; }
.badge-green { background: rgba(46,204,113,0.15); color: #2ECC71; }
.badge-gold { background: rgba(200,164,52,0.15); color: #C8A434; }
.badge-blue { background: rgba(59,130,246,0.15); color: #3B82F6; }
.mermaid { background: #141518; border-radius: 12px; padding: 24px; margin: 24px 0; border: 1px solid #1E2025; }
.feature-list { list-style: none; padding: 0; }
.feature-list li { padding: 8px 0; border-bottom: 1px solid #1E2025; font-size: 14px; }
.feature-list li:last-child { border-bottom: none; }
.feature-list li::before { content: "\2192"; color: #C8A434; margin-right: 10px; }
.section-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 24px; }
@media (max-width: 900px) { .section-grid { grid-template-columns: 1fr; } }
.cost-total { font-size: 28px; font-weight: 700; color: #C8A434; }
.module-price { font-size: 20px; font-weight: 700; color: #2ECC71; }
.theme-swatch { display: inline-block; width: 24px; height: 24px; border-radius: 6px; margin-right: 6px; vertical-align: middle; border: 1px solid #1E2025; }
.approval-bar { position: sticky; bottom: 0; background: #141518; border-top: 1px solid #C8A434; padding: 16px 60px; display: flex; justify-content: space-between; align-items: center; z-index: 100; }
.btn { padding: 10px 24px; border-radius: 8px; font-size: 14px; font-weight: 600; cursor: pointer; border: none; transition: all 0.2s; }
.btn-approve { background: #C8A434; color: #0D0E10; }
.btn-approve:hover { background: #dab944; }
.btn-reject { background: transparent; color: #E74C3C; border: 1px solid #E74C3C; }
.btn-reject:hover { background: rgba(231,76,60,0.1); }
.hero { background: linear-gradient(135deg, #141518 0%, #0D0E10 50%, #1a1510 100%); border-radius: 16px; padding: 48px; margin-bottom: 32px; border: 1px solid #1E2025; }
.lock-icon { color: #7A7D85; margin-right: 6px; }
.free-badge { background: rgba(46,204,113,0.15); color: #2ECC71; padding: 2px 8px; border-radius: 8px; font-size: 11px; font-weight: 600; margin-left: 8px; }
.paid-badge { background: rgba(200,164,52,0.15); color: #C8A434; padding: 2px 8px; border-radius: 8px; font-size: 11px; font-weight: 600; margin-left: 8px; }
</style>
</head>
<body>
<div class="container">
<div class="hero">
<h1>ArchNest</h1>
<p class="subtitle">Self-Hosted Product Design — Open Core + Paid Modules</p>
<p>Free self-hosted ops dashboard. Unlock features with $5 one-time module purchases. Own it forever. No subscriptions.</p>
<div style="margin-top: 16px;">
<span class="badge badge-green">Free Core</span>
<span class="badge badge-gold">$5/Module</span>
<span class="badge badge-blue">Self-Hosted</span>
</div>
</div>
<h2>Business Model</h2>
<div class="section-grid">
<div class="card">
<div class="card-title">How It Works</div>
<ul class="feature-list">
<li>Free core — genuinely useful self-hosted dashboard</li>
<li>$5 one-time purchase per module (30 modules available)</li>
<li>Bundles at discount ($10-$99)</li>
<li>Free core updates forever</li>
<li>Customer owns it — no vendor lock-in</li>
</ul>
</div>
<div class="card">
<div class="card-title">Your Economics</div>
<ul class="feature-list">
<li>Infrastructure cost: ~$1/month (license server)</li>
<li>Profit margin: 95%+ per sale</li>
<li>Zero churn (one-time, not subscription)</li>
<li>Zero hosting cost per customer</li>
<li>Net per $5 module (after Stripe): $4.55</li>
</ul>
</div>
</div>
<h2>Free Core</h2>
<div class="card">
<div class="card-title">Ships Free — No Purchase Required</div>
<table>
<tr><th>Feature</th><th>Free Limit</th></tr>
<tr><td>Dashboard (Glance)</td><td>Full</td></tr>
<tr><td>Infrastructure Overview</td><td>Full</td></tr>
<tr><td>SSH Terminal</td><td>1 tab, 1 pane</td></tr>
<tr><td>SSH Tunnels</td><td>Manual start only</td></tr>
<tr><td>SFTP File Manager</td><td>Full</td></tr>
<tr><td>Docker Management</td><td>TCP API only, 1 source</td></tr>
<tr><td>Host Metrics</td><td>Basic (CPU/memory/disk)</td></tr>
<tr><td>Bookmarks</td><td>10 max</td></tr>
<tr><td>SSH Hosts</td><td>3 max</td></tr>
<tr><td>Users</td><td>1 (admin only)</td></tr>
<tr><td>Theme</td><td>ArchNest Dark only</td></tr>
<tr><td>Help Page</td><td>Full</td></tr>
</table>
</div>
<h2>Paid Modules — $5 Each</h2>
<h3>SSH Modules (8)</h3>
<div class="grid">
<div class="card">
<div class="card-title">1. Multi-Pane Terminal <span class="paid-badge">$5</span></div>
<p>Split panes (2/4), multiple tabs</p>
</div>
<div class="card">
<div class="card-title">2. tmux Integration <span class="paid-badge">$5</span></div>
<p>Attach to existing tmux sessions</p>
</div>
<div class="card">
<div class="card-title">3. Jump-Host Chaining <span class="paid-badge">$5</span></div>
<p>Connect through intermediary hosts</p>
</div>
<div class="card">
<div class="card-title">4. Certificate Auth <span class="paid-badge">$5</span></div>
<p>OPKSSH certificate-based SSH auth</p>
</div>
<div class="card">
<div class="card-title">5. Tunnel Auto-Start <span class="paid-badge">$5</span></div>
<p>Tunnels start automatically on boot</p>
</div>
<div class="card">
<div class="card-title">6. Persistent Sessions <span class="paid-badge">$5</span></div>
<p>Terminal sessions survive navigation</p>
</div>
<div class="card">
<div class="card-title">7. Session Recording <span class="paid-badge">$5</span></div>
<p>Record terminal sessions to disk</p>
</div>
<div class="card">
<div class="card-title">8. Host-to-Host Transfer <span class="paid-badge">$5</span></div>
<p>Copy/move files between SSH hosts</p>
</div>
</div>
<h3>Docker Modules (4)</h3>
<div class="grid">
<div class="card">
<div class="card-title">9. Docker over SSH <span class="paid-badge">$5</span></div>
<p>Manage containers via CLI over SSH</p>
</div>
<div class="card">
<div class="card-title">10. Docker Push Agent <span class="paid-badge">$5</span></div>
<p>Outbound-only monitoring agent</p>
</div>
<div class="card">
<div class="card-title">11. Container Exec <span class="paid-badge">$5</span></div>
<p>Interactive shell into containers</p>
</div>
<div class="card">
<div class="card-title">12. Container Details <span class="paid-badge">$5</span></div>
<p>Full inspect: ports, networks, env, mounts</p>
</div>
</div>
<h3>Integration Modules (6)</h3>
<div class="grid">
<div class="card">
<div class="card-title">13. Unlimited SSH Hosts <span class="paid-badge">$5</span></div>
<p>Remove 3-host cap</p>
</div>
<div class="card">
<div class="card-title">14. Proxmox <span class="paid-badge">$5</span></div>
<p>VM/LXC management</p>
</div>
<div class="card">
<div class="card-title">15. AWS <span class="paid-badge">$5</span></div>
<p>EC2 + STS resource inventory</p>
</div>
<div class="card">
<div class="card-title">16. Cloudflare <span class="paid-badge">$5</span></div>
<p>DNS zones, resource listing</p>
</div>
<div class="card">
<div class="card-title">17. NetBird <span class="paid-badge">$5</span></div>
<p>Mesh peers, connectivity</p>
</div>
<div class="card">
<div class="card-title">18. Uptime Kuma <span class="paid-badge">$5</span></div>
<p>Monitor status/health</p>
</div>
</div>
<h3>Desktop & Theme Modules (6)</h3>
<div class="grid">
<div class="card">
<div class="card-title">19. Remote Desktop: RDP <span class="paid-badge">$5</span></div>
<p>Windows RDP via Guacamole</p>
</div>
<div class="card">
<div class="card-title">20. Remote Desktop: VNC <span class="paid-badge">$5</span></div>
<p>VNC sessions via Guacamole</p>
</div>
<div class="card">
<div class="card-title">21. Remote Desktop: Telnet <span class="paid-badge">$5</span></div>
<p>Telnet sessions via Guacamole</p>
</div>
<div class="card">
<div class="card-title">22. Theme: Midnight Blue <span class="paid-badge">$5</span></div>
<div style="margin:4px 0;"><span class="theme-swatch" style="background:#0B0F1A;"></span><span class="theme-swatch" style="background:#3B82F6;"></span></div>
</div>
<div class="card">
<div class="card-title">23. Theme: Forest <span class="paid-badge">$5</span></div>
<div style="margin:4px 0;"><span class="theme-swatch" style="background:#0A120E;"></span><span class="theme-swatch" style="background:#10B981;"></span></div>
</div>
<div class="card">
<div class="card-title">24. Theme: Light <span class="paid-badge">$5</span></div>
<div style="margin:4px 0;"><span class="theme-swatch" style="background:#F5F5F5;"></span><span class="theme-swatch" style="background:#C8A434;"></span></div>
</div>
</div>
<h3>Platform Modules (6)</h3>
<div class="grid">
<div class="card">
<div class="card-title">25. Multi-User <span class="paid-badge">$5</span></div>
<p>Admin/member roles, up to 10 seats</p>
</div>
<div class="card">
<div class="card-title">26. Advanced Metrics <span class="paid-badge">$5</span></div>
<p>Network, processes, ports, firewall, login stats</p>
</div>
<div class="card">
<div class="card-title">27. Data Export/Import <span class="paid-badge">$5</span></div>
<p>Backup/restore full config as JSON</p>
</div>
<div class="card">
<div class="card-title">28. Audit Log <span class="paid-badge">$5</span></div>
<p>Full activity log with export</p>
</div>
<div class="card">
<div class="card-title">29. Unlimited Bookmarks <span class="paid-badge">$5</span></div>
<p>Remove 10-bookmark cap</p>
</div>
<div class="card">
<div class="card-title">30. Global Search <span class="paid-badge">$5</span></div>
<p>Search pages, integrations, bookmarks</p>
</div>
</div>
<h2>Bundles</h2>
<div class="grid" style="grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));">
<div class="card" style="text-align:center;">
<div class="card-title">SSH Pro</div>
<p class="module-price">$25</p>
<p style="color:#7A7D85;font-size:12px;">All 8 SSH modules (save $15)</p>
</div>
<div class="card" style="text-align:center;">
<div class="card-title">Docker Pro</div>
<p class="module-price">$15</p>
<p style="color:#7A7D85;font-size:12px;">All 4 Docker modules (save $5)</p>
</div>
<div class="card" style="text-align:center;">
<div class="card-title">Remote Desktop</div>
<p class="module-price">$10</p>
<p style="color:#7A7D85;font-size:12px;">RDP + VNC + Telnet (save $5)</p>
</div>
<div class="card" style="text-align:center;">
<div class="card-title">All Themes</div>
<p class="module-price">$10</p>
<p style="color:#7A7D85;font-size:12px;">3 extra themes (save $5)</p>
</div>
<div class="card" style="text-align:center;border-color:#C8A434;">
<div class="card-title" style="color:#C8A434;">Everything</div>
<p class="cost-total">$99</p>
<p style="color:#7A7D85;font-size:12px;">All 30 modules forever (save $51)</p>
</div>
</div>
<h2>License System</h2>
<div class="mermaid">
graph LR
BOOT[ArchNest Boot] --> CHECK[License Check<br/>HTTPS to license server]
CHECK --> RESP[Signed Response<br/>modules + valid_until]
RESP --> VALIDATE[Validate Ed25519<br/>signature locally]
VALIDATE --> UNLOCK[Unlock purchased<br/>modules]
UNLOCK --> WEEKLY[Re-check weekly]
WEEKLY --> CHECK
</div>
<div class="section-grid">
<div class="card">
<div class="card-title">How It Works</div>
<ul class="feature-list">
<li>Phone-home on boot + once weekly</li>
<li>Returns signed JSON: modules[] + valid_until (7 days)</li>
<li>Ed25519 signature validated locally (public key in code)</li>
<li>Works offline for 7 days between checks</li>
<li>After 7 days offline → falls back to free core</li>
</ul>
</div>
<div class="card">
<div class="card-title">License Server Stack</div>
<ul class="feature-list">
<li>Cloudflare Workers (free tier: 100K req/day)</li>
<li>Cloudflare D1 database (free tier: 5GB)</li>
<li>Stripe for payments</li>
<li>Total cost: ~$1/month + Stripe fees</li>
<li>Net per module sale: $4.55 (after Stripe)</li>
</ul>
</div>
</div>
<h2>Purchase Flow</h2>
<div class="mermaid">
graph LR
BROWSE[Browse Module Store<br/>in Settings] --> BUY[Click Buy → $5]
BUY --> STRIPE[Stripe Checkout]
STRIPE --> WEBHOOK[Webhook → License Server]
WEBHOOK --> RECORD[Record purchase<br/>in D1 database]
RECORD --> POLL[Next license check<br/>returns new module]
POLL --> ACTIVE[Feature unlocks]
</div>
<h2>Revenue Projections</h2>
<div class="card">
<table>
<tr><th>Stage</th><th>Installs/mo</th><th>Avg Modules Bought</th><th>Revenue/mo</th></tr>
<tr><td>Early (month 1-3)</td><td>50</td><td>3 modules ($15)</td><td>$750</td></tr>
<tr><td>Growth (month 4-6)</td><td>200</td><td>4 modules ($20)</td><td>$4,000</td></tr>
<tr><td>Steady (month 7-12)</td><td>500</td><td>5 modules ($25)</td><td>$12,500</td></tr>
<tr><td>Mature (year 2)</td><td>1,000</td><td>$30 avg (bundles)</td><td>$30,000</td></tr>
</table>
<p style="margin-top:16px;color:#7A7D85;">Infrastructure cost stays at ~$1/month regardless of scale. 95%+ margin at all stages.</p>
</div>
<h2>What Changes From Current Code</h2>
<div class="card">
<table>
<tr><th>Area</th><th>Current</th><th>New</th></tr>
<tr><td>Database</td><td>SQLite</td><td>SQLite (stays)</td></tr>
<tr><td>Auth</td><td>Local JWT</td><td>Local JWT (stays)</td></tr>
<tr><td>Multi-tenant</td><td>N/A</td><td>Not needed (single-tenant per install)</td></tr>
<tr><td>License</td><td>None</td><td>Weekly phone-home + signature validation</td></tr>
<tr><td>Module gating</td><td>None</td><td>Fastify middleware + frontend lock UI</td></tr>
<tr><td>Settings</td><td>Current tabs</td><td>+ "Module Store" tab</td></tr>
<tr><td>Stripe</td><td>None</td><td>Checkout for purchases</td></tr>
</table>
<p style="margin-top:16px;"><strong>Key insight:</strong> Almost no infrastructure changes. You're adding a license layer and a store UI — not rewriting anything.</p>
</div>
<h2>Implementation Phases</h2>
<div class="grid">
<div class="card">
<div class="card-title">Phase 1 — License Infrastructure</div>
<ul class="feature-list">
<li>Build license server (CF Workers + D1)</li>
<li>Add license check to backend</li>
<li>Add module enforcement middleware</li>
<li>Add "Module Store" tab in Settings</li>
</ul>
</div>
<div class="card">
<div class="card-title">Phase 2 — Module Gating</div>
<ul class="feature-list">
<li>Define module boundaries in routes</li>
<li>Add lock UI to gated features</li>
<li>Free tier caps (3 hosts, 1 pane, 10 bookmarks)</li>
</ul>
</div>
<div class="card">
<div class="card-title">Phase 3 — Purchase Flow</div>
<ul class="feature-list">
<li>Stripe Checkout integration</li>
<li>Module activation on webhook</li>
<li>Bundle discounts</li>
<li>Purchase history in Settings</li>
</ul>
</div>
<div class="card">
<div class="card-title">Phase 4 — Distribution</div>
<ul class="feature-list">
<li>Public Docker image</li>
<li>Landing page + module catalog</li>
<li>Installation docs</li>
<li>Demo instance</li>
</ul>
</div>
</div>
<h2>Open Decisions</h2>
<table>
<tr><th>#</th><th>Question</th><th>Options</th></tr>
<tr><td>1</td><td>Source code visibility</td><td>Open-source (MIT) vs source-available (BSL) vs proprietary</td></tr>
<tr><td>2</td><td>Distribution</td><td>Docker Hub vs GitHub Container Registry</td></tr>
<tr><td>3</td><td>Landing page</td><td>Cloudflare Pages vs separate repo</td></tr>
<tr><td>4</td><td>Refund policy</td><td>30-day vs no refunds ($5 is low)</td></tr>
<tr><td>5</td><td>Module store UX</td><td>In-app tab vs external website</td></tr>
<tr><td>6</td><td>License transfer</td><td>Unlimited vs 1/year</td></tr>
</table>
</div>
<div class="approval-bar">
<div>
<strong style="color: #C8A434;">Product Design Review</strong>
<span style="color: #7A7D85; margin-left: 12px;">ArchNest — Self-Hosted + $5 Modules</span>
</div>
<div>
<button class="btn btn-reject" onclick="alert('Tell Kiro what to change.')">Request Changes</button>
<button class="btn btn-approve" style="margin-left: 12px;" onclick="alert('Approved! Ready to build the license system.')">Approve Design</button>
</div>
</div>
<script>
mermaid.initialize({ theme: 'dark', themeVariables: { primaryColor: '#C8A434', primaryTextColor: '#E8E6E0', primaryBorderColor: '#1E2025', lineColor: '#7A7D85', secondaryColor: '#141518', tertiaryColor: '#0D0E10', background: '#141518', mainBkg: '#141518', nodeBorder: '#C8A434' }});
</script>
</body>
</html>

View file

@ -0,0 +1,326 @@
# ArchNest — Self-Hosted Product Design
> Open-core model: free self-hosted base with $5 one-time module purchases.
> No subscriptions. No SaaS. Customer owns it forever.
---
## Business Model
| Aspect | Detail |
|--------|--------|
| **Core** | Free, self-hosted, open-source (or source-available) |
| **Modules** | $5 one-time purchase each (lifetime license) |
| **Updates** | Free core updates forever. Module updates included. |
| **License** | Phone-home on boot + weekly check. Works offline between checks. |
| **Revenue** | Volume × $5. Target: high module attach rate per install. |
| **Infrastructure cost** | Near zero (license server + payment processor only) |
---
## Free Core (What Ships for Free)
The free tier must be genuinely useful — good enough to adopt, limited enough
to want more.
| Feature | Included Free |
|---------|--------------|
| Dashboard (Glance page) | ✓ |
| Infrastructure overview | ✓ |
| SSH Terminal (1 tab, 1 pane) | ✓ |
| SSH Tunnels (manual start only) | ✓ |
| SFTP File Manager | ✓ |
| Docker management (TCP API only, 1 source) | ✓ |
| Host Metrics (basic: CPU/memory/disk) | ✓ |
| Bookmarks (10 max) | ✓ |
| Settings (Profile, Integrations) | ✓ |
| 3 SSH host integrations max | ✓ |
| 1 user (admin only) | ✓ |
| Single theme (ArchNest Dark) | ✓ |
| Help page | ✓ |
**Why this works:** A solo developer with 1-3 servers can use ArchNest for
free with a functional terminal, basic Docker visibility, and file management.
The moment they want split panes, more hosts, multi-user, or RDP — they buy
modules.
---
## Paid Modules ($5 Each)
### SSH Modules
| # | Module | What It Unlocks |
|---|--------|-----------------|
| 1 | **Multi-Pane Terminal** | Split panes (2/4), multiple tabs |
| 2 | **tmux Integration** | Attach to existing tmux sessions |
| 3 | **Jump-Host Chaining** | Connect through intermediary hosts (ProxyJump) |
| 4 | **Certificate Auth (OPKSSH)** | Certificate-based SSH authentication |
| 5 | **Tunnel Auto-Start** | Tunnels start automatically on boot |
| 6 | **Persistent Sessions** | Terminal sessions survive page navigation |
| 7 | **Session Recording** | Record terminal sessions to disk |
| 8 | **Host-to-Host Transfer** | Copy/move files between two SSH hosts |
### Docker Modules
| # | Module | What It Unlocks |
|---|--------|-----------------|
| 9 | **Docker over SSH** | Manage containers via `docker` CLI over SSH (no exposed socket) |
| 10 | **Docker Push Agent** | Outbound-only monitoring agent for Docker hosts |
| 11 | **Container Exec** | Interactive shell into running containers |
| 12 | **Container Detail View** | Full inspect: ports, networks, mounts, env, labels |
### Integration Modules
| # | Module | What It Unlocks |
|---|--------|-----------------|
| 13 | **Unlimited SSH Hosts** | Remove 3-host cap (unlimited integrations) |
| 14 | **Proxmox Integration** | VM/LXC management |
| 15 | **AWS Integration** | EC2 + STS resource inventory |
| 16 | **Cloudflare Integration** | DNS zones, resource listing |
| 17 | **NetBird Integration** | Mesh peers, connectivity |
| 18 | **Uptime Kuma Integration** | Monitor status/health |
### Desktop & Display Modules
| # | Module | What It Unlocks |
|---|--------|-----------------|
| 19 | **Remote Desktop (RDP)** | RDP sessions via Guacamole |
| 20 | **Remote Desktop (VNC)** | VNC sessions via Guacamole |
| 21 | **Remote Desktop (Telnet)** | Telnet sessions via Guacamole |
| 22 | **Theme: Midnight Blue** | Blue accent theme |
| 23 | **Theme: Forest** | Emerald accent theme |
| 24 | **Theme: Light** | Light mode theme |
### Platform Modules
| # | Module | What It Unlocks |
|---|--------|-----------------|
| 25 | **Multi-User** | Add users (admin/member roles, up to 10 seats) |
| 26 | **Advanced Metrics** | Full host metrics (network, processes, ports, firewall, login stats) |
| 27 | **Data Export/Import** | Backup/restore integrations + secrets + bookmarks + tunnels |
| 28 | **Audit Log** | Full activity audit log with export |
| 29 | **Unlimited Bookmarks** | Remove 10-bookmark cap |
| 30 | **Global Search** | Search across pages, integrations, bookmarks |
---
## Bundles (Discounted)
| Bundle | Modules Included | Price | Savings |
|--------|-----------------|-------|---------|
| **SSH Pro** | #1-8 (all SSH modules) | $25 | Save $15 |
| **Docker Pro** | #9-12 (all Docker modules) | $15 | Save $5 |
| **Remote Desktop** | #19-21 (RDP + VNC + Telnet) | $10 | Save $5 |
| **All Themes** | #22-24 (3 themes) | $10 | Save $5 |
| **Everything** | All 30 modules | $99 | Save $51 |
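
The savings column follows from the flat $5 module price. A minimal sketch of that arithmetic, assuming a hypothetical `Bundle` shape and `bundleSavings` helper (neither exists in the codebase):

```typescript
// Hypothetical helper for checking the bundle table; names are illustrative only.
const MODULE_PRICE_USD = 5;

interface Bundle {
  name: string;
  moduleCount: number;
  priceUsd: number;
}

const bundles: Bundle[] = [
  { name: "SSH Pro", moduleCount: 8, priceUsd: 25 },
  { name: "Docker Pro", moduleCount: 4, priceUsd: 15 },
  { name: "Remote Desktop", moduleCount: 3, priceUsd: 10 },
  { name: "All Themes", moduleCount: 3, priceUsd: 10 },
  { name: "Everything", moduleCount: 30, priceUsd: 99 },
];

// Savings = what the modules would cost individually minus the bundle price.
function bundleSavings(bundle: Bundle): number {
  return bundle.moduleCount * MODULE_PRICE_USD - bundle.priceUsd;
}

for (const b of bundles) {
  console.log(`${b.name}: save $${bundleSavings(b)}`);
}
```

Keeping the savings derived rather than hard-coded would avoid the table drifting if a bundle gains a module later.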
---
## Revenue Model
| Scenario | Installs/mo | Avg modules purchased | Revenue/mo |
|----------|-------------|----------------------|------------|
| Early (month 1-3) | 50 | 3 modules ($15 avg) | $750 |
| Growth (month 4-6) | 200 | 4 modules ($20 avg) | $4,000 |
| Steady (month 7-12) | 500 | 5 modules ($25 avg) | $12,500 |
| Mature (year 2) | 1,000 | 4 modules + bundles ($30 avg) | $30,000 |
**Infrastructure cost:** ~$1/month plus Stripe fees (license server + domain), matching the License Server Stack table.
**Profit margin:** ~95%+ (no SaaS hosting, no per-tenant compute).
---
## License System Architecture
### Phone-Home (Light Touch)
```
┌─────────────────────┐ ┌────────────────────────┐
│ Customer Install │ │ License Server │
│ │ │ (Akamai / Cloudflare │
│ Fastify Backend │────────▶│ Workers / Lambda) │
│ on boot + weekly │ │ │
│ │◀────────│ Returns: │
│ Validates signed │ │ - licensed_modules[] │
│ response locally │ │ - valid_until (7day) │
└─────────────────────┘ │ - signature │
└────────────────────────┘
```
**How it works:**
1. Customer installs ArchNest (Docker Compose or bare metal)
2. On first boot, backend calls license server with install ID
3. License server returns a signed JSON payload:
- `modules`: list of purchased module slugs
- `valid_until`: timestamp (7 days from now)
- `signature`: Ed25519 signature of the payload
4. Backend validates the signature locally (public key embedded in code)
5. If signature valid and `valid_until` hasn't expired → features unlocked
6. Re-checks weekly. If server unreachable, works offline for 7 days.
7. After 7 days without a successful check → falls back to free core only
**Grace period:** 7 days offline. Generous enough for server maintenance,
network issues, etc. If someone loses internet for a week, they keep working.
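
The validation in steps 4-7 could look roughly like the sketch below. It uses Node's built-in `node:crypto` Ed25519 support. The `SignedLicense` shape, the `unlockedModules` name, and the choice to sign the raw JSON string are assumptions for illustration, not the actual wire format.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Assumed payload shape, based on the fields listed in step 3.
interface LicensePayload {
  modules: string[];
  valid_until: string; // ISO 8601 timestamp
}

// Assumption: the server signs the exact JSON string it sends.
interface SignedLicense {
  payload: string;
  signature: string; // base64-encoded Ed25519 signature
}

// Returns the licensed modules, or [] (free core) if the signature is
// invalid or valid_until has passed.
function unlockedModules(
  license: SignedLicense,
  publicKey: KeyObject,
  now: Date = new Date(),
): string[] {
  const ok = verify(
    null, // Ed25519 takes no separate digest algorithm
    Buffer.from(license.payload),
    publicKey,
    Buffer.from(license.signature, "base64"),
  );
  if (!ok) return [];
  const payload = JSON.parse(license.payload) as LicensePayload;
  if (new Date(payload.valid_until).getTime() <= now.getTime()) return [];
  return payload.modules;
}

// Demo with a throwaway key pair standing in for the license server.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const payload = JSON.stringify({
  modules: ["multi-pane-terminal"],
  valid_until: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000).toISOString(),
});
const license: SignedLicense = {
  payload,
  signature: sign(null, Buffer.from(payload), privateKey).toString("base64"),
};
console.log(unlockedModules(license, publicKey));
```

Because `valid_until` sits inside the signed payload, the 7-day grace period cannot be extended by editing the cached response. Changing the system clock is a separate concern that this sketch does not address.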
### License Server Stack
| Component | Provider | Cost |
|-----------|----------|------|
| License API | Cloudflare Workers (free tier: 100K req/day) | $0 |
| Database | Cloudflare D1 (free tier: 5GB) | $0 |
| Payment | Stripe (2.9% + $0.30 per transaction) | Per-sale |
| Domain | Route 53 or Cloudflare | $1/mo |
| **Total** | | **~$1/mo + Stripe fees** |
At $5/module, Stripe takes ~$0.45 per transaction. Net per module: **$4.55**.
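
As a quick check of that figure, here is a sketch of the fee arithmetic in integer cents. It assumes the 2.9% + $0.30 rate from the table and rounds the percentage part to the nearest cent. Stripe's exact rounding and any regional pricing differences are not modeled, and the function names are hypothetical.

```typescript
// Fee in cents, assuming 2.9% + 30 cents with the percentage part rounded to the nearest cent.
function stripeFeeCents(priceCents: number): number {
  return Math.round((priceCents * 29) / 1000) + 30;
}

// What remains after the processing fee.
function netCents(priceCents: number): number {
  return priceCents - stripeFeeCents(priceCents);
}

console.log((netCents(500) / 100).toFixed(2)); // prints "4.55"
```

The fixed 30-cent component dominates at this price point, so under the same assumptions the $99 "Everything" bundle would net about $95.83 and keep a noticeably larger share per sale.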
### Purchase Flow
```
Customer browses modules in Settings → Module Store tab
→ Clicks "Buy" → Stripe Checkout ($5)
→ Stripe webhook → License server records purchase
→ Customer's next license check returns new module
→ Feature unlocks immediately (or within minutes on next poll)
```
### Install ID Generation
- Generated on first boot: `SHA-256(machine-id + secret-key + timestamp)`
- Stored in the database
- Tied to Stripe customer on first purchase
- Transferable (customer can request a reset if they move servers)
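
One possible shape for the install ID derivation, using Node's `node:crypto`. The `generateInstallId` name, the `:` separator between fields, and the hex encoding are assumptions. The document only specifies the three inputs and SHA-256.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes machine ID, secret key, and first-boot timestamp.
// The ":" separator is an assumption that avoids ambiguous concatenations.
function generateInstallId(
  machineId: string,
  secretKey: string,
  timestampMs: number,
): string {
  return createHash("sha256")
    .update(`${machineId}:${secretKey}:${timestampMs}`)
    .digest("hex");
}

// Placeholder inputs for illustration; a real install would read these at first boot.
const installId = generateInstallId("example-machine-id", "example-secret", Date.now());
console.log(installId.length); // 64 hex characters
```

Because the timestamp is part of the input, the ID has to be computed once and then read back from the database rather than recomputed on each boot.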
---
## Module Enforcement (Backend)
```typescript
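import type { FastifyInstance, FastifyReply, FastifyRequest } from 'fastify';

// Type augmentation so `app.licenseCache` and `req.modules` type-check.
declare module 'fastify' {
  interface FastifyInstance { licenseCache?: { modules: string[] } }
  interface FastifyRequest { modules: string[] }
}

// Fastify plugin: runs before route handlers. Hooks are scoped to the
// registering plugin, so wrap this with fastify-plugin to apply it app-wide.
const tierMiddleware = async (app: FastifyInstance) => {
  app.addHook('onRequest', async (req) => {
    const license = app.licenseCache; // refreshed weekly
    req.modules = license?.modules ?? [];
  });
};

// Returns a preHandler that rejects requests when the module is not licensed.
function requireModule(slug: string) {
  return async (req: FastifyRequest, reply: FastifyReply) => {
    if (!req.modules.includes(slug)) {
      // Return the reply so Fastify knows the response was sent and skips the handler.
      return reply.code(402).send({
        error: 'Module required',
        module: slug,
        price: '$5',
        purchaseUrl: `https://archnest.io/modules/${slug}`
      });
    }
  };
}

// Route-level check
app.get('/api/terminal/connect', {
  preHandler: [requireModule('multi-pane-terminal')],
  handler: terminalConnect
});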
```
**Frontend enforcement:**
- Module-gated UI elements show a lock icon + "Unlock for $5" prompt
- Clicking opens the purchase flow (in-app or redirect to store)
- After purchase, UI refreshes and feature unlocks
---
## Free Core Updates
- All users get bug fixes, security patches, and core feature improvements
- Module features don't get stripped from updates — once bought, always works
- New modules may be added over time (new revenue without churning existing customers)
- Major version upgrades (v2, v3) may require a new "Everything" bundle purchase (TBD)
---
## Comparison: SaaS vs Self-Hosted Module Model
| | SaaS (old design) | Self-Hosted Modules (new) |
|---|---|---|
| Infra cost | $66-300/mo | ~$1/mo |
| Revenue model | Recurring ($2.50-12/mo) | One-time ($5/module) |
| Churn risk | High (monthly cancel) | None (one-time) |
| Support burden | High (you host it) | Low (they host it) |
| Profit margin | 60-65% | 95%+ |
| Scale limit | Your AWS bill | Their hardware |
| Customer lock-in | Subscription | Ownership (better reputation) |
---
## Tech Stack (Unchanged)
| Layer | Tech |
|-------|------|
| Frontend | React 19, Vite 8, TypeScript, Tailwind v4 |
| Backend | Fastify 5, TypeScript, SQLite (better-sqlite3) |
| Auth | Local JWT + bcrypt (self-hosted, no Cognito) |
| License | Phone-home to Cloudflare Workers |
| Payment | Stripe Checkout |
| Deploy | Docker Compose (customer's hardware) |
| CI/CD | Forgejo Actions |
---
## What Changes From Current Codebase
| Area | Current | New |
|------|---------|-----|
| Database | SQLite | SQLite (stays, no Postgres migration needed) |
| Auth | Local JWT | Local JWT (stays, no Cognito needed) |
| Multi-tenant | None | Not needed (single-tenant per install) |
| License check | None | New: weekly phone-home + local signature validation |
| Module gating | None | New: Fastify middleware + frontend lock UI |
| Settings page | Current tabs | New: "Module Store" tab |
| Stripe | None | New: Stripe Checkout for purchases |
**Key insight:** This model requires almost no infrastructure changes to the
current codebase. You're adding a license middleware layer and a store UI —
not rewriting the database, auth, or deployment.
---
## Implementation Priority
### Phase 1: License Infrastructure
1. Build license server (Cloudflare Workers + D1)
2. Add license check to backend (on boot + weekly cron)
3. Add module enforcement middleware
4. Add "Module Store" tab in Settings
### Phase 2: Module Gating
1. Define module boundaries in code (which routes require which module)
2. Add lock UI to gated features in frontend
3. Free tier caps (3 hosts, 1 pane, 10 bookmarks)
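
The free-tier caps in step 3 could be enforced with a small pure check like the one below. The cap values come from the Free Core table. The module slugs (`unlimited-ssh-hosts`, `unlimited-bookmarks`) and the `canCreate` helper are hypothetical names chosen for illustration.

```typescript
// Caps from the Free Core table.
const FREE_CAPS = { sshHosts: 3, bookmarks: 10 } as const;

type CappedResource = keyof typeof FREE_CAPS;

// Hypothetical slugs for the modules that lift each cap.
const UNLOCK_SLUG: Record<CappedResource, string> = {
  sshHosts: "unlimited-ssh-hosts",
  bookmarks: "unlimited-bookmarks",
};

// True if one more item may be created given the current count and licensed modules.
function canCreate(
  resource: CappedResource,
  currentCount: number,
  modules: string[],
): boolean {
  if (modules.includes(UNLOCK_SLUG[resource])) return true;
  return currentCount < FREE_CAPS[resource];
}

console.log(canCreate("sshHosts", 3, [])); // false: free cap reached
console.log(canCreate("sshHosts", 3, ["unlimited-ssh-hosts"])); // true
```

The same function could back both the backend check and the frontend lock UI, so the two stay in agreement.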
### Phase 3: Purchase Flow
1. Stripe integration (Checkout, webhooks)
2. Module activation on purchase
3. Bundle discounts
4. Purchase history in Settings
### Phase 4: Distribution
1. Public Docker image on Docker Hub / GitHub Container Registry
2. Landing page with module catalog
3. Installation docs
4. Demo instance for prospects
---
## Open Decisions
| # | Question | Options |
|---|----------|---------|
| 1 | Source code visibility | Open-source (MIT/Apache) vs source-available (BSL) vs proprietary |
| 2 | Image registry | Docker Hub (wider reach) vs GitHub Container Registry (GHCR, free private) |
| 3 | Landing page tech | Static site on Cloudflare Pages vs separate repo |
| 4 | Refund policy | 30-day no-questions vs no refunds ($5 is low enough) |
| 5 | Module store UX | In-app tab vs external website |
| 6 | License transfer | Allow unlimited vs 1 transfer per year |

View file

@ -0,0 +1,173 @@
AWSTemplateFormatVersion: '2010-09-09'
Description: >
  ArchNest - Single-user self-hosted ops dashboard on AWS.
  Deploys a t4g.small EC2 instance with Docker Compose.

Parameters:
  KeyPairName:
    Type: String
    Default: kiro-ide-key
    Description: SSH key pair name for EC2 access
  InstanceType:
    Type: String
    Default: t4g.small
    AllowedValues:
      - t4g.micro
      - t4g.small
      - t4g.medium
    Description: EC2 instance type (ARM/Graviton)
  VolumeSize:
    Type: Number
    Default: 30
    Description: EBS volume size in GB

Resources:
  # Security Group - allows SSH, HTTP, HTTPS, and the backend port
  ArchNestSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ArchNest security group
      GroupName: archnest-sg
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
          Description: SSH access
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
          Description: HTTP (redirect to HTTPS)
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
          Description: HTTPS
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 0.0.0.0/0
          Description: Frontend (direct, before proxy)
        - IpProtocol: tcp
          FromPort: 4000
          ToPort: 4000
          CidrIp: 0.0.0.0/0
          Description: Backend API
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: 0.0.0.0/0
          Description: All outbound (SSH to managed hosts, Docker pulls, etc.)
      Tags:
        - Key: Name
          Value: archnest-sg

  # Elastic IP - stable public IP across stop/start
  ArchNestEIP:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      Tags:
        - Key: Name
          Value: archnest-eip

  # EC2 Instance
  ArchNestInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      KeyName: !Ref KeyPairName
      ImageId: !FindInMap [RegionAMI, !Ref 'AWS::Region', AMI]
      SecurityGroupIds:
        - !Ref ArchNestSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeSize: !Ref VolumeSize
            VolumeType: gp3
            Encrypted: true
      UserData:
        Fn::Base64: |
          #!/bin/bash
          set -e
          # Update system
          apt-get update -y
          apt-get upgrade -y
          # Install Docker
          apt-get install -y docker.io docker-compose-v2 git curl
          systemctl enable --now docker
          # Create deploy directory
          mkdir -p /opt/archnest
          chown ubuntu:ubuntu /opt/archnest
          # Signal ready
          echo "ArchNest instance ready" > /opt/archnest/READY
      Tags:
        - Key: Name
          Value: archnest

  # Associate Elastic IP with instance
  ArchNestEIPAssociation:
    Type: AWS::EC2::EIPAssociation
    Properties:
      InstanceId: !Ref ArchNestInstance
      EIP: !Ref ArchNestEIP

  # Budget alarm - $30/month ceiling
  ArchNestBudget:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: archnest-monthly
        BudgetType: COST
        TimeUnit: MONTHLY
        BudgetLimit:
          Amount: 30
          Unit: USD
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
          Subscribers:
            - SubscriptionType: EMAIL
              Address: samueljamesinc@gmail.com
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 100
          Subscribers:
            - SubscriptionType: EMAIL
              Address: samueljamesinc@gmail.com

Mappings:
  # Ubuntu 24.04 LTS ARM64 AMIs per region
  RegionAMI:
    us-east-1:
      AMI: ami-0a7a4e87939439934
    us-east-2:
      AMI: ami-0ea3405d2d2522162
    us-west-2:
      AMI: ami-05d38da78ce859165

Outputs:
  PublicIP:
    Description: ArchNest public IP address
    Value: !Ref ArchNestEIP
  SSHCommand:
    Description: SSH into the instance
    Value: !Sub 'ssh -i ~/.ssh/kiro_ide_key ubuntu@${ArchNestEIP}'
  InstanceId:
    Description: EC2 Instance ID
    Value: !Ref ArchNestInstance
  EstimatedMonthlyCost:
    Description: Estimated monthly cost
    Value: '~$15/month (t4g.small + 30GB gp3 + Elastic IP)'

View file

@ -1343,7 +1343,7 @@ function AboutSection() {
   const rows: [string, string][] = [
     ['App', 'ArchNest Dashboard v1.0.0'],
     ['Author', 'Samuel James'],
-    ['Repo', 'github.com/SamuelSJames/archnest'],
+    ['Repo', 'forgejo.snsnetlabs.com/sam/dev_arc_aws'],
     ['Stack', 'React 19, Vite, TypeScript'],
     ['License', 'MIT'],
   ]