dev_arc_aws/README.md
Claude 5703f93027
Update HANDOFF/README for handoff: mesh gate shipped, Docker UX work, no feature queued
Corrects the stale 'mesh gate not built' framing (it shipped across 4 commits, all merged) and documents the Docker setup-script hint + Help page expansion done this session. Leaves a clear next-task list for the picking-up agent: decide on merging claude/youthful-cerf-ibvxfb, then check with the user for the next priority.
2026-06-21 08:30:00 +00:00

256 lines
14 KiB
Markdown

# ArchNest
A self-hosted ops dashboard for a homelab/cloud setup: live infrastructure
monitoring across 9 real integration types, a categorized bookmark hub, a
full SSH suite (terminal, tunnels, file manager, host-to-host transfer, live
host metrics), Docker container management, and RDP/VNC/Telnet remote desktop
— all in one app, with zero mock data anywhere.
**This repo is private and will never be public.** This README is written for
the owner and for any AI session picking up the project cold — it should be
detailed enough that neither needs to re-derive context from scratch.
## What this is, in one paragraph
ArchNest replaced a Homarr-style bookmark dashboard plus a handful of
disconnected admin tools (Proxmox UI, Portainer, separate SSH terminals,
WinSCP-equivalents) with one app that talks directly to the underlying
systems. It started as a 6-page mockup/portfolio piece and has since grown
into an 11-page real tool with a real Fastify backend, real SSH/Docker/cloud
integrations, and no synthetic data — every number on every page comes from
a live API call, a SQLite-backed table, or an SSH command run against a
managed host.
## Current state & direction
**Live and deployed** at `archnest.snsnetlabs.com`, auto-deploying on every
merge to `main` via `.github/workflows/deploy.yml`. All 11 pages and their
backend routes are built and working — there is no pending/on-hold page.
Auth is feature-complete for self-hosted (Phases 1-3: user menu wiring,
password/sessions/login-log, multi-user roles with a 10-seat cap); Phase 4
(Authentik SSO) is **deferred to a paid AWS add-on** — see `ROADMAP.md`.
Recently shipped: persistent terminal sessions across navigation, Docker
container visibility/management three ways (Engine TCP API, `docker` CLI over
SSH, and a read-only push agent — see `docs/docker-agent-monitoring.md`), and
the **Mesh Prerequisite Gate** — a universal CIDR-based mesh-verification
requirement (with a routed-mesh/VPC-peering fallback, not NetBird-specific),
configurable from Settings → Mesh and defaulting OFF so it can't lock the live
instance.
There is no feature currently in progress. See `HANDOFF.md` for the latest
status and next steps.
If you're a fresh AI session: read this file, then `HANDOFF.md` (current
task state + standing workflow rules), then `design-decisions.md` (visual
conventions + accurate per-page implementation notes), then `ROADMAP.md`
(deferred/tiered work) and the `docs/` design docs (`docker-agent-monitoring.md`,
`mesh-prerequisite-gate.md`), then `TERMIX_MIGRATION.md`
(history of how the SSH/Docker/Guacamole feature set was built) if you need
that context.
## Pages
| Page | Route | What it does |
|------|-------|---------------|
| Glance | `/` | Home dashboard — system/integration health, resource overview, recent activity, shortcuts |
| Infrastructure | `/infrastructure` | Resource inventory across all integrations — distribution donut, per-resource status grid, integration health, activity |
| BookNest | `/booknest` | Categorized bookmark hub — quick access, favorites, link health, full CRUD |
| Terminal | `/terminal` | Web SSH terminal — multi-tab, split panes, tmux attach, cert auth (OPKSSH); **sessions stay connected across page navigation** |
| Tunnels | `/tunnels` | SSH tunnel manager — local/remote/dynamic (SOCKS5) forwarding, auto-start, live status |
| Files | `/files` | SFTP file browser/editor over managed SSH hosts, with host-to-host transfer |
| Containers | `/containers` | Docker containers across **three sources** (Engine TCP API, `docker` CLI over SSH, or a read-only push agent) — list/start/stop/restart/pause/remove, logs, interactive exec; tabbed with a clickable per-container detail view |
| Remote Desktop | `/remote-desktop` | RDP/VNC/Telnet sessions via a Guacamole sidecar |
| Host Metrics | `/host-metrics` | Live CPU/memory/disk/network/processes/ports/firewall/login-activity per SSH host, polled every 5s |
| Settings | `/settings` | Profile, Appearance, Security, Integrations, Notifications, Data & Backup, About — deep-linkable via `?tab=` |
| Help | `/help` | Static guided tour of every page above |
| Login / Enrollment | `/login`, `/enrollment` | Auth entry points — not in the sidebar nav |
See `design-decisions.md`'s "Page Notes" section for a detailed, per-page
breakdown of layout, real data sources, and known quirks — it's kept in sync
with the actual code, not a spec written before the page existed.
## Architecture
### Frontend (`/src`)
- React 19 + Vite + TypeScript, Tailwind CSS v4, Recharts (donuts/area
charts), Lucide React icons, React Router.
- `src/lib/api.ts` — typed fetch wrapper (`apiFetch`) + one function per
backend endpoint + matching TS interfaces. This is the contract between
frontend and backend; any new backend route needs a matching entry here.
- `src/lib/AuthContext.tsx` — auth state backed by `localStorage` (JWT
carrying a server-tracked session id; signing out revokes the session
server-side).
- `src/lib/TerminalSessionContext.tsx` — keeps SSH terminal sessions
(xterm + WebSocket + DOM node) alive above the router so they survive
in-app navigation; shared constants in `src/lib/terminalPrefs.ts`.
- `src/pages/` — one file per route (see table above), plus `Login.tsx` /
`Enrollment.tsx` for the unauthenticated/first-run flows.
- `src/components/``TopBar.tsx` (title, global search across pages/
integrations/bookmarks, user dropdown), `Sidebar.tsx` (nav + system-health
rollup widget).
- `App.tsx` — route table, plus per-route hero-banner config (`showHero`,
`heroPaddingTop`, `heroObjectPosition` lookup maps) and `topBarHeight`
lookup for pages with a subtitle (currently only BookNest).
### Backend (`/backend`)
- Fastify 5, TypeScript, ESM (`tsx` for dev, `tsc -b` for build), entrypoint
`src/server.ts`.
- `backend/src/db/index.ts` — SQLite schema + `logEvent()` audit log,
plus `sessions`/`login_events` tables and a multi-user `users` schema
(`role` admin/member + `active` columns).
- `backend/src/db/crypto.ts` — AES-256-GCM `encryptSecret`/`decryptSecret`,
keyed by `ARCHNEST_SECRET_KEY`.
- `backend/src/routes/` — one file per feature area:
- `auth.ts` — setup, login, profile, password change, sessions,
login audit log, and admin-only user management (`/api/setup`,
`/api/auth/*`, `/api/users`)
- `integrations.ts` — integration CRUD + connection testing
- `bookmarks.ts` — bookmarks + categories CRUD
- `events.ts` — activity log retrieval
- `terminal.ts` — SSH terminal WebSocket (`connect`/`input`/`resize`/
`list_tmux`/`disconnect`)
- `tunnels.ts` — SSH tunnel CRUD + connect/disconnect
- `files.ts` — SFTP list/read/write/mkdir/rename/delete/chmod/download/upload
- `docker.ts` — Docker Engine TCP API: container list/stats/logs/actions + exec WebSocket
- `dockerSsh.ts` — Docker over SSH: runs the `docker` CLI on a remote SSH host (list/logs/actions + exec WebSocket); no dockerd socket exposed
- `agents.ts` — Docker monitoring agents: token-gated push ingest (`POST /api/agents/docker/report`) + read-only host/container views
- `guacamole.ts` — Guacamole WebSocket proxy for remote desktop
- `metrics.ts` — live host metrics endpoint
- `transfer.ts` — host-to-host file transfer orchestration (start/poll/cancel)
- `data.ts` — full backup export/import (integrations + secrets + bookmarks + tunnels)
- `backend/src/integrations/` — one adapter per type, all real (none are
stubs): `proxmox.ts`, `docker.ts`, `netbird.ts`, `cloudflare.ts`, `aws.ts`,
`uptimeKuma.ts`, `weather.ts`, `ssh.ts`, `remoteDesktop.ts`. Each implements
`testConnection()` (required) and `listResources()` (optional);
`registry.ts` maps `IntegrationType` → adapter.
- `backend/src/ssh/` — the shared SSH transport layer used by Terminal,
Files, Tunnels, Transfers, and Host Metrics:
- `connect.ts` — jump-host chaining, host-key verification, certificate auth
- `sftp.ts` — ephemeral SFTP connections for file ops
- `transfer.ts` — streamed host-to-host copy/move with progress + cancel
- `docker.ts` — runs the `docker` CLI over SSH for the Containers page's
"Docker over SSH" source (list/logs/actions + interactive exec)
- `metrics/` — 10 sequential collectors (cpu, memory, disk, uptime,
network, system, processes, ports, firewall, login-stats) — sequential
on purpose, to stay under OpenSSH's `MaxSessions` limit per host.
- Docker images run on Alpine; **OpenSSL legacy provider is enabled** in
`backend/Dockerfile` (`OPENSSL_CONF=/etc/ssl/openssl-legacy.cnf`) so
old-format encrypted PEM keys (`BEGIN RSA PRIVATE KEY` + `DEK-Info`) still
decrypt under OpenSSL 3 — don't remove this without understanding why.
- **Required env vars, no defaults**: `ARCHNEST_SECRET_KEY`,
`ARCHNEST_JWT_SECRET`. The server refuses to start without both. Optional:
`ARCHNEST_DB_PATH`, `PORT`, `ARCHNEST_GUAC_CRYPT_KEY` /
`ARCHNEST_GUACD_HOST` / `ARCHNEST_GUACD_PORT`, `ARCHNEST_CORS_ORIGIN`,
`ARCHNEST_SESSION_LOG_DIR` (optional terminal session logging),
`ARCHNEST_AGENT_TOKEN` (shared token enabling the Docker monitoring-agent
ingest endpoint — ingest is disabled / returns 503 when unset),
`ARCHNEST_AGENT_STALE_MS` (default 90000; when an agent report is shown stale).
- `backend/src/docker/` — Docker Engine TCP API client used by `docker.ts`.
- `agent/` — the standalone Docker monitoring agent (`archnest-docker-agent.sh`
+ install/README). Runs on each Docker VM and pushes reports to ArchNest.
## Development
Frontend:
```bash
npm install
npm run dev
```
Backend:
```bash
cd backend
npm install
ARCHNEST_SECRET_KEY=$(openssl rand -hex 32) ARCHNEST_JWT_SECRET=$(openssl rand -hex 32) npm run dev
```
`ARCHNEST_DB_PATH` optionally overrides the SQLite file location (defaults to
a local path under `backend/`). `PORT` overrides the listen port (check
`server.ts` for the default).
Type-check both before committing — this is the minimum bar, not a substitute
for testing in a browser:
```bash
npx tsc --noEmit # from repo root, frontend
cd backend && npx tsc --noEmit # backend
```
Vite/the browser surface some runtime errors (e.g. missing icon exports —
see the lucide-react gotcha in `design-decisions.md`) that the type-checker
won't catch.
## Tech Stack
**Frontend**
- React 19 + Vite + TypeScript, React Router, Tailwind CSS v4
- Recharts (donuts, line/area charts), Lucide React (icons)
- xterm.js (Terminal page terminal rendering)
**Backend**
- Fastify 5 + TypeScript, `tsx` for dev, `tsc -b` for build
- `better-sqlite3` for storage
- `@fastify/jwt` for auth tokens, `bcryptjs` for password hashing
- `zod` for request validation
- AES-256-GCM (Node `crypto`) for encrypting integration secrets at rest
- SSH client library powering the SSH transport layer (`backend/src/ssh/`)
- Guacamole Lite protocol for RDP/VNC/Telnet, proxied to a `guacd` sidecar
**Integrations**: Proxmox, Docker, NetBird, Cloudflare, AWS, Uptime Kuma,
Weather (wttr.in), SSH, Remote Desktop (RDP/VNC/Telnet via Guacamole) — see
`backend/src/integrations/` for adapter implementations.
**Deploy target:** Docker on `racknerd1` → Nginx Proxy Manager at
`archnest.snsnetlabs.com`.
## Deployment
**Live and deployed.** `.github/workflows/deploy.yml` triggers on every push
to `main`: builds, SCPs the repo to `racknerd1`, and runs
`docker compose up -d --build` there, gated on an `/api/health` health check.
No further setup is needed — merging a PR to `main` redeploys automatically.
`docker-compose.yml` runs 3 services: `archnest` (frontend), `archnest-backend`,
and `guacd` (remote desktop sidecar).
If a deploy fails, check the workflow run's `deploy` job steps in order:
`Pre-flight` (confirms host `.env` exists) → `Copy repo to racknerd1`
`Build, restart, and clean up``Health check (backend /api/health)`.
One-time setup already done (reference only, shouldn't need repeating): host
provisioning (Docker/Compose on `racknerd1`, deploy SSH user, `/opt/archnest`
directory), `/opt/archnest/.env` populated from `.env.example` with real
secrets, `RACKNERD_HOST`/`RACKNERD_USER`/`RACKNERD_SSH_KEY` added as GitHub
Actions secrets, DNS/Nginx Proxy Manager pointed at the host.
## Documentation map
- **`README.md`** (this file) — architecture, tech stack, deployment, page list.
- **`HANDOFF.md`** — current task state, standing workflow rules (git workflow,
mock-data policy, secrets discipline), and the auth/SSO roadmap. Read this
before starting any new work session.
- **`design-decisions.md`** — visual/UX conventions (colors, typography, card
style, animations) plus a detailed, accurate-as-of-now "Page Notes" section
per page — what's actually rendered and where its data comes from. This is
the file to update whenever a page's layout or data source changes.
- **`TERMIX_MIGRATION.md`** — phase-by-phase history of how the SSH/Tunnels/
Files/Containers/Remote Desktop/Host Metrics/Transfer/Data-export feature
set was built (originally scoped as a migration from a forked Termix
project, hence the name). Useful for historical "why was it built this
way" context on those specific features.
- **`.kiro/steering/design-rules.md`** — a condensed duplicate of
`design-decisions.md`'s Global Rules, auto-injected into every Kiro IDE
session (the Kiro extension reads `.kiro/steering/*` automatically). If you
update a global design rule, update both files in the same change —
`design-decisions.md` is canonical, this one just needs to stay in sync so
Kiro doesn't steer on stale info.
Three older docs were deleted as part of a documentation cleanup:
`archnest-blueprint.md` and `glance.md` (the original 6-page mockup pitch and
an early Glance-only spec, both describing fictional config files and
placeholder numbers that never matched the real build), and
`.kiro/specs/archnest-dashboard/` (an abandoned Kiro spec — requirements-only,
no `design.md`/`tasks.md` ever followed — describing the same stale 6-page/
80px-sidebar/Zustand-based vision). Their still-accurate content (color
palette, dropdown menu shape, card styling) was folded into
`design-decisions.md` and `.kiro/steering/design-rules.md`; everything else
was superseded by the real, deployed implementation described above.