dev_arc_aws/README.md
Samuel James 07116a0475
Docker setup-script hint + expanded Help page (#35)
* Add mesh prerequisite gate (NetBird verification before app config)

Implements the design in docs/mesh-prerequisite-gate.md per the user's
DECIDE A-D answers: a permanent admin override, B1 (reachable) verification
with host mesh IP shown informationally, members allowed in with a notice
instead of being blocked, and mesh.required defaulting off so the live
production instance is unaffected.

- system_config kv table + getConfig/setConfig helpers
- /api/system/mesh-status, /mesh/verify, /mesh/override, /mesh/required
- AuthContext gains a 'needs-mesh' status (admins only) and exposes
  meshStatus for a member-facing banner
- MeshGate page reuses the integration create+test flow to connect NetBird

* Make mesh verification universal (CIDR check, not NetBird-specific)

Replace the NetBird-adapter-based "reachable" check with a vendor-agnostic
one: the admin supplies the mesh's IP range (CIDR), and verification just
confirms this host has an address inside it. Works identically for
NetBird, WireGuard, ZeroTier, Tailscale, or any other mesh tech, with no
integration record or vendor API call required.

* Add reachability fallback for routed meshes (VPC peering, etc.)

A host can be on the mesh's "side" of a routed network (e.g. a VPC peered
into a NetBird/WireGuard mesh) without holding a local IP in the mesh's
own CIDR. Local-IP-in-CIDR stays the primary check; if it fails, the admin
can supply a known peer/gateway IP on the mesh and we verify by pinging
it instead. Adds iputils to the backend image for the ping binary.

* Add Mesh section to Settings for configuring/testing the mesh gate

Admins can now toggle mesh.required, run verify/override, and see
current mesh status entirely from the app, without hitting the API
directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019hu9pZvJY4BgmcQeAw2ugk

* Show a host-specific Docker remote-API setup script in Settings

When adding/editing a Docker integration with a tcp:// or http:// remote
URL, display a copyable systemd override + curl verification script
scoped to the entered host:port, so enabling the daemon's API doesn't
require looking up the steps separately.

* Expand Help page with quick-start guide and real-world examples

Adds a quick-start ordering card and per-feature example callouts (with icons) so first-time users see concrete use cases, not just descriptions.

* Update HANDOFF/README for handoff: mesh gate shipped, Docker UX work, no feature queued

Corrects the stale 'mesh gate not built' framing (it shipped across 4 commits, all merged) and documents the Docker setup-script hint + Help page expansion done this session. Leaves a clear next-task list for the picking-up agent: decide on merging claude/youthful-cerf-ibvxfb, then check with the user for the next priority.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-21 04:34:59 -04:00

256 lines
14 KiB
Markdown

# ArchNest
A self-hosted ops dashboard for a homelab/cloud setup: live infrastructure
monitoring across 9 real integration types, a categorized bookmark hub, a
full SSH suite (terminal, tunnels, file manager, host-to-host transfer, live
host metrics), Docker container management, and RDP/VNC/Telnet remote desktop
— all in one app, with zero mock data anywhere.
**This repo is private and will never be public.** This README is written for
the owner and for any AI session picking up the project cold — it should be
detailed enough that neither needs to re-derive context from scratch.
## What this is, in one paragraph
ArchNest replaced a Homarr-style bookmark dashboard plus a handful of
disconnected admin tools (Proxmox UI, Portainer, separate SSH terminals,
WinSCP-equivalents) with one app that talks directly to the underlying
systems. It started as a 6-page mockup/portfolio piece and has since grown
into an 11-page real tool with a real Fastify backend, real SSH/Docker/cloud
integrations, and no synthetic data — every number on every page comes from
a live API call, a SQLite-backed table, or an SSH command run against a
managed host.
## Current state & direction
**Live and deployed** at `archnest.snsnetlabs.com`, auto-deploying on every
merge to `main` via `.github/workflows/deploy.yml`. All 11 pages and their
backend routes are built and working — there is no pending/on-hold page.
Auth is feature-complete for self-hosted (Phases 1-3: user menu wiring,
password/sessions/login-log, multi-user roles with a 10-seat cap); Phase 4
(Authentik SSO) is **deferred to a paid AWS add-on** — see `ROADMAP.md`.
Recently shipped: persistent terminal sessions across navigation, Docker
container visibility/management three ways (Engine TCP API, `docker` CLI over
SSH, and a read-only push agent — see `docs/docker-agent-monitoring.md`), and
the **Mesh Prerequisite Gate** — a universal CIDR-based mesh-verification
requirement (with a routed-mesh/VPC-peering fallback, not NetBird-specific),
configurable from Settings → Mesh and defaulting OFF so it can't lock the live
instance.
There is no feature currently in progress. See `HANDOFF.md` for the latest
status and next steps.
If you're a fresh AI session: read this file, then `HANDOFF.md` (current
task state + standing workflow rules), then `design-decisions.md` (visual
conventions + accurate per-page implementation notes), then `ROADMAP.md`
(deferred/tiered work) and the `docs/` design docs (`docker-agent-monitoring.md`,
`mesh-prerequisite-gate.md`), then `TERMIX_MIGRATION.md`
(history of how the SSH/Docker/Guacamole feature set was built) if you need
that context.
## Pages
| Page | Route | What it does |
|------|-------|---------------|
| Glance | `/` | Home dashboard — system/integration health, resource overview, recent activity, shortcuts |
| Infrastructure | `/infrastructure` | Resource inventory across all integrations — distribution donut, per-resource status grid, integration health, activity |
| BookNest | `/booknest` | Categorized bookmark hub — quick access, favorites, link health, full CRUD |
| Terminal | `/terminal` | Web SSH terminal — multi-tab, split panes, tmux attach, cert auth (OPKSSH); **sessions stay connected across page navigation** |
| Tunnels | `/tunnels` | SSH tunnel manager — local/remote/dynamic (SOCKS5) forwarding, auto-start, live status |
| Files | `/files` | SFTP file browser/editor over managed SSH hosts, with host-to-host transfer |
| Containers | `/containers` | Docker containers across **three sources** (Engine TCP API, `docker` CLI over SSH, or a read-only push agent) — list/start/stop/restart/pause/remove, logs, interactive exec; tabbed with a clickable per-container detail view |
| Remote Desktop | `/remote-desktop` | RDP/VNC/Telnet sessions via a Guacamole sidecar |
| Host Metrics | `/host-metrics` | Live CPU/memory/disk/network/processes/ports/firewall/login-activity per SSH host, polled every 5s |
| Settings | `/settings` | Profile, Appearance, Security, Integrations, Notifications, Data & Backup, About — deep-linkable via `?tab=` |
| Help | `/help` | Static guided tour of every page above |
| Login / Enrollment | `/login`, `/enrollment` | Auth entry points — not in the sidebar nav |
See `design-decisions.md`'s "Page Notes" section for a detailed, per-page
breakdown of layout, real data sources, and known quirks — it's kept in sync
with the actual code, not a spec written before the page existed.
## Architecture
### Frontend (`/src`)
- React 19 + Vite + TypeScript, Tailwind CSS v4, Recharts (donuts/area
charts), Lucide React icons, React Router.
- `src/lib/api.ts` — typed fetch wrapper (`apiFetch`) + one function per
backend endpoint + matching TS interfaces. This is the contract between
frontend and backend; any new backend route needs a matching entry here.
- `src/lib/AuthContext.tsx` — auth state backed by `localStorage` (JWT
carrying a server-tracked session id; signing out revokes the session
server-side).
- `src/lib/TerminalSessionContext.tsx` — keeps SSH terminal sessions
(xterm + WebSocket + DOM node) alive above the router so they survive
in-app navigation; shared constants in `src/lib/terminalPrefs.ts`.
- `src/pages/` — one file per route (see table above), plus `Login.tsx` /
`Enrollment.tsx` for the unauthenticated/first-run flows.
- `src/components/``TopBar.tsx` (title, global search across pages/
integrations/bookmarks, user dropdown), `Sidebar.tsx` (nav + system-health
rollup widget).
- `App.tsx` — route table, plus per-route hero-banner config (`showHero`,
`heroPaddingTop`, `heroObjectPosition` lookup maps) and `topBarHeight`
lookup for pages with a subtitle (currently only BookNest).
### Backend (`/backend`)
- Fastify 5, TypeScript, ESM (`tsx` for dev, `tsc -b` for build), entrypoint
`src/server.ts`.
- `backend/src/db/index.ts` — SQLite schema + `logEvent()` audit log,
plus `sessions`/`login_events` tables and a multi-user `users` schema
(`role` admin/member + `active` columns).
- `backend/src/db/crypto.ts` — AES-256-GCM `encryptSecret`/`decryptSecret`,
keyed by `ARCHNEST_SECRET_KEY`.
- `backend/src/routes/` — one file per feature area:
- `auth.ts` — setup, login, profile, password change, sessions,
login audit log, and admin-only user management (`/api/setup`,
`/api/auth/*`, `/api/users`)
- `integrations.ts` — integration CRUD + connection testing
- `bookmarks.ts` — bookmarks + categories CRUD
- `events.ts` — activity log retrieval
- `terminal.ts` — SSH terminal WebSocket (`connect`/`input`/`resize`/
`list_tmux`/`disconnect`)
- `tunnels.ts` — SSH tunnel CRUD + connect/disconnect
- `files.ts` — SFTP list/read/write/mkdir/rename/delete/chmod/download/upload
- `docker.ts` — Docker Engine TCP API: container list/stats/logs/actions + exec WebSocket
- `dockerSsh.ts` — Docker over SSH: runs the `docker` CLI on a remote SSH host (list/logs/actions + exec WebSocket); no dockerd socket exposed
- `agents.ts` — Docker monitoring agents: token-gated push ingest (`POST /api/agents/docker/report`) + read-only host/container views
- `guacamole.ts` — Guacamole WebSocket proxy for remote desktop
- `metrics.ts` — live host metrics endpoint
- `transfer.ts` — host-to-host file transfer orchestration (start/poll/cancel)
- `data.ts` — full backup export/import (integrations + secrets + bookmarks + tunnels)
- `backend/src/integrations/` — one adapter per type, all real (none are
stubs): `proxmox.ts`, `docker.ts`, `netbird.ts`, `cloudflare.ts`, `aws.ts`,
`uptimeKuma.ts`, `weather.ts`, `ssh.ts`, `remoteDesktop.ts`. Each implements
`testConnection()` (required) and `listResources()` (optional);
`registry.ts` maps `IntegrationType` → adapter.
- `backend/src/ssh/` — the shared SSH transport layer used by Terminal,
Files, Tunnels, Transfers, and Host Metrics:
- `connect.ts` — jump-host chaining, host-key verification, certificate auth
- `sftp.ts` — ephemeral SFTP connections for file ops
- `transfer.ts` — streamed host-to-host copy/move with progress + cancel
- `docker.ts` — runs the `docker` CLI over SSH for the Containers page's
"Docker over SSH" source (list/logs/actions + interactive exec)
- `metrics/` — 10 sequential collectors (cpu, memory, disk, uptime,
network, system, processes, ports, firewall, login-stats) — sequential
on purpose, to stay under OpenSSH's `MaxSessions` limit per host.
- Docker images run on Alpine; **OpenSSL legacy provider is enabled** in
`backend/Dockerfile` (`OPENSSL_CONF=/etc/ssl/openssl-legacy.cnf`) so
old-format encrypted PEM keys (`BEGIN RSA PRIVATE KEY` + `DEK-Info`) still
decrypt under OpenSSL 3 — don't remove this without understanding why.
- **Required env vars, no defaults**: `ARCHNEST_SECRET_KEY`,
`ARCHNEST_JWT_SECRET`. The server refuses to start without both. Optional:
`ARCHNEST_DB_PATH`, `PORT`, `ARCHNEST_GUAC_CRYPT_KEY` /
`ARCHNEST_GUACD_HOST` / `ARCHNEST_GUACD_PORT`, `ARCHNEST_CORS_ORIGIN`,
`ARCHNEST_SESSION_LOG_DIR` (optional terminal session logging),
`ARCHNEST_AGENT_TOKEN` (shared token enabling the Docker monitoring-agent
ingest endpoint — ingest is disabled / returns 503 when unset),
`ARCHNEST_AGENT_STALE_MS` (default 90000; when an agent report is shown stale).
- `backend/src/docker/` — Docker Engine TCP API client used by `docker.ts`.
- `agent/` — the standalone Docker monitoring agent (`archnest-docker-agent.sh`
+ install/README). Runs on each Docker VM and pushes reports to ArchNest.
## Development
Frontend:
```bash
npm install
npm run dev
```
Backend:
```bash
cd backend
npm install
ARCHNEST_SECRET_KEY=$(openssl rand -hex 32) ARCHNEST_JWT_SECRET=$(openssl rand -hex 32) npm run dev
```
`ARCHNEST_DB_PATH` optionally overrides the SQLite file location (defaults to
a local path under `backend/`). `PORT` overrides the listen port (check
`server.ts` for the default).
Type-check both before committing — this is the minimum bar, not a substitute
for testing in a browser:
```bash
npx tsc --noEmit # from repo root, frontend
cd backend && npx tsc --noEmit # backend
```
Vite/the browser surface some runtime errors (e.g. missing icon exports —
see the lucide-react gotcha in `design-decisions.md`) that the type-checker
won't catch.
## Tech Stack
**Frontend**
- React 19 + Vite + TypeScript, React Router, Tailwind CSS v4
- Recharts (donuts, line/area charts), Lucide React (icons)
- xterm.js (Terminal page terminal rendering)
**Backend**
- Fastify 5 + TypeScript, `tsx` for dev, `tsc -b` for build
- `better-sqlite3` for storage
- `@fastify/jwt` for auth tokens, `bcryptjs` for password hashing
- `zod` for request validation
- AES-256-GCM (Node `crypto`) for encrypting integration secrets at rest
- SSH client library powering the SSH transport layer (`backend/src/ssh/`)
- Guacamole Lite protocol for RDP/VNC/Telnet, proxied to a `guacd` sidecar
**Integrations**: Proxmox, Docker, NetBird, Cloudflare, AWS, Uptime Kuma,
Weather (wttr.in), SSH, Remote Desktop (RDP/VNC/Telnet via Guacamole) — see
`backend/src/integrations/` for adapter implementations.
**Deploy target:** Docker on `racknerd1` → Nginx Proxy Manager at
`archnest.snsnetlabs.com`.
## Deployment
**Live and deployed.** `.github/workflows/deploy.yml` triggers on every push
to `main`: builds, SCPs the repo to `racknerd1`, and runs
`docker compose up -d --build` there, gated on an `/api/health` health check.
No further setup is needed — merging a PR to `main` redeploys automatically.
`docker-compose.yml` runs 3 services: `archnest` (frontend), `archnest-backend`,
and `guacd` (remote desktop sidecar).
If a deploy fails, check the workflow run's `deploy` job steps in order:
`Pre-flight` (confirms host `.env` exists) → `Copy repo to racknerd1`
`Build, restart, and clean up``Health check (backend /api/health)`.
One-time setup already done (reference only, shouldn't need repeating): host
provisioning (Docker/Compose on `racknerd1`, deploy SSH user, `/opt/archnest`
directory), `/opt/archnest/.env` populated from `.env.example` with real
secrets, `RACKNERD_HOST`/`RACKNERD_USER`/`RACKNERD_SSH_KEY` added as GitHub
Actions secrets, DNS/Nginx Proxy Manager pointed at the host.
## Documentation map
- **`README.md`** (this file) — architecture, tech stack, deployment, page list.
- **`HANDOFF.md`** — current task state, standing workflow rules (git workflow,
mock-data policy, secrets discipline), and the auth/SSO roadmap. Read this
before starting any new work session.
- **`design-decisions.md`** — visual/UX conventions (colors, typography, card
style, animations) plus a detailed, accurate-as-of-now "Page Notes" section
per page — what's actually rendered and where its data comes from. This is
the file to update whenever a page's layout or data source changes.
- **`TERMIX_MIGRATION.md`** — phase-by-phase history of how the SSH/Tunnels/
Files/Containers/Remote Desktop/Host Metrics/Transfer/Data-export feature
set was built (originally scoped as a migration from a forked Termix
project, hence the name). Useful for historical "why was it built this
way" context on those specific features.
- **`.kiro/steering/design-rules.md`** — a condensed duplicate of
`design-decisions.md`'s Global Rules, auto-injected into every Kiro IDE
session (the Kiro extension reads `.kiro/steering/*` automatically). If you
update a global design rule, update both files in the same change —
`design-decisions.md` is canonical, this one just needs to stay in sync so
Kiro doesn't steer on stale info.
Three older docs were deleted as part of a documentation cleanup:
`archnest-blueprint.md` and `glance.md` (the original 6-page mockup pitch and
an early Glance-only spec, both describing fictional config files and
placeholder numbers that never matched the real build), and
`.kiro/specs/archnest-dashboard/` (an abandoned Kiro spec — requirements-only,
no `design.md`/`tasks.md` ever followed — describing the same stale 6-page/
80px-sidebar/Zustand-based vision). Their still-accurate content (color
palette, dropdown menu shape, card styling) was folded into
`design-decisions.md` and `.kiro/steering/design-rules.md`; everything else
was superseded by the real, deployed implementation described above.