Samuel James ad4687660c

Document the Forgejo CI/CD + racknerd2 setup as the baseline

Make the automated pipeline the documented "setup moving forward" and
finish scrubbing the last stale GitHub-Actions/racknerd1 references that
never reached main.

- HANDOFF.md: refresh the stale 2026-06-21 snapshot. New "CI/CD & deploy"
  section (push to main -> build + push to registry.snsnetlabs.com ->
  auto-deploy to racknerd2 over SSH, SHA-pinned, /api/health gate),
  racknerd2 validation-host + SSH-tunnel access notes, Forgejo workflow
  rule, and a current Deployment + orientation section.
- .kiro/steering/project-guide.md: Forgejo-only Git workflow (no gh),
  CI/CD row, registry host, racknerd2 + forgejo-runner SSH entries, and a
  CI/CD pipeline section.
- .kiro/hooks/tunnel-racknerd2-8080.kiro.hook: the "View ArchNest on
  racknerd2" hook (ssh -L 8080:localhost:8080 -N) to view the deployed
  site at http://localhost:8080 (racknerd2's edge only allows port 22).
- src/pages/Settings.tsx: About panel repo URL -> Forgejo.
- .dockerignore: .github -> .forgejo.
- TERMIX_MIGRATION.md / docs/OPEN-SOURCE-RELEASE.md: drop stale
  .github/workflows + "GitHub Actions deploy" references.

Co-authored-by: Samuel James <ssamjame@amazon.com>
Co-authored-by: Kiro <noreply@kiro.dev>

2026-06-25 13:37:39 -04:00

Termix → ArchNest Migration Plan

Status doc for porting Termix's full feature set into ArchNest as a single app, single backend, single auth, single database — reskinned to match ArchNest's design. Written so any session (human or AI) can see exactly what's done, what's next, and why decisions were made.

Migration status: COMPLETE. All 8 phases below are DONE and verified. No further feature work is queued from this migration. CI/CD has since moved to Forgejo Actions (build → registry.snsnetlabs.com → auto-deploy to racknerd2) — see HANDOFF.md and deploy/README.md. Do not start new feature work here without explicit instruction.

Source: https://github.com/SamuelSJames/Termix (user's fork), cloned for reference at the time of writing. Upstream is Termix-SSH/Termix, an Electron + Express + Drizzle ORM self-hosted SSH/RDP/VNC management app — not a small terminal widget. It ships as its own Docker image with a guacd sidecar for RDP/VNC.

Decision: why merge into ArchNest's backend, not Termix's

ArchNest's backend (Fastify + better-sqlite3 + JWT) is small and already has things worth keeping: the bookmarks system, the integration adapter framework (Proxmox/AWS/NetBird/Cloudflare/Weather/SSH health checks — see backend/src/integrations/), the audit log, and a working auth/profile system built this session. Termix's backend is much bigger but its value is the SSH/tunnel/file-manager/Docker/RDP feature logic, not its auth system (OIDC/LDAP/2FA) or its Drizzle schema. So: port Termix's feature modules onto ArchNest's existing Fastify app and auth, don't adopt Termix's backend wholesale.

What is explicitly NOT being ported (user-approved tradeoff)

Electron desktop app + native installers (Chocolatey/Flatpak/AppImage/MSI/Cask) — ArchNest is a web app.
OIDC/LDAP/2FA/SSO and Termix's own multi-user auth system — replaced by ArchNest's existing JWT auth. User confirmed they don't currently use 2FA/OIDC/LDAP, so this is an accepted downgrade, not an oversight.
~30 language translations (i18n) — not a stated goal, not being ported.
All Termix branding — logos, icons, About/product copy, links to Termix's Discord/docs/GitHub. Every ported UI component gets reskinned to ArchNest's Tailwind theme (gold #C8A434, the existing dark palette) as part of porting it, not as a separate pass.

Everything else — SSH terminal, tunnels, file manager, Docker management, RDP/VNC/Telnet, host metrics — is in scope to fully port, feature-equivalent, just rebuilt on ArchNest's stack.

Phases

Each phase is independently committable and testable. Do not start a later phase before the previous one is working end-to-end and committed — this is a large port and needs to land in reviewable chunks.

Phase 1 — SSH Terminal (DONE)

The actual /terminal page: a real interactive SSH terminal in the browser (xterm.js + WebSocket), reusing the SSH credentials already stored in ArchNest's integrations (no second "add a host" flow — Termix's separate host-manager concept is being merged into ArchNest's existing integrations table/SSH adapter, not duplicated).

Termix source files this phase is based on (sizes as of the fork snapshot, for scoping):

src/backend/ssh/terminal.ts (2,570 lines) — WebSocket route handling, message protocol (connect/data/resize/disconnect), output buffering.
src/backend/ssh/terminal-session-manager.ts (570 lines) — session lifecycle, reattach-on-reconnect, per-user session caps, idle timeout, optional session logging to disk.
src/backend/ssh/ssh-connection-pool.ts (225 lines) — connection reuse.
src/backend/ssh/host-resolver.ts, jump-host-chain.ts, terminal-jump-hosts.ts (~900 lines combined) — jump-host / bastion chaining.
src/backend/ssh/auth-manager.ts, credential-username.ts, host-key-verifier.ts, terminal-auth-helpers.ts (~950 lines combined) — credential resolution, host key verification/trust-on-first-use.
src/backend/ssh/opkssh-auth.ts, opkssh-cert-auth.ts (~1,350 lines) — OPKSSH (OpenPubkey SSH) certificate auth.
src/backend/ssh/tmux-monitor.ts, tmux-helper.ts, tmux-monitor-helpers.ts (~1,350 lines) — tmux session detection/monitoring inside the terminal.
Frontend: src/ui/features/terminal/* — xterm.js wrapper, tab system, up-to-4-panel split screen, theme/font customization.

Scope split for this phase, given the size above:

Phase 1a (doing this now): core single-session SSH terminal. WebSocket connect/data/resize/disconnect, using ArchNest's existing SSH integration config/secrets (host/port/username/password/privateKey/passphrase — already in backend/src/integrations/ssh.ts) instead of Termix's separate host table. One terminal per tab, no split panes yet, no jump hosts, no OPKSSH, no tmux monitor, no session recording/logging. Ported onto Fastify's WebSocket support, reusing ArchNest's JWT auth for the WS handshake.
Phase 1b (follow-up, not blocking 1a): jump-host/bastion chaining, host-key verification/trust-on-first-use UI, tab system + up to 4 split panes, terminal theme/font customization settings.
Phase 1c (follow-up, lower priority): OPKSSH cert auth, tmux session monitor/reattach, session recording/logging to disk.

Rationale for splitting: 1a alone is a real, useful terminal (matches what /terminal needs to stop being a placeholder) and is testable end-to-end on its own. Bundling jump-hosts/OPKSSH/tmux into the first pass risks a large unreviewable change with no working checkpoint in between.
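To make the Phase 1a message protocol concrete, here is a minimal sketch of how the connect/input/resize/disconnect frames could be validated before they reach the SSH layer. The field names (integrationId, cols, rows, data) and the JSON framing are assumptions for illustration; the real shapes live in backend/src/routes/terminal.ts and may differ.

```typescript
// Hypothetical client-to-server message shapes; field names are illustrative only.
type ClientMessage =
  | { type: "connect"; integrationId: number; cols: number; rows: number }
  | { type: "input"; data: string }
  | { type: "resize"; cols: number; rows: number }
  | { type: "disconnect" };

function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && v > 0 && Math.floor(v) === v;
}

// Parse one raw WebSocket text frame; returns null for anything malformed.
function parseClientMessage(raw: string): ClientMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const { type, integrationId, cols, rows, data } = msg as Record<string, unknown>;
  switch (type) {
    case "connect":
      return isPositiveInt(integrationId) && isPositiveInt(cols) && isPositiveInt(rows)
        ? { type: "connect", integrationId, cols, rows }
        : null;
    case "input":
      return typeof data === "string" ? { type: "input", data } : null;
    case "resize":
      return isPositiveInt(cols) && isPositiveInt(rows) ? { type: "resize", cols, rows } : null;
    case "disconnect":
      return { type: "disconnect" };
    default:
      return null;
  }
}
```

Rejecting malformed frames at the boundary keeps the handler for each message type free of defensive checks.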

Status:

✅ Phase 1a — done. /terminal is a real interactive SSH terminal: backend/src/routes/terminal.ts (WebSocket, connect/input/resize/disconnect over ssh2), backend/src/db/secrets.ts (shared secret loader), src/pages/Terminal.tsx (xterm.js + host picker, reuses ArchNest's existing SSH integrations — no duplicate host table). Verified end-to-end against a real test SSH server. No jump hosts, no tabs/split panes, no OPKSSH, no tmux monitor yet — see 1b/1c below.
✅ Phase 1b — done.
- Jump-host chaining: an SSH integration's config can carry jumpHostIntegrationId referencing another SSH integration. backend/src/routes/terminal.ts connects to the jump host first, opens a forwardOut() channel to the real target, and connects the target Client over that channel (single-hop; mirrors Termix's core mechanism without its multi-hop/credential-sharing complexity). Verified end-to-end with two real test SSH servers (one as jump, one as target).
- Host-key verification (TOFU): new ssh_host_keys table (backend/src/db/index.ts) stores a SHA-256 fingerprint per SSH integration on first successful connect; subsequent connects are rejected if the fingerprint changes, via ssh2's hostVerifier connect option. No interactive accept/reject-changed-key UI yet — first-use accept-and-store, hard-reject on mismatch. Verified both the accept-on-first-use and reject-on-mismatch paths against a real test server.
- Settings UI for multiple SSH hosts: src/pages/Settings.tsx previously could only show/edit one integration per type, which silently broke multi-host SSH. Added a dedicated SshHostsSection with its own per-host cards (Save/Test/Delete) and an "Add SSH Host" flow, including a Jump Host dropdown populated from the other configured SSH hosts.
- Tabs + up to 4 split panes: src/pages/Terminal.tsx rewritten around a TerminalPane component (one xterm + WebSocket connection each, reusable). Each tab holds 1/2/4 panes (single / split-2 / 2x2 grid); each pane connects independently to whichever SSH host is clicked while it's focused.
- Terminal theme/font customization: a preferences bar (theme preset, font size, font family) persisted to localStorage (archnest-terminal-prefs), applied per-pane on connect.
- Verified via a clean production build (tsc -b && vite build), and subsequently browser-verified (Playwright/Chromium, once available): logged in, opened /terminal, connected a pane to a real SSH host (confirmed by the live remote prompt uitester@vm:~$ and a Connected — <host> status), split into 2 and 4 panes (confirmed 1→2→4 live xterm instances rendering as a 2×2 grid), opened a new tab, and changed the theme preference — confirmed it persisted to localStorage (archnest-terminal-prefs → {"themeName":"Matrix",...}). The original build-only caveat is now closed.
✅ Phase 1c: done (the verification gap documented in the original pass is now closed).
- OPKSSH / certificate auth: ssh2 (the npm library) has no support for OpenSSH certificates — confirmed by inspecting its type definitions and README, no certificate-related auth flow exists. Implemented connectWithCertificate() in backend/src/routes/terminal.ts: writes the stored private key + certificate to a temp dir (mode 0600) and shells out to the system ssh binary (which natively understands -o CertificateFile=) under a real node-pty pty. Used automatically when an SSH integration has a certificate secret configured (new field added to Settings' SSH host form). Does not support jump-host chaining (documented limitation, not silently dropped — Termix's own OPKSSH path doesn't generally chain through jump hosts either). Verified end-to-end (gap from the original pass now closed): with openssh-client/openssh-server available, built a real SSH CA, signed a user key into an OpenSSH certificate (principal certuser), configured a real sshd with TrustedUserCAKeys + PasswordAuthentication no (so only cert auth could succeed), created a real ssh-type integration carrying the private key + certificate as secrets, and drove ArchNest's actual /api/terminal WebSocket route: it reached connected, spawned the cert-auth pty, and a real shell echoed back a marker as certuser — i.e. authentication genuinely happened via the certificate, not a password or plain key.
- tmux session monitor/reattach: new WebSocket message list_tmux execs tmux list-sessions on the target host and returns session names; connect accepts an optional tmuxSession (validated against ^[A-Za-z0-9_-]{1,64}$ before being interpolated into a shell command, to prevent injection) which attaches to that tmux session or creates it if missing, via exec('tmux attach -t <name> || tmux new-session -s <name>', { pty: ... }) instead of a plain client.shell(). src/pages/Terminal.tsx's pane header gained a tmux session picker (plain shell / new session / attach to an existing one). Verified end-to-end against a real test SSH server running real bash/tmux processes (via node-pty): listed zero sessions, created a testsess tmux session through the WS protocol, confirmed a follow-up list_tmux call returned ['testsess'].
- Session recording/logging to disk: new SSH integration config field sessionLogging (checkbox in Settings' SSH host form). When set, all outbound terminal output (both the ssh2 path and the cert-auth pty path) is appended to <ARCHNEST_SESSION_LOG_DIR ?? './data/session-logs'>/<integrationId>_<timestamp>.log. No log browsing/download UI yet (not built — out of scope for this pass, not silently dropped). Verified end-to-end: a real shell session's output was confirmed present in its log file on disk.
- Everything in this phase was tested against live processes (real sshd, real tmux, real cert-auth via a real SSH CA), not mocked. Every backend path, including cert auth, is now exercised end-to-end. The Phase 1b UI (tabs/split panes/theme) was build/type-verified only at the time of this pass; it has since been browser-verified, as noted under Phase 1b. All cert-auth test artifacts (CA, signed cert, test sshd, test OS user, test backend/DB) were cleaned up afterward.
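The tmux session-name check described under Phase 1c can be sketched as follows. The regex and the command string come from the text above; the function name buildTmuxCommand is hypothetical, and the real route may structure this differently.

```typescript
// Allow-list from the doc: letters, digits, underscore, hyphen; 1 to 64 chars.
const TMUX_SESSION_NAME = /^[A-Za-z0-9_-]{1,64}$/;

// Hypothetical helper: validate first, interpolate second.
// Anything outside the allow-list never reaches the shell command.
function buildTmuxCommand(sessionName: string): string {
  if (!TMUX_SESSION_NAME.test(sessionName)) {
    throw new Error("invalid tmux session name");
  }
  // Attach to the session if it exists, otherwise create it.
  return `tmux attach -t ${sessionName} || tmux new-session -s ${sessionName}`;
}
```

Because the allow-list excludes spaces, quotes, semicolons, and every other shell metacharacter, the interpolated name needs no further escaping.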

Phase 2 — SSH Tunnels (DONE)

Source: src/backend/ssh/tunnel.ts (2,414 lines) + tunnel-c2s-relay.ts, tunnel-socks5-relay.ts, tunnel-ssh-primitives.ts, tunnel-utils.ts, tunnel-c2s-relay-utils.ts (~830 lines combined) + frontend src/ui/features/tunnel/*.

Scope decision: Termix distinguishes "S2S" (server-to-server, backend-managed) and "C2S" (client-to-server, routed through Termix's desktop/Electron app) tunnels. ArchNest has no desktop client (explicitly out of scope per the top of this doc), so only the S2S model was ported — a single persistent backend process manages all tunnels, same as Termix's S2S path. C2S's WebSocket data-multiplexing-to-a-desktop-client layer was not ported; it has no equivalent need in a pure web app.

What was built:

backend/src/ssh/connect.ts — extracted loadSshHost/baseConnectConfig/connectTarget (jump-host chaining + TOFU host-key verification) out of terminal.ts into a shared module, since tunnels need the exact same SSH-connection logic terminal sessions do.
backend/src/tunnels/manager.ts — in-memory tunnel runtime manager (Map<tunnelId, RuntimeState>), mirroring Termix's activeTunnels/connectionStatus maps but scoped down to this app's needs. Three modes:
- Local forward: a net.Server listens on sourcePort; each inbound connection calls client.forwardOut() to endpointHost:endpointPort and pipes the two sockets together.
- Remote forward: client.forwardIn('0.0.0.0', sourcePort) asks the SSH server to bind that port; incoming 'tcp connection' events are piped to a local net.connect() against endpointHost:endpointPort.
- Dynamic (SOCKS5): a net.Server listens on sourcePort running a minimal SOCKS5 handshake (backend/src/tunnels/socks5.ts, CONNECT-only, no-auth — sufficient for this use case, not a general SOCKS5 server), then forwardOut()s to whatever target the client requested per-connection.
- Automatic reconnection: on SSH error/close or listener bind failure, schedules a retry after retryIntervalMs, up to maxRetries, then settles into an error status (mirrors Termix's retry/backoff but simplified to a fixed interval rather than exponential — sufficient for this scale).
- startAutoStartTunnels() is called once at server boot to bring up any tunnel with autoStart set.
backend/src/routes/tunnels.ts — REST CRUD (GET/POST /api/tunnels, DELETE /api/tunnels/:id) plus POST /api/tunnels/:id/connect / /disconnect. Status (stopped/connecting/connected/retrying/error + retry count + last error) is read directly off the in-memory runtime state on every GET /api/tunnels (simple polling from the frontend every 3s — no SSE/EventSource, unlike Termix; not needed at this scale and keeps the implementation smaller).
backend/src/db/index.ts — new tunnels table: id, name, integration_id, mode, source_port, endpoint_host, endpoint_port, auto_start, max_retries, retry_interval_ms, created_at. Each tunnel references an existing SSH integrations row (no separate host table, consistent with the rest of this migration) — no separate "preset" concept needed since a tunnel row already is the saved preset.
src/pages/Tunnels.tsx — new page (/tunnels, added to the sidebar with a Waypoints icon) with a creation form (name, SSH host picker, mode, source port, endpoint host/port, auto-start) and a card grid showing each tunnel's status, mode, route, and Start/Stop/Delete actions, polling every 3 seconds.
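The fixed-interval reconnection policy from the tunnel manager can be sketched as a pure state transition. The status values, maxRetries, and retryIntervalMs come from the text above; the RetryState shape and the onTunnelFailure name are assumptions for illustration, not the actual manager.ts code.

```typescript
type TunnelStatus = "stopped" | "connecting" | "connected" | "retrying" | "error";

interface RetryState {
  status: TunnelStatus;
  retryCount: number;
  lastError?: string;
}

interface RetryPolicy {
  maxRetries: number;
  retryIntervalMs: number;
}

// Hypothetical transition: given the current state and a failure, return the
// next state plus the delay before the next attempt (null means give up).
function onTunnelFailure(
  state: RetryState,
  policy: RetryPolicy,
  error: string
): { next: RetryState; retryInMs: number | null } {
  if (state.retryCount >= policy.maxRetries) {
    // Retries exhausted: settle into the error status.
    return { next: { status: "error", retryCount: state.retryCount, lastError: error }, retryInMs: null };
  }
  return {
    next: { status: "retrying", retryCount: state.retryCount + 1, lastError: error },
    retryInMs: policy.retryIntervalMs, // fixed interval, no exponential growth
  };
}
```

A caller would pass retryInMs to a timer and reset retryCount to 0 once a connection succeeds.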

Verified end-to-end against a real test SSH server (extending the same real-ssh2-Server + node-pty pattern used in Phase 1c) that genuinely handles tcpip (forwardOut) and tcpip-forward/cancel-tcpip-forward (forwardIn) requests, plus a real upstream TCP echo server: created one tunnel of each mode (local/remote/dynamic), connected all three, and confirmed real data flowed through each — local forward and remote forward both delivered the upstream server's banner through the tunnel, and the dynamic tunnel completed a real SOCKS5 CONNECT handshake and relayed data. Also verified disconnect correctly tears down the local listener (ECONNREFUSED after stopping). All test artifacts (test SSH server, test backend instance, test DB, tokens) were cleaned up afterward.
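For reference, the CONNECT request that a dynamic tunnel has to understand is small. The sketch below parses it following the RFC 1928 byte layout (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT); the function name is hypothetical, IPv6 targets are omitted for brevity, and this is not the code in backend/src/tunnels/socks5.ts.

```typescript
interface SocksTarget {
  host: string;
  port: number;
}

// Parse a SOCKS5 request sent after the no-auth method negotiation.
// Returns null for anything that is not a well-formed CONNECT request.
function parseSocks5Connect(req: Uint8Array): SocksTarget | null {
  if (req.length < 5) return null;
  if (req[0] !== 0x05) return null; // VER must be 5
  if (req[1] !== 0x01) return null; // CMD: CONNECT only (no BIND / UDP ASSOCIATE)
  const atyp = req[3];
  let host: string;
  let portOffset: number;
  if (atyp === 0x01) {
    // IPv4: 4 address bytes followed by 2 port bytes
    if (req.length < 10) return null;
    host = `${req[4]}.${req[5]}.${req[6]}.${req[7]}`;
    portOffset = 8;
  } else if (atyp === 0x03) {
    // Domain name: 1 length byte, then the name (assumed ASCII here)
    const len = req[4];
    if (len === 0 || req.length < 5 + len + 2) return null;
    host = "";
    for (let i = 0; i < len; i++) host += String.fromCharCode(req[5 + i]);
    portOffset = 5 + len;
  } else {
    return null; // IPv6 (0x04) left out of this sketch
  }
  // DST.PORT is big-endian
  const port = (req[portOffset] << 8) | req[portOffset + 1];
  return { host, port };
}
```

The returned host and port are what a dynamic tunnel would hand to forwardOut() for that connection.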

Phase 3 — Remote File Manager (DONE, with documented gaps)

Source: src/backend/ssh/file-manager*.ts (six files, ~3,900 lines combined: list/content/action/operation/download routes + session + utils) + frontend src/ui/features/file-manager/*.

Scope decisions:

Ephemeral SFTP connections instead of Termix's pooled/long-lived sessions: each request opens a fresh SSH+SFTP connection (backend/src/ssh/sftp.ts's withSftp()), does one operation, and tears the connection down. Simpler than managing a third long-lived connection lifecycle alongside terminal and tunnel sessions, and acceptable at this app's scale.
No sudo/permission-elevation support. When SFTP returns a permission error, Termix falls back to shell commands with a stored sudo password piped in; this was not ported in this pass because no privileged remote test target was available in this sandbox to verify against safely (the same kind of gap the OPKSSH cert-auth work in Phase 1c originally had). Documented here rather than silently dropped.
No server-to-server transfer — this matches Termix's actual behavior (its own cross-host "transfer" is just sequential download then upload through the browser; same-host moves use shell mv/cp, which isn't ported since sudo isn't). Not a regression.
Whole-file-in-memory model for view/edit, same as Termix: GET/PUT /api/files/:id/content reads/writes the entire file via sftp.readFile/writeFile. Files over 50MB (MAX_EDITABLE_SIZE) are rejected with a message pointing at download/upload instead. Binary detection (so binary files are shown as a "can't edit" message rather than mangled text) uses the same heuristic as Termix: scan the first 8KB for a null byte or a >1% ratio of other control bytes.
Streaming download (GET /api/files/:id/download) for files of any size, via sftp.createReadStream() piped straight into the HTTP response rather than buffered in memory.
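The binary-detection heuristic from the scope decisions above can be sketched as follows. The 8KB window, the null-byte rule, and the 1% threshold are from the text; which control bytes count as "other" (here: everything below 0x20 except tab, LF, and CR) is an assumption, and looksBinary is a hypothetical name.

```typescript
const SAMPLE_BYTES = 8 * 1024;

// Returns true when the content should be treated as binary (not editable as text).
function looksBinary(content: Uint8Array): boolean {
  const sample = content.subarray(0, SAMPLE_BYTES);
  if (sample.length === 0) return false; // empty files are editable
  let controlBytes = 0;
  for (let i = 0; i < sample.length; i++) {
    const byte = sample[i];
    if (byte === 0x00) return true; // a single null byte is decisive
    // Assumption: tab (0x09), LF (0x0a), and CR (0x0d) are normal in text files.
    if (byte < 0x20 && byte !== 0x09 && byte !== 0x0a && byte !== 0x0d) {
      controlBytes++;
    }
  }
  return controlBytes / sample.length > 0.01;
}
```

Only the first 8KB is inspected, so the check stays cheap even for files near the 50MB edit limit.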

What was built:

backend/src/ssh/sftp.ts — withSftp(integrationId, fn): opens an ephemeral SSH+SFTP connection (reusing connect.ts's jump-host-chaining + TOFU logic from Phase 1/2), runs fn, then tears the connection down.
backend/src/routes/files.ts — GET /api/files/:id/list, GET/PUT /api/files/:id/content, POST /api/files/:id/mkdir, POST /api/files/:id/rename, POST /api/files/:id/delete, POST /api/files/:id/chmod, GET /api/files/:id/download, POST /api/files/:id/upload (multipart, via newly-added @fastify/multipart, 1GB limit).
src/pages/Files.tsx — new page (/files, sidebar entry with a FolderOpen icon): SSH host picker, breadcrumb-navigable directory browser, inline text editor for non-binary files, new-folder/rename/delete/chmod-via-octal-display/upload/download actions.

Verified end-to-end against a real filesystem-backed SFTP server built specifically for this (using ssh2's server-side low-level SFTP protocol API — genuine OPEN/READ/WRITE/READDIR/RENAME/REMOVE/MKDIR/STAT/SETSTAT handlers backed by real fs calls against a real directory on disk, not a mock). Confirmed by inspecting the actual files/permissions on disk after each operation (cat, ls, stat -c '%a'), not just the HTTP response: list, read, write, mkdir, rename, delete, chmod, upload, and download (byte-for-byte diff match against the uploaded source file) all round-tripped correctly. One real bug was caught and fixed during this verification: the download route's wrapping Promise was resolving immediately after reply.send(stream) instead of waiting for the response to actually finish, which raced Fastify into ending the HTTP response (and the route's cleanup() into closing the underlying SSH connection) before the SFTP stream had sent any data — produced a 0-byte download with a "stream closed prematurely" log line. Fixed by letting reply.send(stream)'s return value resolve the promise instead of resolving synchronously, and moving connection cleanup to the response's own finish/close events. All test artifacts (test SFTP server, test backend instance, test DB, tokens, temp files) were cleaned up afterward.

Phase 4 — Docker Container Management (DONE, with documented gaps)

Architecture decision: Termix's source (src/backend/ssh/docker.ts, docker-container-routes.ts, docker-console.ts) drives Docker over SSH+CLI. ArchNest's existing backend/src/integrations/docker.ts adapter already talks to the Docker Engine HTTP API directly via a stored baseUrl (the only config field exposed in Settings for a docker integration — no SSH credentials, no TLS client certs). Rather than bolt on a second SSH-based Docker code path, Phase 4 extends the existing Engine-API approach: all new code talks straight to dockerd's HTTP API.

What was built:

backend/src/docker/client.ts — loadDockerHost(integrationId), dockerFetch/dockerJson thin wrappers over the Engine API, demuxDockerStream() (best-effort parser for the 8-byte-frame multiplexed stdout/stderr format used by non-TTY containers' logs/stats endpoints, falling back to raw text for TTY containers).
backend/src/docker/exec.ts — openExecStream() opens a docker exec session and performs the raw HTTP "hijack": after POST /exec/{id}/start, the daemon switches the TCP socket to a raw bidirectional byte stream (no further HTTP framing), so the implementation connects via net/tls directly, writes the HTTP request by hand, and strips the response headers before treating the rest as raw I/O.
backend/src/routes/docker.ts — dockerRoutes (REST: list/stats/logs/start/stop/restart/pause/unpause/remove, behind the standard app.authenticate hook) and dockerExecRoutes (websocket /api/docker/exec, auth via a token query param verified on the connect message, mirroring terminal.ts's pattern since websocket upgrades can't carry an Authorization header).
src/pages/Containers.tsx — new page (/containers, sidebar entry with a Box icon): Docker host picker, container table (state, image, live CPU/memory from stats, ports) with start/stop/restart/pause/unpause/remove actions, a logs modal, and an exec-terminal modal reusing Terminal.tsx's xterm.js + FitAddon pattern (base64-encoded I/O over the websocket).
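The 8-byte-frame format that demuxDockerStream() handles can be illustrated with a small parser. The header layout (stream type, three zero bytes, big-endian uint32 payload size) follows the Docker Engine API's documented stream format; the demuxFrames name, the return shape, and the null-means-fall-back convention are assumptions, not the code in backend/src/docker/client.ts.

```typescript
interface DockerFrame {
  stream: "stdin" | "stdout" | "stderr";
  payload: Uint8Array;
}

// Split a multiplexed buffer into frames. Returns null when the bytes do not
// look like framed output, so a caller can fall back to treating them as raw
// text (the TTY-container case).
function demuxFrames(buf: Uint8Array): DockerFrame[] | null {
  const names = ["stdin", "stdout", "stderr"] as const;
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const frames: DockerFrame[] = [];
  let offset = 0;
  while (offset < buf.length) {
    if (buf.length - offset < 8) return null; // incomplete header
    const type = buf[offset];
    const padded = buf[offset + 1] === 0 && buf[offset + 2] === 0 && buf[offset + 3] === 0;
    if (type > 2 || !padded) return null; // not a valid frame header
    const size = view.getUint32(offset + 4, false); // big-endian payload length
    const end = offset + 8 + size;
    if (end > buf.length) return null; // truncated payload
    frames.push({ stream: names[type], payload: buf.subarray(offset + 8, end) });
    offset = end;
  }
  return frames;
}
```

A streaming consumer would also need to buffer partial frames across chunks; this sketch assumes the whole response is already in memory.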

Verified end-to-end against a real Docker daemon (dockerd) started inside the sandbox on a TCP port, with a real container built from a docker import of the host's own rootfs (no network access to a registry was available, so a minimal real image was constructed locally rather than pulled). Confirmed via real container state transitions (docker inspect) cross-checked against the API responses: list, stats, logs (including the frame-demuxed multi-line case), start/stop/restart/pause/unpause, and remove all worked correctly through the new REST routes. The exec-terminal websocket path was exercised with a real ws client driving an interactive shell inside the real container (sent echo HELLO_FROM_EXEC, got the echoed output back through the hijacked socket) and a live resize.

One real bug was caught and fixed during this verification: openExecStream() originally called POST /exec/{id}/resize immediately after creating the exec instance but before starting it — confirmed via a raw curl repro that the Docker daemon blocks that request indefinitely until the exec's process actually exists, which hung every exec session before it ever reached ready. Fixed by passing the initial terminal size via ConsoleSize in the exec-create payload instead, and only using the explicit resize endpoint for later live resizes (sent after the exec is already running, so it's safe there, and was verified working in that position).
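The shape of that fix can be sketched as two small builders: the initial size travels in the exec-create body, and the resize endpoint is only used once the exec is running. ConsoleSize as a [height, width] pair and the h/w query parameters are from the Docker Engine API (ConsoleSize on exec-create needs a reasonably recent daemon, which is an assumption here); the function names are hypothetical.

```typescript
interface ExecCreateBody {
  AttachStdin: boolean;
  AttachStdout: boolean;
  AttachStderr: boolean;
  Tty: boolean;
  Cmd: string[];
  ConsoleSize: [number, number]; // [height, width]
}

// Body for POST /containers/{id}/exec: the initial terminal size is set here,
// so no resize call is needed before the exec has started.
function buildExecCreate(cmd: string[], cols: number, rows: number): ExecCreateBody {
  return {
    AttachStdin: true,
    AttachStdout: true,
    AttachStderr: true,
    Tty: true,
    Cmd: cmd,
    ConsoleSize: [rows, cols], // note the order: height first
  };
}

// Path for a later live resize, only safe once the exec process exists.
function execResizePath(execId: string, cols: number, rows: number): string {
  return `/exec/${encodeURIComponent(execId)}/resize?h=${rows}&w=${cols}`;
}
```

The height-first ordering of ConsoleSize is easy to get backwards, since xterm.js reports cols before rows.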

Documented gap: no browser is available in this sandbox, so Containers.tsx was verified by type-checking and a production vite build, and by manually exercising every backend endpoint it calls against the real daemon above — but it has not been clicked through in an actual browser. All test artifacts (test dockerd instance, test image/container, test backend instance, test DB, tokens, temp files) were cleaned up afterward.

Phase 5 — RDP/VNC/Telnet (DONE)

Architecture decision: Termix's own approach (new GuacamoleLite({ server }, ...)) attaches an unfiltered 'upgrade' listener to the whole HTTP server, which would have collided with @fastify/websocket's existing routes (/api/terminal, /api/docker/exec). Instead, guacamole-lite's lower-level ClientConnection/Crypt classes (imported directly from their CJS lib files, typed via a small ambient .d.ts) are driven from inside our own Fastify {websocket: true} route, on a socket Fastify has already upgraded — no interaction with the HTTP server's 'upgrade' event at all. guacd itself remains a required sidecar process (a real guacd binary, available via apt), but is not wired into a docker-compose.yml yet — see gap below.

What was built:

backend/src/integrations/types.ts / registry.ts / routes/integrations.ts — new remote_desktop integration type (config: protocol/hostname/port/username/domain, secret: password).
backend/src/integrations/remoteDesktop.ts — testConnection() does a raw TCP probe of the configured port (distinct from the real Guacamole-protocol tunnel below).
backend/src/routes/guacamole.ts — /api/guacamole websocket route: authenticates the token query param via app.jwt.verify (same pattern as terminal.ts/docker.ts, since websocket upgrades can't carry an Authorization header), loads the remote_desktop integration's config + decrypted secrets, server-side constructs and encrypts a Guacamole connection token via Crypt, then instantiates ClientConnection directly on the open socket and calls .connect({ host, port }) against guacd (configurable via ARCHNEST_GUACD_HOST/ARCHNEST_GUACD_PORT, default 127.0.0.1:4822). New env var ARCHNEST_GUAC_CRYPT_KEY (32-byte AES-256-CBC key) added to .env.example.
src/pages/RemoteDesktop.tsx — new page (/remote-desktop, sidebar entry with a MonitorSmartphone icon): host picker + a guacamole-common-js Guacamole.Client/Guacamole.WebSocketTunnel canvas viewer. Note: Guacamole.WebSocketTunnel appends its own "?" + data query string inside connect(), so the tunnel URL passed to its constructor must be bare, with token/integrationId passed as the string argument to client.connect(...) instead — this was caught and fixed during browser verification (see below).
src/pages/Settings.tsx — generic integration card extended with a remote_desktop entry (protocol/hostname/port/username/domain/password fields).

Verified end-to-end against real, locally-installed infrastructure (no mocking): a real guacd (v1.3.0, installed via apt) and a real Xtightvnc/vncserver desktop. A raw ws client test first confirmed the tunnel itself — JWT auth, integration lookup, token encryption, and the guacd handshake — by observing real Guacamole-protocol size/img instructions come back over the websocket. Then the actual RemoteDesktop.tsx page was exercised in a real headless Chromium (Playwright) against a real running Vite dev server + backend: logged in, navigated to /remote-desktop, selected the configured VNC host, and confirmed the UI reaches Connected state with a live VNC framebuffer (cursor visible) rendered on canvas — not just a build/typecheck pass.

One real bug was caught and fixed during this browser verification: the page initially called client.connect() with no arguments while the tunnel URL already had token=...&integrationId=... appended, producing a malformed ...&integrationId=1?undefined URL and an ECONNREFUSED-style failure. Root cause (confirmed by reading Guacamole.WebSocketTunnel's source): it always appends its own "?" + data itself. Fixed by passing a bare tunnel URL and moving the query data into the client.connect(data) call.
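The failure mode is easy to reproduce in isolation. The sketch below models only the one behavior described above (the tunnel builds its final URL as tunnel URL + "?" + data); tunnelRequestUrl is a stand-in written for illustration, not Guacamole's code, and the host and port in the example URLs are made up.

```typescript
// Stand-in for the single behavior that matters here: the tunnel always
// appends "?" plus whatever was passed to connect(), even when it is undefined.
function tunnelRequestUrl(tunnelUrl: string, data?: string): string {
  return `${tunnelUrl}?${data}`;
}

// Hypothetical helper that builds the string handed to client.connect(data).
function buildConnectData(token: string, integrationId: number): string {
  return `token=${encodeURIComponent(token)}&integrationId=${integrationId}`;
}

// Broken: query already on the tunnel URL, connect() called with no arguments.
const broken = tunnelRequestUrl("ws://localhost:3001/api/guacamole?token=abc&integrationId=1");

// Fixed: bare tunnel URL, query data passed through connect().
const fixed = tunnelRequestUrl("ws://localhost:3001/api/guacamole", buildConnectData("abc", 1));
```

Here broken ends in integrationId=1?undefined, matching the malformed URL seen during verification, while fixed carries a single well-formed query string.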

Documented gaps:

~~Telnet and RDP were not verified~~ (now done): with the apt mirror cooperating on a later attempt, both paths were verified end-to-end through the exact same /api/guacamole route. Telnet: ran a real inetutils-telnetd (bridged to a listening port via socat), created a remote_desktop/telnet integration, and drove the websocket — guacd logged Telnet connection successful and returned real Guacamole instructions (4.size,...). RDP: ran a real xrdp server (after installing the libguac-client-rdp0 plugin guacd needs), created a remote_desktop/rdp integration, and confirmed guacd negotiated the connection and returned a 4.size,1.0,4.1024,3.768 display surface. All three protocols (VNC from the original pass, plus telnet and RDP now) are confirmed against the identical code path. All test artifacts (guacd, telnetd/socat, xrdp, test user, test backend/DB) were cleaned up afterward.
~~guacd is not yet added to a docker-compose.yml~~ (now done): docker-compose.yml gained a guacd service (guacamole/guacd:1.5.5, no published port — only the backend reaches it on the compose network), the backend service now sets ARCHNEST_GUACD_HOST=guacd/ARCHNEST_GUACD_PORT=4822 + ARCHNEST_GUAC_CRYPT_KEY and depends_on: [guacd], and backend/.env.example documents the ARCHNEST_GUACD_* vars for local dev. Verified the compose file parses cleanly via docker compose config (the Docker daemon isn't running in this sandbox, so an actual up was not performed).
All test artifacts (test guacd/vncserver processes, test backend instance, test DB, tokens, temp files, Playwright scripts) were cleaned up afterward.
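The 4.size,... strings quoted above are Guacamole protocol instructions: comma-separated elements, each prefixed with its length and a dot, normally terminated by a semicolon. A minimal parser sketch (hypothetical name; assumes no characters outside the Basic Multilingual Plane, since lengths are compared against JavaScript string indices):

```typescript
// Parse one Guacamole instruction into its elements (opcode first).
// Accepts the instruction with or without the trailing ";".
function parseInstruction(text: string): string[] {
  const elements: string[] = [];
  let pos = 0;
  while (pos < text.length) {
    const dot = text.indexOf(".", pos);
    if (dot === -1) throw new Error("missing length prefix");
    const lengthText = text.slice(pos, dot);
    if (!/^\d+$/.test(lengthText)) throw new Error("invalid length prefix");
    const start = dot + 1;
    const end = start + Number(lengthText);
    if (end > text.length) throw new Error("element truncated");
    elements.push(text.slice(start, end));
    pos = end;
    const separator = text.charAt(pos);
    if (separator === ";" || separator === "") break; // end of instruction
    if (separator !== ",") throw new Error("expected ',' or ';'");
    pos++;
  }
  return elements;
}
```

Applied to the RDP result above, 4.size,1.0,4.1024,3.768 yields the elements size, 0, 1024, 768: a size instruction for layer 0 at 1024 by 768.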

Phase 6 — Host Metrics Widgets (DONE, with documented gaps)

Architecture decision: Termix's host-metrics.ts route (2,584 lines) is tightly coupled to its own Drizzle schema, multi-user auth, SOCKS5/jump-host chaining, TOTP-gated metrics sessions, and a metrics cache/backoff/request-queue layer. None of that scaffolding was ported. The actual reusable value is the 10 widgets/*-collector.ts files: small, near-backend-agnostic functions that take a raw ssh2.Client, run a few shell commands, and return null-tolerant typed metrics. Those collectors were reimplemented against ArchNest's own ssh2 connection objects (reusing loadSshHost/connectTarget from Phase 1/2, not Termix's pool/cache/session substrate).

Delivery is simple on-demand REST + 5s client-side polling (the same low-tech approach Phase 2 used for tunnel status) rather than Termix's own caching/backoff system.

This was built as a new standalone page (/host-metrics) rather than folded into Infrastructure.tsx: the existing Infrastructure page is a fleet-wide overview (one row per resource), while these widgets are a deep per-host live view, closer in spirit to Terminal.tsx/RemoteDesktop.tsx's "pick a host, see one rich view" pattern. The existing backend/src/integrations/ssh.ts listResources probe (disk/mem/load percentages for the Infrastructure overview) is left as-is and unrelated: it answers "is this host healthy at a glance," not "show me everything about this host."

What was built:

backend/src/ssh/metrics/common.ts — shared execCommand() (exec + timeout + cleanup) and small numeric helpers, ported from Termix's widgets/common-utils.ts.
backend/src/ssh/metrics/{cpu,memory,disk,uptime,network,system,processes,ports,firewall,login-stats}.ts — 10 collectors ported from Termix's widgets/*-collector.ts, each independently null-safe. ports.ts only implements the ss-based path (Termix also had a netstat fallback parser, dropped as redundant on any modern target).
backend/src/ssh/metrics/index.ts — collectHostMetrics() aggregator.
backend/src/routes/metrics.ts — GET /api/integrations/:id/metrics, authenticated, connects via connectTarget (transparent jump-host support inherited for free) and runs the aggregator.
src/pages/HostMetrics.tsx — new page (/host-metrics, sidebar entry with a Gauge icon): SSH host picker + CPU/memory/disk gauges, uptime/system card, network interfaces, listening ports, top processes table, firewall summary, login activity summary. Polls every 5s while a host is selected.
src/lib/api.ts — getHostMetrics() + HostMetrics type.

Verified end-to-end against a real, locally-installed sshd (not mocked): installed openssh-server, created a real test user, ran a real ArchNest backend + SQLite DB, created a real ssh-type integration, and hit GET /api/integrations/:id/metrics over a real SSH connection. CPU, memory, disk, uptime, system, and processes all returned real, correct data from the live container (verified CPU% against /proc/stat math, memory/disk against free/df, process list against a parallel manual ps aux).
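
The "/proc/stat math" used for that cross-check can be sketched as follows. This is a minimal illustration, assuming the standard Linux `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal); `parseCpuLine` and `cpuPercent` are hypothetical names, not functions from the ArchNest or Termix source, and the real cpu collector may sample or round differently.

```typescript
// Hypothetical helpers for illustration only (not the project's actual collector).
interface CpuSample {
  idle: number;  // idle + iowait jiffies
  total: number; // sum of user..steal jiffies
}

// Parse the aggregate "cpu" line of /proc/stat (not the per-core "cpu0", "cpu1", ... lines).
function parseCpuLine(procStat: string): CpuSample | null {
  const line = procStat.split("\n").find((l) => l.startsWith("cpu "));
  if (!line) return null;
  // Drop the "cpu" label and keep the first eight counters (user..steal).
  // guest and guest_nice are skipped because the kernel already counts them in user/nice.
  const fields = line.trim().split(/\s+/).slice(1, 9).map(Number);
  if (fields.length < 4 || fields.some((n) => !Number.isFinite(n))) return null;
  const idle = (fields[3] ?? 0) + (fields[4] ?? 0);
  const total = fields.reduce((sum, n) => sum + n, 0);
  return { idle, total };
}

// CPU usage between two samples, as a percentage rounded to one decimal place.
function cpuPercent(before: CpuSample, after: CpuSample): number | null {
  const totalDelta = after.total - before.total;
  const idleDelta = after.idle - before.idle;
  if (totalDelta <= 0) return null; // no time elapsed, or counters reset
  return Math.round((1 - idleDelta / totalDelta) * 1000) / 10;
}
```

Usage is a matter of reading `/proc/stat` twice a short interval apart and passing both parsed samples to `cpuPercent`; a single sample only yields the average since boot.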

One real bug was caught and fixed: the first version ran all 10 collectors via Promise.all, which opens 15-20 concurrent SSH exec channels — this silently exceeded OpenSSH's default MaxSessions 10 and starved whichever collectors lost the race (network/processes/ports/firewall/loginStats came back empty while cpu/memory/disk/uptime/system succeeded). Fixed by running collectors sequentially in collectHostMetrics() — acceptable since this is on-demand polling, not a latency-critical path.
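
The shape of that fix can be sketched as below. This is an illustration under assumptions, not the actual `collectHostMetrics()` code: `Collector` and `runCollectorsSequentially` are hypothetical names, and each collector is modelled as a plain async function rather than something bound to an ssh2 connection.

```typescript
// Hypothetical types and names for illustration only.
type Collector = () => Promise<unknown>;

// Run collectors one at a time so that only one collector's exec channels
// are open at any moment, instead of all of them at once via Promise.all.
async function runCollectorsSequentially(
  collectors: Record<string, Collector>,
): Promise<Record<string, unknown>> {
  const metrics: Record<string, unknown> = {};
  for (const [name, collect] of Object.entries(collectors)) {
    try {
      metrics[name] = await collect();
    } catch {
      // One failing collector should not take the whole response down.
      metrics[name] = null;
    }
  }
  return metrics;
}
```

The trade-off is that total latency becomes the sum of the collectors' run times rather than the slowest one, which is the cost the paragraph above accepts for an on-demand polling endpoint.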

Follow-up verification (gaps from the first pass now closed): with iproute2 installed and a test sshd configured for root login, the three previously-unverified collectors were re-run against a real host over the real API and all returned correct data:

network → eth0 with its real IP (192.0.2.2/24) and state UP.
ports → source: "ss", 6 listening ports, with real process names and PIDs (sshd, etc.).
firewall → after adding two iptables rules (--dport 22/--dport 80 -j ACCEPT) and connecting as root, type: "iptables", status: "active", and the INPUT chain parsed back the two rules correctly.

The frontend was also browser-verified (Playwright/Chromium, now available): logged in, opened /host-metrics, selected the host, and confirmed all widgets render with real live data (CPU/memory/disk gauges, uptime, the eth0 interface, listening ports with process names, the top-processes table, the iptables firewall summary with 2 rules, and login activity) — see screenshot evidence captured during the run.

Remaining documented gap:

loginStats returned empty because the test host's wtmp had no real login history and /var/log/auth.log and /var/log/secure weren't populated; last and grep both ran successfully but had nothing to report. This is a data-availability limitation, not a code defect, and the collector remains unverified against a host with real login history.

All test artifacts (test sshd process, test OS users, test iptables rules, test backend instance, test DB, tokens, temp files) were cleaned up afterward.

Phase 7 — Host-to-Host File Transfer (DONE)

Architecture decision: Termix's host-transfer.ts (3,428 lines, plus transfer-paths.ts/transfer-routing.ts) is a heavily over-engineered system — parallel-segment workers, a tar-vs-per-file-SFTP method selector driven by incompressibility heuristics, hung-stream watchdogs, retry orchestration, worker caches, archive-method previews. Per the same stance taken in every prior phase, only the core value was ported: streaming a file/directory from one SSH host to another through the backend (read from the source's SFTP, write to the destination's SFTP, item by item). This is exactly the item_sftp path Termix itself falls back to in most cases; the parallel/tar/watchdog machinery is left behind as unjustified at this app's scale. Reuses ArchNest's existing connectTarget SSH helper (jump-host support inherited for free on both ends), not Termix's connection pool/session manager. Delivery mirrors Phase 2/6: an in-memory transfer registry + REST polling, no websockets.

What was built:

backend/src/ssh/transfer.ts — the transfer engine. startTransfer() returns a transferId and runs asynchronously: opens an SFTP connection to both hosts, scans the source tree up front (depth-first walk) to compute totalFiles/totalBytes for a real progress bar, recreates the directory structure on the destination, then streams each file (source createReadStream → dest createWriteStream). Tracks live progress in an in-memory activeTransfers map; supports move (deletes the source tree, files-then-dirs-deepest-first, after a successful copy) and cooperative cancellation (a flag checked between files and on every read chunk). cleanupOldTransfers() drops finished entries after an hour.
backend/src/routes/transfer.ts — POST /api/transfers (start), GET /api/transfers (list), GET /api/transfers/:id (status), POST /api/transfers/:id/cancel. All authenticated; start is zod-validated.
src/pages/Files.tsx — added a per-entry "Send to another host" action (disabled unless ≥2 SSH hosts exist) opening a modal (destination host dropdown, destination directory, move checkbox), plus a live "Host-to-Host Transfers" panel that polls (1s while any transfer is running, 5s otherwise) and shows per-transfer progress bars, current file, status, and a cancel button.
src/lib/api.ts — startTransfer/listTransfers/getTransfer/cancelTransfer + TransferProgress type.
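
The up-front scan and the move-time deletion order described for the transfer engine can be sketched as follows. This is a simplified illustration, not the code in backend/src/ssh/transfer.ts: `RemoteEntry`, `ListDir`, `scanTree`, and `deletionOrder` are hypothetical names, and `ListDir` stands in for whatever SFTP directory listing the engine actually uses.

```typescript
// Hypothetical shapes for illustration only.
interface RemoteEntry {
  name: string;
  isDirectory: boolean;
  size: number;
}
type ListDir = (path: string) => Promise<RemoteEntry[]>;

interface ScanResult {
  dirs: string[]; // parents always appear before their children
  files: { path: string; size: number }[];
  totalFiles: number;
  totalBytes: number;
}

// Depth-first walk that records every directory and file, plus totals for progress reporting.
async function scanTree(root: string, listDir: ListDir): Promise<ScanResult> {
  const result: ScanResult = { dirs: [root], files: [], totalFiles: 0, totalBytes: 0 };
  async function walk(dir: string): Promise<void> {
    for (const entry of await listDir(dir)) {
      const path = `${dir}/${entry.name}`;
      if (entry.isDirectory) {
        result.dirs.push(path);
        await walk(path);
      } else {
        result.files.push({ path, size: entry.size });
        result.totalFiles += 1;
        result.totalBytes += entry.size;
      }
    }
  }
  await walk(root);
  return result;
}

// For a move: delete files first, then directories from the deepest level upward,
// so that no directory is removed while it still has children.
function deletionOrder(scan: ScanResult): string[] {
  const depth = (p: string) => p.split("/").length;
  const dirsDeepestFirst = [...scan.dirs].sort((a, b) => depth(b) - depth(a));
  return [...scan.files.map((f) => f.path), ...dirsDeepestFirst];
}
```

Because `dirs` lists parents before children, iterating it in order is enough to recreate the structure on the destination before any file is streamed.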

Verified end-to-end against two real SSH endpoints (a real sshd with two real OS users as source/dest, not mocked): created two real ssh-type integrations and exercised all four behaviours over the real API:

Recursive directory copy of a tree (text file + a 100 KB random binary + a nested subdir): completed 3/3 files / 100,019 bytes; verified on disk that the directory structure was recreated, text content was intact, and the binary's md5sum matched the source exactly.
Move: a single file transferred with move:true — confirmed present on the destination and deleted from the source afterward.
Error handling: a transfer of a nonexistent source path ended status: "failed" with a clear "No such file" error rather than hanging.
Cancellation: an 80 MB transfer cancelled ~0.3 s in stopped at 162 KB with status: "cancelled" — confirming the mid-stream cancel flag actually interrupts the copy.
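
The cancellation result above depends on the flag being checked on every chunk, not only between files. A minimal sketch of that cooperative pattern follows, assuming the source is exposed as an async iterable of chunks; `TransferState` and `copyWithCancel` are hypothetical names, and the real engine works on SFTP read/write streams rather than this simplified interface.

```typescript
// Hypothetical state shape for illustration only.
interface TransferState {
  status: "running" | "completed" | "cancelled";
  bytesTransferred: number;
  cancelRequested: boolean;
}

// Copy chunk by chunk, checking the cancel flag before each write.
async function copyWithCancel(
  source: AsyncIterable<Uint8Array>,
  write: (chunk: Uint8Array) => Promise<void>,
  state: TransferState,
): Promise<void> {
  for await (const chunk of source) {
    if (state.cancelRequested) {
      // Stop mid-stream; whatever was already written stays on the destination.
      state.status = "cancelled";
      return;
    }
    await write(chunk);
    state.bytesTransferred += chunk.byteLength;
  }
  state.status = "completed";
}
```

A cancel endpoint only has to set `cancelRequested` on the registry entry; the copy loop notices on its next chunk, which is why a cancelled transfer stops after a partial byte count rather than at a file boundary.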

The frontend transfer UI was also browser-verified (Playwright/Chromium): logged in, opened the Files page, switched to a source SSH host, navigated into a directory, clicked the per-row "Send to another host" action, picked the destination host + directory in the modal, and confirmed the live "Host-to-Host Transfers" panel rendered the transfer and reached a full completed progress bar — then verified on the destination host's disk that the file actually landed with correct content.

All test artifacts (test sshd, both test OS users + their home dirs, test backend instance, test DB, temp files) were cleaned up afterward.

Phase 8 — Data Export / Import (DONE)

Architecture decision: a single-file JSON backup/restore of the user's configuration — all integrations (with their credentials), bookmark categories + bookmarks, and tunnels. Secrets are exported decrypted on purpose: that makes a backup portable to a different ArchNest instance whose ARCHNEST_SECRET_KEY differs (an encrypted export would be useless after a key change / on a fresh install). The export is only ever served to an authenticated user — the same person who can already read those secrets via the integrations they own — and the UI labels it as containing plaintext credentials. Import is additive (insert-as-new, never destructive), with old→new id remapping so tunnels and bookmarks keep pointing at their correct newly-created parents, all wrapped in a single SQLite transaction.

What was built:

backend/src/routes/data.ts — GET /api/data/export (serializes integrations+decrypted secrets, bookmark categories, bookmarks, tunnels with a version field) and POST /api/data/import (zod-validated, transactional, additive, with integrationIdMap/categoryIdMap remapping; tunnels referencing an integration absent from the import are skipped rather than orphaned).
src/lib/api.ts — exportData()/importData() + DataExport type.
src/pages/Settings.tsx — wired the previously-placeholder "Data & Backup" section to the real endpoints: Export downloads archnest-backup-<date>.json; Import reads a chosen file and POSTs it, with success/error feedback. (Replaced the old mock "Export Bookmarks"/"Clear Cache"/"Reset" buttons.)
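
The old-to-new id remapping used by the import can be sketched as below. This is a simplified illustration, not the code in backend/src/routes/data.ts: the `Backup` shape is reduced to integrations and tunnels, `planImport` is a hypothetical name, and a counter stands in for the ids that the real inserts would return inside the SQLite transaction.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ExportedIntegration { id: number; name: string }
interface ExportedTunnel { id: number; integrationId: number; name: string }
interface Backup {
  version: number;
  integrations: ExportedIntegration[];
  tunnels: ExportedTunnel[];
}

interface ImportPlan {
  integrations: { newId: number; name: string }[];
  tunnels: { integrationId: number; name: string }[];
  skippedTunnels: number;
}

// Additive import: every integration gets a fresh id, and tunnels are rewritten
// to point at the new id of their parent.
function planImport(backup: Backup, nextIntegrationId: number): ImportPlan {
  const integrationIdMap = new Map<number, number>();
  const plan: ImportPlan = { integrations: [], tunnels: [], skippedTunnels: 0 };

  for (const integration of backup.integrations) {
    const newId = nextIntegrationId++;
    integrationIdMap.set(integration.id, newId);
    plan.integrations.push({ newId, name: integration.name });
  }

  for (const tunnel of backup.tunnels) {
    const mapped = integrationIdMap.get(tunnel.integrationId);
    if (mapped === undefined) {
      // Parent integration is not in this backup: skip rather than create an orphan.
      plan.skippedTunnels += 1;
      continue;
    }
    plan.tunnels.push({ integrationId: mapped, name: tunnel.name });
  }
  return plan;
}
```

Bookmarks and their categories would follow the same pattern with a second map, and importing the same backup twice simply yields two independent sets of rows.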

Verified end-to-end against a real backend (not mocked): seeded an instance with an SSH integration (password + passphrase secrets), a bookmark category + bookmark, and a tunnel; then:

Export returned version: 1 with the secrets correctly decrypted to plaintext and all four entity types present.
Additive import into the same instance doubled every count, and the new tunnel's integrationId pointed at the newly-created integration (id remapping confirmed, not the stale original id).
Cross-instance portability: imported the backup into a second backend started with a completely different ARCHNEST_SECRET_KEY; re-exporting from that instance showed the credentials decrypt correctly under the new key — proving they were re-encrypted on import, which is the whole point of the decrypted-export design.
Browser-verified (Playwright/Chromium): the Settings → Data & Backup page exports a real downloaded JSON file (correct contents + success message) and imports an uploaded backup file (correct "Imported N integrations…" confirmation).

All test artifacts (two test backend instances, test DBs, downloaded backup files, temp files) were cleaned up afterward.

Also worth checking during/after the phases above

All previously-listed follow-ups are now complete: host-metrics widgets (Phase 6), host-to-host transfer (Phase 7), and data export/import (Phase 8) are done, and the verification gaps noted in Phases 1, 5, and 6 have been closed (cert auth, Telnet, RDP, guacd compose wiring, host-metrics network/ports/firewall + browser UI, and the Phase 1b/7 UI click-throughs). The one gap still open is the Phase 6 loginStats collector, which remains unverified against a host with real login history.

Tracking

Update the phase status lines above as work lands. Each phase should get its own commit(s) on claude/wonderful-faraday-qxym5t, following the existing commit message style (descriptive title + why, Co-Authored-By/Claude-Session trailer).
