diff --git a/ROADMAP.md b/ROADMAP.md
index 39dae43..23f18f4 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -70,6 +70,75 @@ arbitrary pane count.
 
 ---
 
+## LXC container management (Proxmox) — PAID ADD-ON
+
+**Status:** not built; planned as a paid-tier feature.
+
+ArchNest currently has full **Docker** container management (the Containers
+page: list/start/stop/restart/pause/remove, logs, interactive exec — backed
+by `backend/src/routes/docker.ts` + `backend/src/docker/`). There is **no LXC
+equivalent**.
+
+The only place LXC could surface today is the Proxmox integration's
+`listResources()` (`backend/src/integrations/proxmox.ts`), which queries
+`/api2/json/cluster/resources?type=vm`. Per the Proxmox VE API reference, the
+`vm` filter already returns both QEMU (`type: "qemu"`) and LXC
+(`type: "lxc"`) guests, and `lxc` is not an accepted value for the `type`
+parameter, so if LXC containers are missing from the grid today, the cause
+is filtering or mapping inside `listResources()` rather than the query.
+
+Planned scope (paid tier):
+- **List** LXC guests alongside VMs (keep the `type: "lxc"` entries from the
+  `type=vm` response, and label them in the resource grid).
+- **Lifecycle** management via Proxmox's per-node LXC API
+  (`POST /api2/json/nodes/{node}/lxc/{vmid}/status/{start|stop|shutdown}`) —
+  a new route group + `api.ts` entries + UI, mirroring the Docker Containers
+  page.
+- **Console/shell** into an LXC guest via the Proxmox console/ticket API
+  (more involved than Docker exec — separate auth/ticket flow).
+
+Note: the read-only "list LXC in the resource grid" piece is small and
+arguably a bug fix (the Proxmox integration silently hides a cluster's LXC
+guests today); if just that part is later wanted in the free tier, it can
+be split out from this paid add-on.
+
+---
+
+## Docker monitoring agent — tiered (push self-hosted / pull paid)
+
+ArchNest can manage Docker containers two ways today: the Docker Engine TCP
+integration (`backend/src/docker/`) and "Docker over SSH" (runs the `docker`
+CLI on a remote SSH host — `backend/src/ssh/docker.ts`,
+`backend/src/routes/dockerSsh.ts`). Both are **pull** models where ArchNest
+reaches into the host.
+
+A complementary **agent** model is planned, split across tiers:
+
+### Self-hosted — Option 1: push agent (monitoring) — IN PROGRESS
+- A lightweight script dropped on each Docker VM (bash + `docker` CLI + curl)
+  collects `docker ps` (+ optional per-container stats) and **POSTs** a JSON
+  report to an ArchNest ingest endpoint on a timer (cron/systemd).
+- VMs need **outbound-only** access to ArchNest over the mesh — no exposed
+  port, no SSH, no dockerd socket. Cleanest security story for the free tier.
+- ArchNest stores the latest report per host and surfaces it as a read-only
+  monitoring view / Infrastructure resource source.
+- **Monitoring only** — a one-way push cannot perform actions. Management on
+  self-hosted continues to use the existing **Docker-over-SSH** path on
+  demand, so nothing is removed: push = constant monitoring (zero exposure),
+  SSH = occasional management action.
+
+### Paid — Option 2: pull agent with local API (monitor + manage)
+- A small **authenticated HTTP service** runs on each VM, bound to its mesh
+  IP, exposing a thin, locked-down wrapper over the Docker socket
+  (`/containers`, `/logs`, lifecycle actions, exec).
+- ArchNest **pulls** on demand — supports both monitoring and management
+  through one uniform mechanism, with real per-agent auth (which the raw
+  dockerd TCP socket lacks).
+- Tradeoff: exposes a (locked-down, authenticated) port on each VM, and is a
+  service to run/secure — hence gated to the paid tier.
+
+---
+
 ## Known non-blocking stubs (cosmetic, not scheduled)
 
 Not flagged as work to do unless explicitly asked:
diff --git a/agent/README.md b/agent/README.md
new file mode 100644
index 0000000..302a361
--- /dev/null
+++ b/agent/README.md
@@ -0,0 +1,125 @@
+# ArchNest Docker monitoring agent
+
+A small push agent that reports this host's Docker containers to ArchNest. See
+the design in [`docs/docker-agent-monitoring.md`](../docs/docker-agent-monitoring.md).
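The push the agent performs can be reproduced without the bash script, for example to exercise the ingest endpoint from a test. The TypeScript sketch below builds (but does not send) the same request: the `/api/agents/docker/report` path appended to the base URL with one trailing slash dropped (as `${ARCHNEST_URL%/}` does in the script), the bearer-token header, and the top-level payload fields. `buildReportRequest`, `ReportPayload`, and `ReportRequest` are hypothetical names for illustration, not part of the codebase.

```typescript
// Hypothetical helper: builds (but does not send) the request that
// archnest-docker-agent.sh POSTs to the ingest endpoint.
interface ReportPayload {
  hostId: string
  hostname?: string
  agentVersion?: string
  reportedAt?: string
  containers: unknown[]
}

interface ReportRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildReportRequest(baseUrl: string, token: string, payload: ReportPayload): ReportRequest {
  // Same hostId rule the agent and backend enforce.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/.test(payload.hostId)) {
    throw new Error(`invalid hostId: ${payload.hostId}`)
  }
  // Like `${ARCHNEST_URL%/}` in bash: drop a single trailing slash.
  const base = baseUrl.endsWith('/') ? baseUrl.slice(0, -1) : baseUrl
  return {
    url: `${base}/api/agents/docker/report`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(payload),
  }
}
```

The result could be sent with Node 18+'s global `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Per the backend route, 200 means the report was stored, 400 a schema violation, 401 a token mismatch, and 503 that `ARCHNEST_AGENT_TOKEN` is unset on the backend.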
+
+It is **monitoring only** — it pushes data outbound to ArchNest and never
+receives or runs commands. Container management stays on ArchNest's
+Docker-over-SSH / Docker API paths.
+
+## Requirements
+
+`bash`, `docker`, `curl`, `jq`. Install `jq` if missing:
+
+```bash
+# Debian/Ubuntu
+sudo apt-get install -y jq
+# RHEL/Alma/Rocky
+sudo dnf install -y jq
+# Alpine
+sudo apk add jq
+```
+
+The user running the agent must be able to run `docker` (in the `docker` group
+or via root).
+
+## Install
+
+1. Copy the script onto the VM and make it executable:
+
+   ```bash
+   sudo install -m 0755 archnest-docker-agent.sh /usr/local/bin/archnest-docker-agent
+   ```
+
+2. Create the config file (keep it root-only — it holds the token):
+
+   ```bash
+   sudo mkdir -p /etc/archnest
+   sudo tee /etc/archnest/agent.env >/dev/null <<'EOF'
+   ARCHNEST_URL=http://:4000
+   ARCHNEST_AGENT_TOKEN=
+   ARCHNEST_HOST_ID=proxmox-vm-1
+   # ARCHNEST_HOSTNAME=docker01   # optional; defaults to `hostname`
+   EOF
+   sudo chmod 600 /etc/archnest/agent.env
+   ```
+
+   `ARCHNEST_URL` must point at the ArchNest backend over your **mesh / private
+   network**, never a public address — the ingest endpoint is protected only by
+   the shared token at the application layer.
+
+3. Run it once to verify:
+
+   ```bash
+   sudo archnest-docker-agent
+   # -> "archnest-docker-agent: reported N container(s) as 'proxmox-vm-1' (HTTP 200)"
+   ```
+
+## Schedule it (pick one)
+
+Report interval should be **shorter than the backend's stale window**
+(`ARCHNEST_AGENT_STALE_MS`, default 90s). 30s is a good default.
+
+### Option A — cron (every minute; simplest)
+
+```cron
+* * * * * root /usr/local/bin/archnest-docker-agent >/dev/null 2>&1
+```
+
+(This is `/etc/cron.d` format, with a user field. cron's finest granularity is
+1 minute, so raise `ARCHNEST_AGENT_STALE_MS` on the backend to e.g. 150000.)
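To check a schedule against the stale window, the backend's `isStale()` rule can be reproduced offline. This is a sketch that assumes, as the backend does, that `received_at` is SQLite's UTC `datetime('now')` text (`YYYY-MM-DD HH:MM:SS`). `isStaleAt`, its explicit `nowMs` parameter, and `fitsStaleWindow` are hypothetical helpers, and the slack for run time and scheduler jitter is a value you pick.

```typescript
// Mirrors isStale() in backend/src/routes/agents.ts, but takes "now" as a
// parameter (hypothetical signature) so schedules can be checked offline.
function isStaleAt(receivedAt: string, nowMs: number, staleAfterMs = 90_000): boolean {
  // SQLite datetime('now') is UTC without a zone marker, e.g. "2024-01-01 00:00:00".
  const t = Date.parse(receivedAt.replace(' ', 'T') + 'Z')
  if (Number.isNaN(t)) return false // unparseable timestamps are never flagged
  return nowMs - t > staleAfterMs
}

// A schedule fits when the worst-case gap between two receives
// (interval plus run time and jitter, all in ms) stays inside the window.
function fitsStaleWindow(intervalMs: number, slackMs: number, staleAfterMs = 90_000): boolean {
  return intervalMs + slackMs <= staleAfterMs
}
```

With the default 90 s window, a 1-minute cron leaves 30 s of slack for run time and jitter; raising the window to 150000 ms widens that to 90 s.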
+
+### Option B — systemd service + timer (recommended; supports 30s)
+
+`/etc/systemd/system/archnest-docker-agent.service`:
+
+```ini
+[Unit]
+Description=ArchNest Docker monitoring agent
+After=docker.service
+Wants=docker.service
+
+[Service]
+Type=oneshot
+EnvironmentFile=/etc/archnest/agent.env
+ExecStart=/usr/local/bin/archnest-docker-agent
+```
+
+`/etc/systemd/system/archnest-docker-agent.timer`:
+
+```ini
+[Unit]
+Description=Run ArchNest Docker monitoring agent every 30s
+
+[Timer]
+OnBootSec=30
+OnUnitActiveSec=30
+AccuracySec=5s
+
+[Install]
+WantedBy=timers.target
+```
+
+Enable:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable --now archnest-docker-agent.timer
+sudo systemctl list-timers archnest-docker-agent.timer   # confirm scheduling
+journalctl -u archnest-docker-agent.service -n 20        # see last run output
+```
+
+## Backend configuration
+
+The backend must have `ARCHNEST_AGENT_TOKEN` set (the same value as the agent).
+If it is unset, the ingest endpoint is disabled and returns HTTP 503. Optional:
+`ARCHNEST_AGENT_STALE_MS` (default 90000) controls when a host is shown stale.
+
+## Security notes
+
+- The token is a credential — treat `/etc/archnest/agent.env` as sensitive
+  (`chmod 600`, root-owned).
+- The agent masks env var values whose key contains (case-insensitively)
+  `PASS|SECRET|TOKEN|KEY|PRIVATE|CREDENTIAL` before sending; those values never
+  leave the VM. Container labels and the command line are sent unmasked.
+- Expose the ArchNest ingest endpoint on the mesh only, not the public internet.
diff --git a/agent/archnest-docker-agent.sh b/agent/archnest-docker-agent.sh
new file mode 100644
index 0000000..1c4d15f
--- /dev/null
+++ b/agent/archnest-docker-agent.sh
@@ -0,0 +1,175 @@
+#!/usr/bin/env bash
+#
+# ArchNest Docker monitoring agent (self-hosted, push model).
+#
+# Collects a rich snapshot of this host's Docker containers (docker ps +
+# docker inspect + a docker stats snapshot) and POSTs it to ArchNest. ArchNest
+# stores the latest report per host and shows it read-only on the Containers
+# page. This is MONITORING ONLY — it never receives or runs commands.
+#
+# Requirements: bash, docker, curl, jq.
+#
+# Configuration (env vars; may live in /etc/archnest/agent.env):
+#   ARCHNEST_URL          Base URL of the ArchNest backend, reachable over your
+#                         mesh / private network, e.g. http://100.64.0.5:4000
+#   ARCHNEST_AGENT_TOKEN  Shared token; must match the backend's ARCHNEST_AGENT_TOKEN.
+#   ARCHNEST_HOST_ID      Stable id for this host, e.g. "proxmox-vm-1"
+#                         (allowed: letters, digits, . _ - ; max 128 chars).
+#   ARCHNEST_HOSTNAME     Optional display hostname (defaults to `hostname`).
+#
+# Exit codes: 0 ok, 1 misconfig/missing deps, 2 report POST failed.
+
+set -euo pipefail
+
+AGENT_VERSION="1"
+
+# Load config file if present (its values override already-exported env).
+if [ -f /etc/archnest/agent.env ]; then
+  # shellcheck disable=SC1091
+  . /etc/archnest/agent.env
+fi
+
+err() { echo "archnest-docker-agent: $*" >&2; }
+
+# --- Dependency + config checks -------------------------------------------
+for bin in docker curl jq; do
+  if ! command -v "$bin" >/dev/null 2>&1; then
+    err "missing required dependency: $bin"
+    exit 1
+  fi
+done
+
+: "${ARCHNEST_URL:?ARCHNEST_URL is required}"
+: "${ARCHNEST_AGENT_TOKEN:?ARCHNEST_AGENT_TOKEN is required}"
+: "${ARCHNEST_HOST_ID:?ARCHNEST_HOST_ID is required}"
+HOSTNAME_VALUE="${ARCHNEST_HOSTNAME:-$(hostname)}"
+
+if ! printf '%s' "$ARCHNEST_HOST_ID" | grep -Eq '^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$'; then
+  err "ARCHNEST_HOST_ID '$ARCHNEST_HOST_ID' is invalid (allowed: A-Z a-z 0-9 . _ - , max 128)"
+  exit 1
+fi
+
+REPORT_URL="${ARCHNEST_URL%/}/api/agents/docker/report"
+
+# --- Collect container ids -------------------------------------------------
+mapfile -t IDS < <(docker ps --all --no-trunc --format '{{.ID}}')
+
+# --- Stats snapshot (one shot) keyed by full id ----------------------------
+# With --no-trunc, `docker stats` reports full ids; the lookup below also tries
+# the 12-char short id. Build a jq object: { "": {cpu,mem,...} }.
+STATS_JSON="$(docker stats --no-stream --no-trunc \
+  --format '{{.ID}}|{{.CPUPerc}}|{{.MemUsage}}|{{.NetIO}}|{{.BlockIO}}' 2>/dev/null \
+  | jq -R -s '
+    def bytes:
+      # converts "12.3MiB" / "1.2GB" etc to a number of bytes
+      capture("(?<n>[0-9.]+)\\s*(?<u>[A-Za-z]*)") as $m
+      | ($m.n | tonumber) as $n
+      | ($m.u | ascii_downcase) as $u
+      | $n * (
+          if $u|startswith("ki") then 1024
+          elif $u|startswith("mi") then 1048576
+          elif $u|startswith("gi") then 1073741824
+          elif $u|startswith("ti") then 1099511627776
+          elif $u|startswith("kb") or $u=="k" then 1000
+          elif $u|startswith("mb") or $u=="m" then 1000000
+          elif $u|startswith("gb") or $u=="g" then 1000000000
+          elif $u|startswith("tb") or $u=="t" then 1000000000000
+          elif $u|startswith("b") or $u=="" then 1
+          else 1 end
+        ) | floor;
+    split("\n") | map(select(length > 0)) | map(split("|")) | map({
+      key: .[0],
+      value: {
+        cpuPercent: (.[1] | gsub("%";"") | tonumber? // 0),
+        memUsage: (.[2] | split("/")[0] | gsub(" ";"") | (try bytes catch 0)),
+        memLimit: (.[2] | split("/")[1] | gsub(" ";"") | (try bytes catch 0)),
+        netRxBytes: (.[3] | split("/")[0] | gsub(" ";"") | (try bytes catch 0)),
+        netTxBytes: (.[3] | split("/")[1] | gsub(" ";"") | (try bytes catch 0)),
+        blockReadBytes: (.[4] | split("/")[0] | gsub(" ";"") | (try bytes catch 0)),
+        blockWriteBytes: (.[4] | split("/")[1] | gsub(" ";"") | (try bytes catch 0))
+      }
+    }) | from_entries
+  ')"
+[ -z "$STATS_JSON" ] && STATS_JSON='{}'
+
+# --- Per-container detail from docker inspect ------------------------------
+# jq transform turning one inspect object into our report schema, masking
+# secret-looking env values.
+INSPECT_FILTER='
+  def mask($k): ($k | ascii_upcase) as $u
+    | ($u | test("PASS|SECRET|TOKEN|KEY|PRIVATE|CREDENTIAL"));
+  .[0] as $c
+  | {
+      id: $c.Id,
+      name: ($c.Name // "" | ltrimstr("/")),
+      image: ($c.Config.Image // ""),
+      imageId: ($c.Image // ""),
+      state: ($c.State.Status // "unknown"),
+      status: ($c.State.Status // ""),
+      createdAt: ($c.Created // null),
+      startedAt: ($c.State.StartedAt // null),
+      restartCount: ($c.RestartCount // 0),
+      restartPolicy: ($c.HostConfig.RestartPolicy.Name // ""),
+      health: ($c.State.Health.Status // "none"),
+      ports: (
+        ($c.NetworkSettings.Ports // {}) | to_entries | map(
+          (.key | split("/")) as $p
+          | (.value // [])[]? as $b
+          | { hostIp: ($b.HostIp // ""), hostPort: ($b.HostPort | tonumber? // null),
+              containerPort: ($p[0] | tonumber? // 0), proto: ($p[1] // "tcp") }
+        )
+      ),
+      networks: (
+        ($c.NetworkSettings.Networks // {}) | to_entries
+        | map({ name: .key, ip: (.value.IPAddress // "") })
+      ),
+      mounts: (
+        ($c.Mounts // []) | map({
+          type: (.Type // ""), source: (.Source // .Name // ""),
+          destination: (.Destination // ""), rw: (if .RW == null then true else .RW end)
+        })
+      ),
+      env: (
+        ($c.Config.Env // []) | map(
+          (. | split("=")) as $kv
+          | { key: $kv[0], value: (if mask($kv[0]) then "********" else ($kv[1:] | join("=")) end) }
+        )
+      ),
+      command: (($c.Config.Entrypoint // []) + ($c.Config.Cmd // []) | join(" ")),
+      labels: ($c.Config.Labels // {})
+    }
+'
+
+CONTAINERS='[]'
+for id in "${IDS[@]}"; do
+  [ -z "$id" ] && continue
+  detail="$(docker inspect "$id" 2>/dev/null | jq -c "$INSPECT_FILTER" 2>/dev/null || true)"
+  [ -z "$detail" ] && continue
+  short="${id:0:12}"
+  # Attach the matching stats snapshot (match by full or short id).
+  detail="$(jq -c --argjson stats "$STATS_JSON" --arg id "$id" --arg short "$short" \
+    '. + { stats: ($stats[$id] // $stats[$short] // null) }' <<<"$detail")"
+  CONTAINERS="$(jq -c --argjson c "$detail" '. + [$c]' <<<"$CONTAINERS")"
+done
+
+# --- Assemble + POST -------------------------------------------------------
+PAYLOAD="$(jq -c \
+  --arg hostId "$ARCHNEST_HOST_ID" \
+  --arg hostname "$HOSTNAME_VALUE" \
+  --arg agentVersion "$AGENT_VERSION" \
+  --arg reportedAt "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+  '{ hostId: $hostId, hostname: $hostname, agentVersion: $agentVersion, reportedAt: $reportedAt, containers: . }' \
+  <<<"$CONTAINERS")"
+
+HTTP_CODE="$(curl -s -o /dev/null -w '%{http_code}' \
+  -X POST "$REPORT_URL" \
+  -H 'Content-Type: application/json' \
+  -H "Authorization: Bearer ${ARCHNEST_AGENT_TOKEN}" \
+  --data-binary @- <<<"$PAYLOAD")" || HTTP_CODE="000"
+
+if [ "$HTTP_CODE" != "200" ]; then
+  err "report POST to $REPORT_URL failed (HTTP $HTTP_CODE)"
+  exit 2
+fi
+
+echo "archnest-docker-agent: reported ${#IDS[@]} container(s) as '$ARCHNEST_HOST_ID' (HTTP $HTTP_CODE)"
diff --git a/backend/src/db/index.ts b/backend/src/db/index.ts
index 3c5d713..ac264c6 100644
--- a/backend/src/db/index.ts
+++ b/backend/src/db/index.ts
@@ -104,6 +104,14 @@ db.exec(`
     success INTEGER NOT NULL,
     created_at TEXT NOT NULL DEFAULT (datetime('now'))
   );
+
+  CREATE TABLE IF NOT EXISTS docker_agent_reports (
+    host_id TEXT PRIMARY KEY,
+    hostname TEXT,
+    report_json TEXT NOT NULL,
+    reported_at TEXT,
+    received_at TEXT NOT NULL DEFAULT (datetime('now'))
+  );
 `)
 
 export function logEvent(type: string, title: string, source?: string | null) {
diff --git a/backend/src/routes/agents.ts b/backend/src/routes/agents.ts
new file mode 100644
index 0000000..3f34022
--- /dev/null
+++ b/backend/src/routes/agents.ts
@@ -0,0 +1,200 @@
+import type { FastifyInstance, FastifyRequest } from 'fastify'
+import { timingSafeEqual } from 'node:crypto'
+import { z } from 'zod'
+import { db } from '../db/index.js'
+
+/**
+ * Docker monitoring agents (self-hosted, push model). Agents on each Docker VM
+ * POST a rich container report here; ArchNest stores the latest per host and
+ * serves it read-only to the UI. See docs/docker-agent-monitoring.md.
+ *
+ * Ingest is gated by a shared ARCHNEST_AGENT_TOKEN (NOT the user JWT). Read
+ * endpoints are behind the normal user authenticate hook.
+ */
+
+// Reports older than this (by server receive time) are flagged stale.
+const STALE_AFTER_MS = Number(process.env.ARCHNEST_AGENT_STALE_MS) || 90_000
+
+const HOST_ID_RE = /^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/
+
+const portSchema = z.object({
+  hostIp: z.string().optional(),
+  hostPort: z.number().int().nullable().optional(),
+  containerPort: z.number().int(),
+  proto: z.string(),
+})
+
+const containerSchema = z.object({
+  id: z.string().min(1).max(128),
+  name: z.string().max(256),
+  image: z.string().max(512).default(''),
+  imageId: z.string().max(256).optional(),
+  state: z.string().max(32).default('unknown'),
+  status: z.string().max(256).default(''),
+  createdAt: z.string().max(64).nullish(),
+  startedAt: z.string().max(64).nullish(),
+  restartCount: z.number().int().optional(),
+  restartPolicy: z.string().max(64).optional(),
+  health: z.string().max(32).optional(),
+  ports: z.array(portSchema).max(200).default([]),
+  networks: z.array(z.object({ name: z.string(), ip: z.string().optional() })).max(50).default([]),
+  mounts: z
+    .array(
+      z.object({
+        type: z.string().optional(),
+        source: z.string().optional(),
+        destination: z.string().optional(),
+        rw: z.boolean().optional(),
+      }),
+    )
+    .max(100)
+    .default([]),
+  env: z.array(z.object({ key: z.string(), value: z.string() })).max(500).default([]),
+  command: z.string().max(2048).optional(),
+  labels: z.record(z.string(), z.string()).optional(),
+  stats: z
+    .object({
+      cpuPercent: z.number().optional(),
+      memUsage: z.number().optional(),
+      memLimit: z.number().optional(),
+      netRxBytes: z.number().optional(),
+      netTxBytes: z.number().optional(),
+      blockReadBytes: z.number().optional(),
+      blockWriteBytes: z.number().optional(),
+    })
+    .nullish(),
+})
+
+const reportSchema = z.object({
+  hostId: z.string().regex(HOST_ID_RE, 'Invalid hostId'),
+  hostname: z.string().max(256).optional(),
+  agentVersion: z.string().max(32).optional(),
+  reportedAt: z.string().max(64).optional(),
+  containers: z.array(containerSchema).max(1000),
+})
+
+export type AgentReport = z.infer<typeof reportSchema>
+export type AgentContainer = z.infer<typeof containerSchema>
+
+interface ReportRow {
+  host_id: string
+  hostname: string | null
+  report_json: string
+  reported_at: string | null
+  received_at: string
+}
+
+/** Bearer-token check against ARCHNEST_AGENT_TOKEN (constant-time once lengths match). */
+function agentTokenValid(req: FastifyRequest): { ok: boolean; configured: boolean } {
+  const expected = process.env.ARCHNEST_AGENT_TOKEN
+  if (!expected) return { ok: false, configured: false }
+  const header = req.headers.authorization ?? ''
+  const presented = header.startsWith('Bearer ') ? header.slice(7) : ''
+  const a = Buffer.from(presented)
+  const b = Buffer.from(expected)
+  if (a.length !== b.length) return { ok: false, configured: true }
+  return { ok: timingSafeEqual(a, b), configured: true }
+}
+
+function isStale(receivedAt: string): boolean {
+  const t = Date.parse(receivedAt.replace(' ', 'T') + 'Z')
+  if (Number.isNaN(t)) return false
+  return Date.now() - t > STALE_AFTER_MS
+}
+
+/** Token-gated ingest. Registered separately so it is NOT behind the user-auth hook. */
+export async function agentIngestRoutes(app: FastifyInstance) {
+  app.post('/api/agents/docker/report', async (req, reply) => {
+    const auth = agentTokenValid(req)
+    if (!auth.configured) {
+      return reply.code(503).send({ error: 'Agent ingest is disabled (ARCHNEST_AGENT_TOKEN not configured)' })
+    }
+    if (!auth.ok) {
+      return reply.code(401).send({ error: 'Unauthorized' })
+    }
+    const parsed = reportSchema.safeParse(req.body)
+    if (!parsed.success) {
+      return reply.code(400).send({ error: parsed.error.issues[0]?.message ?? 'Invalid report' })
+    }
+    const { hostId, hostname, reportedAt, containers } = parsed.data
+    db.prepare(
+      `INSERT INTO docker_agent_reports (host_id, hostname, report_json, reported_at, received_at)
+       VALUES (?, ?, ?, ?, datetime('now'))
+       ON CONFLICT(host_id) DO UPDATE SET
+         hostname = excluded.hostname,
+         report_json = excluded.report_json,
+         reported_at = excluded.reported_at,
+         received_at = datetime('now')`,
+    ).run(hostId, hostname ?? null, JSON.stringify(containers), reportedAt ?? null)
+    return { ok: true }
+  })
+}
+
+/** Read-only query endpoints, behind the user authenticate hook. */
+export async function agentRoutes(app: FastifyInstance) {
+  app.addHook('onRequest', app.authenticate)
+
+  app.get('/api/agents/docker/hosts', async () => {
+    const rows = db
+      .prepare('SELECT host_id, hostname, report_json, reported_at, received_at FROM docker_agent_reports ORDER BY host_id')
+      .all() as ReportRow[]
+    return {
+      hosts: rows.map((r) => {
+        let count = 0
+        try {
+          count = (JSON.parse(r.report_json) as unknown[]).length
+        } catch {
+          count = 0
+        }
+        return {
+          hostId: r.host_id,
+          hostname: r.hostname,
+          reportedAt: r.reported_at,
+          receivedAt: r.received_at,
+          containerCount: count,
+          stale: isStale(r.received_at),
+        }
+      }),
+    }
+  })
+
+  app.get('/api/agents/docker/hosts/:hostId/containers', async (req, reply) => {
+    const hostId = (req.params as { hostId: string }).hostId
+    const row = db
+      .prepare('SELECT host_id, hostname, report_json, reported_at, received_at FROM docker_agent_reports WHERE host_id = ?')
+      .get(hostId) as ReportRow | undefined
+    if (!row) return reply.code(404).send({ error: 'Host not reported' })
+    let containers: AgentContainer[] = []
+    try {
+      containers = JSON.parse(row.report_json) as AgentContainer[]
+    } catch {
+      containers = []
+    }
+    return {
+      hostId: row.host_id,
+      hostname: row.hostname,
+      reportedAt: row.reported_at,
+      receivedAt: row.received_at,
+      stale: isStale(row.received_at),
+      containers,
+    }
+  })
+
+  app.get('/api/agents/docker/hosts/:hostId/containers/:containerId', async (req, reply) => {
+    const { hostId, containerId } = req.params as { hostId: string; containerId: string }
+    const row = db
+      .prepare('SELECT report_json FROM docker_agent_reports WHERE host_id = ?')
+      .get(hostId) as { report_json: string } | undefined
+    if (!row) return reply.code(404).send({ error: 'Host not reported' })
+    let containers: AgentContainer[] = []
+    try {
+      containers = JSON.parse(row.report_json) as AgentContainer[]
+    } catch {
+      containers = []
+    }
+    const matches = containers.filter((c) => c.id.startsWith(containerId)) // full id or short-id prefix
+    const container = matches.find((c) => c.id === containerId) ?? (matches.length === 1 ? matches[0] : undefined)
+    if (!container) return reply.code(404).send({ error: 'Container not found in latest report' })
+    return { container }
+  })
+}
diff --git a/backend/src/routes/dockerSsh.ts b/backend/src/routes/dockerSsh.ts
new file mode 100644
index 0000000..3208d15
--- /dev/null
+++ b/backend/src/routes/dockerSsh.ts
@@ -0,0 +1,175 @@
+import type { FastifyInstance } from 'fastify'
+import { Client, type ClientChannel } from 'ssh2'
+import { z } from 'zod'
+import { loadSshHost, connectTarget } from '../ssh/connect.js'
+import {
+  withSshClient,
+  listContainers,
+  containerLogs,
+  containerAction,
+  removeContainer,
+  buildExecShellCommand,
+  isDockerAction,
+  isValidContainerRef,
+} from '../ssh/docker.js'
+
+function sendJson(socket: { send: (data: string) => void }, payload: Record<string, unknown>) {
+  socket.send(JSON.stringify(payload))
+}
+
+interface ExecMessage {
+  type: 'connect' | 'input' | 'resize' | 'disconnect'
+  integrationId?: number
+  containerId?: string
+  cols?: number
+  rows?: number
+  data?: string
+}
+
+/**
+ * "Docker over SSH" REST routes. These target an SSH integration (not the
+ * Docker TCP integration) and shell out to the `docker` CLI on the remote host.
+ * Mutating actions are admin-only, matching the policy for the TCP Docker routes
+ * and the rest of the shared-config surface.
+ */
+export async function dockerSshRoutes(app: FastifyInstance) {
+  app.addHook('onRequest', app.authenticate)
+
+  app.get('/api/docker-ssh/:integrationId/containers', async (req, reply) => {
+    const integrationId = Number((req.params as { integrationId: string }).integrationId)
+    const result = await withSshClient(integrationId, (client) => listContainers(client))
+    if (!result.ok) return reply.code(502).send({ error: result.error })
+    return { containers: result.value }
+  })
+
+  app.get('/api/docker-ssh/:integrationId/containers/:id/logs', async (req, reply) => {
+    const { integrationId, id } = req.params as { integrationId: string; id: string }
+    if (!isValidContainerRef(id)) return reply.code(400).send({ error: 'Invalid container reference' })
+    const tail = Number((req.query as { tail?: string }).tail ?? '200')
+    const result = await withSshClient(Number(integrationId), (client) => containerLogs(client, id, tail))
+    if (!result.ok) return reply.code(502).send({ error: result.error })
+    return { logs: result.value }
+  })
+
+  app.post('/api/docker-ssh/:integrationId/containers/:id/:action', { onRequest: [app.adminOnly] }, async (req, reply) => {
+    const { integrationId, id, action } = req.params as { integrationId: string; id: string; action: string }
+    if (!isValidContainerRef(id)) return reply.code(400).send({ error: 'Invalid container reference' })
+    if (!isDockerAction(action)) return reply.code(400).send({ error: 'Invalid action' })
+    const result = await withSshClient(Number(integrationId), (client) => containerAction(client, id, action))
+    if (!result.ok) return reply.code(502).send({ error: result.error })
+    return { ok: true }
+  })
+
+  app.post('/api/docker-ssh/:integrationId/containers/:id/remove', { onRequest: [app.adminOnly] }, async (req, reply) => {
+    const { integrationId, id } = req.params as { integrationId: string; id: string }
+    if (!isValidContainerRef(id)) return reply.code(400).send({ error: 'Invalid container reference' })
+    const parsed = z.object({ force: z.boolean().default(false) }).safeParse(req.body ?? {})
+    if (!parsed.success) return reply.code(400).send({ error: 'Invalid input' })
+    const result = await withSshClient(Number(integrationId), (client) => removeContainer(client, id, parsed.data.force))
+    if (!result.ok) return reply.code(502).send({ error: result.error })
+    return { ok: true }
+  })
+}
+
+/**
+ * Interactive `docker exec` shell over a PTY, wired to a WebSocket. Models the
+ * terminal route's plumbing but runs the exec command on the SSH host instead
+ * of opening a login shell.
+ */
+export async function dockerSshExecRoutes(app: FastifyInstance) {
+  app.get('/api/docker-ssh/exec', { websocket: true }, (socket, req) => {
+    let conn: Client | null = null
+    let jumpConn: Client | null = null
+    let stream: ClientChannel | null = null
+
+    const cleanup = () => {
+      stream?.end()
+      conn?.end()
+      jumpConn?.end()
+      stream = null
+      conn = null
+      jumpConn = null
+    }
+    socket.on('close', cleanup)
+
+    socket.on('message', async (raw: Buffer) => {
+      let msg: ExecMessage
+      try {
+        msg = JSON.parse(raw.toString())
+      } catch {
+        sendJson(socket, { type: 'error', message: 'Invalid JSON' })
+        return
+      }
+
+      if (msg.type === 'connect') {
+        const query = req.query as { token?: string }
+        try {
+          await app.jwt.verify(query.token ?? '')
+        } catch {
+          sendJson(socket, { type: 'error', message: 'Unauthorized' })
+          socket.close()
+          return
+        }
+
+        const target = msg.integrationId !== undefined ? loadSshHost(msg.integrationId) : null
+        if (!target) {
+          sendJson(socket, { type: 'error', message: 'SSH integration not found' })
+          return
+        }
+        if (!msg.containerId || !isValidContainerRef(msg.containerId)) {
+          sendJson(socket, { type: 'error', message: 'Invalid container reference' })
+          return
+        }
+
+        let command: string
+        try {
+          command = buildExecShellCommand(msg.containerId)
+        } catch (err) {
+          sendJson(socket, { type: 'error', message: err instanceof Error ? err.message : 'Invalid container reference' })
+          return
+        }
+
+        const cols = msg.cols ?? 80
+        const rows = msg.rows ?? 24
+
+        const startSession = (client: Client) => {
+          conn = client
+          client.exec(command, { pty: { cols, rows, term: 'xterm-256color' } }, (err, ch) => {
+            if (err) {
+              sendJson(socket, { type: 'error', message: err.message })
+              return
+            }
+            stream = ch
+            sendJson(socket, { type: 'ready' })
+            ch.on('data', (chunk: Buffer) => sendJson(socket, { type: 'data', data: chunk.toString('utf8') }))
+            ch.stderr.on('data', (chunk: Buffer) => sendJson(socket, { type: 'data', data: chunk.toString('utf8') }))
+            ch.on('close', () => {
+              sendJson(socket, { type: 'exit' })
+              cleanup()
+            })
+          })
+        }
+
+        const result = connectTarget(target, startSession, (message) => sendJson(socket, { type: 'error', message }))
+        conn = result.conn
+        jumpConn = result.jumpConn
+        return
+      }
+
+      if (msg.type === 'input') {
+        stream?.write(msg.data ?? '')
+        return
+      }
+
+      if (msg.type === 'resize') {
+        stream?.setWindow(msg.rows ?? 24, msg.cols ?? 80, 0, 0)
+        return
+      }
+
+      if (msg.type === 'disconnect') {
+        cleanup()
+        socket.close()
+      }
+    })
+  })
+}
diff --git a/backend/src/server.ts b/backend/src/server.ts
index 7ec3d95..7d4653b 100644
--- a/backend/src/server.ts
+++ b/backend/src/server.ts
@@ -12,6 +12,8 @@ import { terminalRoutes } from './routes/terminal.js'
 import { tunnelRoutes } from './routes/tunnels.js'
 import { fileRoutes } from './routes/files.js'
 import { dockerRoutes, dockerExecRoutes } from './routes/docker.js'
+import { dockerSshRoutes, dockerSshExecRoutes } from './routes/dockerSsh.js'
+import { agentIngestRoutes, agentRoutes } from './routes/agents.js'
 import { guacamoleRoutes } from './routes/guacamole.js'
 import { metricsRoutes } from './routes/metrics.js'
 import { transferRoutes } from './routes/transfer.js'
@@ -89,6 +91,10 @@ await app.register(tunnelRoutes)
 await app.register(fileRoutes)
 await app.register(dockerRoutes)
 await app.register(dockerExecRoutes)
+await app.register(dockerSshRoutes)
+await app.register(dockerSshExecRoutes)
+await app.register(agentIngestRoutes)
+await app.register(agentRoutes)
 await app.register(guacamoleRoutes)
 await app.register(metricsRoutes)
 await app.register(transferRoutes)
diff --git a/backend/src/ssh/docker.ts b/backend/src/ssh/docker.ts
new file mode 100644
index 0000000..3d54556
--- /dev/null
+++ b/backend/src/ssh/docker.ts
@@ -0,0 +1,152 @@
+import { Client } from 'ssh2'
+import { connectTarget, loadSshHost, type SshHost } from './connect.js'
+import { execCommand } from './metrics/common.js'
+
+/**
+ * "Docker over SSH": instead of talking to the Docker Engine TCP API, run the
+ * `docker` CLI on a remote SSH host. This reuses the existing SSH transport
+ * (jump-host chaining, host-key verification, cert/key/password auth) so no
+ * dockerd TCP socket has to be exposed — the mesh + SSH auth are the gate.
+ *
+ * Container ids/names come from the client and are interpolated into shell
+ * commands, so every one is validated against this strict allowlist and passed
+ * single-quoted. Anything outside this set is rejected before a command runs.
+ */
+const CONTAINER_REF_RE = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$/
+
+export function isValidContainerRef(ref: string): boolean {
+  return CONTAINER_REF_RE.test(ref)
+}
+
+/** Single-quote a value for safe use as one shell argument. */
+function shQuote(value: string): string {
+  return `'${value.replace(/'/g, `'\\''`)}'`
+}
+
+export const DOCKER_ACTIONS = ['start', 'stop', 'restart', 'pause', 'unpause'] as const
+export type DockerAction = (typeof DOCKER_ACTIONS)[number]
+
+export function isDockerAction(value: string): value is DockerAction {
+  return (DOCKER_ACTIONS as readonly string[]).includes(value)
+}
+
+export interface SshContainer {
+  id: string
+  name: string
+  image: string
+  state: string
+  status: string
+  ports: string
+}
+
+/**
+ * Connects to the SSH host for `integrationId`, runs `fn` with a ready ssh2
+ * Client, and tears the connection (and any jump host) down afterwards.
+ * Mirrors the connect-once pattern used by the metrics route.
+ */
+export async function withSshClient<T>(
+  integrationId: number,
+  fn: (client: Client) => Promise<T>,
+): Promise<{ ok: true; value: T } | { ok: false; error: string }> {
+  const target: SshHost | null = loadSshHost(integrationId)
+  if (!target) return { ok: false, error: 'SSH integration not found' }
+
+  const jumpRef: { current: Client | null } = { current: null }
+  const client = await new Promise<Client | null>((resolve) => {
+    const { jumpConn } = connectTarget(
+      target,
+      (c) => resolve(c),
+      () => {
+        jumpRef.current?.end()
+        resolve(null)
+      },
+    )
+    jumpRef.current = jumpConn
+  })
+
+  if (!client) return { ok: false, error: 'Failed to connect to host' }
+
+  try {
+    const value = await fn(client)
+    return { ok: true, value }
+  } catch (err) {
+    return { ok: false, error: err instanceof Error ? err.message : 'Command failed' }
+  } finally {
+    client.end()
+    jumpRef.current?.end()
+  }
+}
+
+/**
+ * Lists containers via `docker ps`. Uses `--format '{{json .}}'` which emits one
+ * JSON object per line (the documented stable CLI format), avoiding fragile
+ * column parsing.
+ */
+export async function listContainers(client: Client): Promise<SshContainer[]> {
+  const { stdout, stderr, code } = await execCommand(
+    client,
+    "docker ps --all --no-trunc --format '{{json .}}'",
+  )
+  if (code !== 0) {
+    throw new Error(stderr.trim() || `docker ps exited with code ${code}`)
+  }
+  const containers: SshContainer[] = []
+  for (const line of stdout.split('\n')) {
+    const trimmed = line.trim()
+    if (!trimmed) continue
+    try {
+      const row = JSON.parse(trimmed) as Record<string, string>
+      containers.push({
+        id: row.ID ?? '',
+        name: row.Names ?? '',
+        image: row.Image ?? '',
+        state: row.State ?? '',
+        status: row.Status ?? '',
+        ports: row.Ports ?? '',
+      })
+    } catch {
+      // Skip any line that isn't valid JSON rather than failing the whole list.
+    }
+  }
+  return containers
+}
+
+export async function containerLogs(client: Client, ref: string, tail: number): Promise<string> {
+  if (!isValidContainerRef(ref)) throw new Error('Invalid container reference')
+  const safeTail = Number.isFinite(tail) && tail > 0 ? Math.min(Math.floor(tail), 5000) : 200
+  const { stdout, stderr, code } = await execCommand(
+    client,
+    `docker logs --tail ${safeTail} ${shQuote(ref)} 2>&1`,
+  )
+  if (code !== 0 && !stdout) {
+    throw new Error(stderr.trim() || `docker logs exited with code ${code}`)
+  }
+  // `2>&1` folds stderr into stdout so interleaved container logs are preserved.
+  return stdout
+}
+
+export async function containerAction(client: Client, ref: string, action: DockerAction): Promise<void> {
+  if (!isValidContainerRef(ref)) throw new Error('Invalid container reference')
+  const { stderr, code } = await execCommand(client, `docker ${action} ${shQuote(ref)}`)
+  if (code !== 0) {
+    throw new Error(stderr.trim() || `docker ${action} exited with code ${code}`)
+  }
+}
+
+export async function removeContainer(client: Client, ref: string, force: boolean): Promise<void> {
+  if (!isValidContainerRef(ref)) throw new Error('Invalid container reference')
+  const flag = force ? '--force ' : ''
+  const { stderr, code } = await execCommand(client, `docker rm ${flag}${shQuote(ref)}`)
+  if (code !== 0) {
+    throw new Error(stderr.trim() || `docker rm exited with code ${code}`)
+  }
+}
+
+/** Builds the remote command for an interactive `docker exec` shell (used over a PTY). */
+export function buildExecShellCommand(ref: string): string {
+  if (!isValidContainerRef(ref)) throw new Error('Invalid container reference')
+  // Try bash first, fall back to sh, so it works on minimal images too. The
+  // ref is validated + single-quoted; the trailing snippet is a fixed string.
+ const quoted = shQuote(ref) + return `docker exec -it ${quoted} bash 2>/dev/null || docker exec -it ${quoted} sh` +} diff --git a/docs/docker-agent-monitoring.md b/docs/docker-agent-monitoring.md new file mode 100644 index 0000000..5279717 --- /dev/null +++ b/docs/docker-agent-monitoring.md @@ -0,0 +1,215 @@ +# Docker Agent Monitoring (self-hosted, push model) + +Design doc for the self-hosted **Docker push-agent monitoring** feature +(Option 1 in `ROADMAP.md` → "Docker monitoring agent"). Written before +implementation; this is the contract the code should match. + +## Goal + +Let ArchNest **monitor** Docker containers across multiple VMs without ArchNest +reaching into those VMs. A small agent script runs on each Docker host, gathers +rich container data, and **pushes** it to ArchNest. ArchNest stores the latest +report per host and renders it read-only on the Containers page. + +This is monitoring only. **Management (start/stop/restart/exec) is unchanged** +and continues to use the existing Docker-over-SSH path +(`backend/src/ssh/docker.ts`, `backend/src/routes/dockerSsh.ts`) and the Docker +Engine TCP integration (`backend/src/docker/`). A one-way push cannot perform +actions, by design — so nothing about management is removed. + +## Why push (for self-hosted) + +- VMs need **outbound-only** reachability to ArchNest. No exposed port, no + dockerd TCP socket, no inbound SSH required for monitoring. +- Decoupled from SSH auth entirely (sidesteps the cert/OPKSSH auth gap that + affects the Docker-over-SSH path). +- Simplest thing to "drop on any VM": a bash script + cron/systemd timer. + +The richer **pull agent** (on-demand monitor + manage via a local authenticated +HTTP API on each VM) is the **paid** tier — see `ROADMAP.md`, not built here. + +## Architecture + +``` +Docker VM (agent.sh, every N s) ArchNest backend Browser + docker ps --format json ─┐ + docker inspect ... 
├─> JSON report ──POST /api/agents/docker/report──> upsert latest + docker stats --no-stream ─┘ (Bearer: ARCHNEST_AGENT_TOKEN) per host_id in SQLite + │ + GET /api/agents/docker/... <────────┘ (user JWT) + │ + Containers page (read-only) +``` + +## Security + +- **Ingest is token-gated, not user-gated.** `POST /api/agents/docker/report` + is authenticated by a single shared secret `ARCHNEST_AGENT_TOKEN` (env var on + the backend, same value in each agent script), compared in **constant time**. + If the env var is unset, the ingest endpoint is **disabled** (returns 503) — + the server never accepts unauthenticated reports. +- **Ingest must be reachable on the mesh / non-public IP only.** The token is + the application-layer guard; network-layer the endpoint should not be exposed + publicly. (A separate, later initiative — the "mesh prerequisite gate" — will + enforce mesh setup app-wide; this doc does not implement that gate. Until it + exists, mesh-only reachability is an operational/deployment responsibility.) +- **Ingest only stores data — it never executes anything from the agent.** The + payload is validated with zod and persisted as-is; there is no command path, + so there is no injection surface from agent input. +- **Read endpoints are behind the normal user `authenticate` hook**, so any + logged-in user can view monitoring data (consistent with the Phase 3 model: + members can view everything). They are read-only. +- Single shared token now; **per-host revocable tokens** are a noted future + improvement, not in this iteration. + +## Report schema (rich) + +The agent posts one report per host. `host_id` is a stable, user-chosen +identifier; `hostname` is informational. 
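The ingest guard from the Security section above can be written as a pure function that a route handler calls before validating the payload. It returns 503 when `ARCHNEST_AGENT_TOKEN` is unset, returns 401 on a missing or wrong Bearer token, and compares tokens in constant time. This is a minimal sketch, not the shipped handler. `checkAgentToken` and `IngestAuthResult` are hypothetical names. Both tokens are hashed with SHA-256 before comparing, for two reasons: Node's `crypto.timingSafeEqual` throws when its two buffers differ in length, and equal-length digests avoid an early length check that would leak the token's length.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto'

// Hypothetical result shape; a real route would map `status` onto the HTTP reply.
export type IngestAuthResult =
  | { ok: true }
  | { ok: false; status: 401 | 503; error: string }

/** SHA-256 digest, so both sides of the comparison are always 32 bytes. */
function digest(value: string) {
  return createHash('sha256').update(value, 'utf8').digest()
}

export function checkAgentToken(
  configuredToken: string | undefined,
  authorizationHeader: string | undefined,
): IngestAuthResult {
  // No token configured: ingest is disabled, never silently open.
  if (!configuredToken) {
    return { ok: false, status: 503, error: 'Agent ingest is not configured' }
  }
  // Auth scheme names are case-insensitive, so "bearer" is accepted too.
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader ?? '')
  if (!match) {
    return { ok: false, status: 401, error: 'Missing bearer token' }
  }
  // Constant-time comparison of fixed-length digests.
  if (!timingSafeEqual(digest(configuredToken), digest(match[1]))) {
    return { ok: false, status: 401, error: 'Invalid agent token' }
  }
  return { ok: true }
}
```

In a route, the handler would call `checkAgentToken(process.env.ARCHNEST_AGENT_TOKEN, request.headers.authorization)` first. It would run the zod payload validation only after `ok` is true, so unauthenticated requests never reach the parser.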
+ +```jsonc +{ + "hostId": "proxmox-vm-1", // stable id, [A-Za-z0-9._-], required + "hostname": "docker01", // informational + "agentVersion": "1", + "reportedAt": "2026-06-20T19:30:00Z", // agent clock; server also records its own receivedAt + "containers": [ + { + "id": "", + "name": "myapp", + "image": "nginx:1.27", + "imageId": "sha256:...", + "state": "running", // running|exited|paused|created|restarting|dead + "status": "Up 3 hours", // human string from docker ps + "createdAt": "2026-06-20T16:00:00Z", + "startedAt": "2026-06-20T16:00:01Z", + "restartCount": 0, + "restartPolicy": "unless-stopped", + "health": "healthy", // healthy|unhealthy|starting|none + "ports": [ // normalized from inspect + { "hostIp": "0.0.0.0", "hostPort": 8080, "containerPort": 80, "proto": "tcp" } + ], + "networks": [ + { "name": "bridge", "ip": "172.17.0.2" } + ], + "mounts": [ + { "type": "volume", "source": "myapp_data", "destination": "/data", "rw": true } + ], + "env": [ // SECRETS MASKED (see below) + { "key": "NODE_ENV", "value": "production" }, + { "key": "DB_PASSWORD", "value": "********" } + ], + "command": "nginx -g 'daemon off;'", + "labels": { "com.docker.compose.project": "myapp" }, + "stats": { // snapshot from docker stats --no-stream + "cpuPercent": 1.4, + "memUsage": 20971520, + "memLimit": 536870912, + "netRxBytes": 12345, + "netTxBytes": 67890, + "blockReadBytes": 0, + "blockWriteBytes": 0 + } + } + ] +} +``` + +### Env masking +The agent masks values whose key matches a secret-ish pattern +(`/(PASS|SECRET|TOKEN|KEY|PRIVATE|CREDENTIAL)/i`) before sending, replacing the +value with `********`. The full value never leaves the VM. (Defense in depth; +the backend also will not display unmasked secrets.) + +### Source capability note +The Containers page already aggregates three sources (Docker TCP API, Docker +over SSH, and now agent). 
Not every field exists for every source — the UI must +**degrade gracefully** and show "—" / "not available from this source" rather +than erroring. The agent is the richest source (it runs `docker inspect`). + +## Backend + +### DB +New table, latest-report-per-host (idempotent migration in +`backend/src/db/index.ts`): + +```sql +CREATE TABLE IF NOT EXISTS docker_agent_reports ( + host_id TEXT PRIMARY KEY, + hostname TEXT, + report_json TEXT NOT NULL, -- the full containers array as JSON + reported_at TEXT, -- agent-supplied timestamp + received_at TEXT NOT NULL DEFAULT (datetime('now')) -- server receive time (source of truth for staleness) +); +``` + +We keep only the latest report per `host_id` (upsert). Historical +time-series is out of scope for this iteration. + +### Endpoints +- `POST /api/agents/docker/report` — **token-gated** (Bearer + `ARCHNEST_AGENT_TOKEN`, constant-time). 503 if token unconfigured, 401 on + mismatch, 400 on invalid payload. Upserts the row for `hostId`. +- `GET /api/agents/docker/hosts` — user-auth. Returns each reported host with + `hostId`, `hostname`, `receivedAt`, `containerCount`, and a `stale` flag + (`true` if `received_at` older than `STALE_AFTER_MS`, default ~90s / tunable). +- `GET /api/agents/docker/hosts/:hostId/containers` — user-auth. Returns the + parsed container list for that host (the spreadsheet rows + enough for detail). +- `GET /api/agents/docker/hosts/:hostId/containers/:containerId` — user-auth. + Returns the single container's full detail object. + +`api.ts` gets matching functions + TS interfaces (`AgentHost`, +`AgentContainer`, etc.). + +## Agent script + +`agent/archnest-docker-agent.sh` — portable bash, dependencies: `docker`, +`curl`, and a JSON tool. 
 That tool is `jq`, which is a hard
+requirement: the report is built by combining
+`docker ps --format '{{json .}}'`, `docker inspect`, and
+`docker stats --no-stream --format '{{json .}}'`, with `jq` assembling the
+nested JSON and masking secret values. (Decision: `jq` is the only reliable way
+to assemble and mask nested JSON in bash, and it is a one-line install on every
+distro. The script checks for it and exits with a clear message if it is
+missing.)
+
+Configuration via env (script header or `/etc/archnest/agent.env`):
+- `ARCHNEST_URL` — e.g. `http://:4000` (mesh address).
+- `ARCHNEST_AGENT_TOKEN` — shared token.
+- `ARCHNEST_HOST_ID` — stable id for this VM.
+
+Scheduling: provide both a **cron** line and a **systemd service + timer**
+example. Recommended interval 30s (must be < backend `STALE_AFTER_MS`).
+
+## Frontend — Containers page
+
+The Containers page becomes **tabbed**:
+- **Tab 1 "Containers"** — the existing spreadsheet view (Name, Image, State,
+  CPU, Memory, Ports, Actions), now also including agent-reported hosts. The
+  host selector lists Docker-API, SSH, and agent hosts.
+- **Clicking a container Name** opens a **new tab** in the Containers page
+  showing that container's detail (tabs are dynamic; closeable).
+
+### Detail tab contents (graceful per-source degradation)
+- **Overview:** name, image + tag, image id, short/full id, created, started,
+  uptime, restart count, restart policy.
+- **State & health:** state, exit code (if stopped), healthcheck status.
+- **Stats:** CPU %, mem usage/limit, net RX/TX, block I/O (snapshot; agent &
+  Docker-API have it, SSH list does not).
+- **Ports / Networks / Mounts:** tables.
+- **Environment & labels:** env vars with secret values masked; labels.
+- **Command/entrypoint.**
+- **Logs:** recent tail (reuse existing logs path where the source supports it).
+
+Fields unavailable from the active source render as "—" / a small "not
+reported by this source" note.
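The `stale` flag returned by `GET /api/agents/docker/hosts` compares the server-side `received_at` against `STALE_AFTER_MS`. That is why the 30s agent interval has to stay below the threshold. One detail needs explicit handling: SQLite's `datetime('now')` stores UTC as text like `2026-06-20 19:30:00`, with no `T` and no zone marker. Parsing of that non-ISO form in JavaScript is implementation-defined, and engines may read it as local time. The sketch below is minimal and assumes a 90s default. `parseSqliteUtc`, `isStale`, and the constant's value are illustrative, not the backend's actual code.

```typescript
// Illustrative default matching the doc's "~90s"; tune alongside the agent interval.
export const STALE_AFTER_MS = 90_000

/** Parses SQLite datetime('now') text ("YYYY-MM-DD HH:MM:SS", UTC) to epoch ms. */
export function parseSqliteUtc(value: string): number {
  // Swap SQLite's space separator for ISO-8601's "T".
  const iso = value.includes('T') ? value : value.replace(' ', 'T')
  // Append "Z" unless a zone is already present, so the value is read as UTC.
  const zoned = /(Z|[+-]\d{2}:\d{2})$/.test(iso) ? iso : `${iso}Z`
  return Date.parse(zoned)
}

export function isStale(
  receivedAt: string,
  nowMs: number = Date.now(),
  staleAfterMs: number = STALE_AFTER_MS,
): boolean {
  const received = parseSqliteUtc(receivedAt)
  // An unparseable timestamp is reported as stale rather than healthy.
  if (Number.isNaN(received)) return true
  return nowMs - received > staleAfterMs
}
```

Computing staleness on the server from `received_at`, rather than from the agent's `reportedAt`, keeps the flag correct even when a VM's clock drifts.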
+ +## Explicitly deferred (not in this work) + +- **Mesh prerequisite gate** (require mesh detected/tested/verified in Settings + before anything else can be configured) — its own initiative, needs its own + design (lockout-safety is the hard part). This doc assumes mesh-only ingest is + handled operationally for now. +- **Option 2 paid pull-agent** (local authenticated HTTP API per VM, on-demand + monitor + manage) — `ROADMAP.md`. +- **Per-host tokens**, **historical/time-series metrics**, **live log tailing + for agent hosts**. diff --git a/src/lib/api.ts b/src/lib/api.ts index d53a7f9..0c3c374 100644 --- a/src/lib/api.ts +++ b/src/lib/api.ts @@ -159,6 +159,30 @@ export const api = { body: JSON.stringify({ force }), }), + // Docker over SSH: runs the `docker` CLI on a remote SSH host instead of the + // Docker Engine TCP API. `integrationId` here is an SSH integration. + listSshContainers: (integrationId: number) => + apiFetch<{ containers: SshContainer[] }>(`/docker-ssh/${integrationId}/containers`), + sshContainerLogs: (integrationId: number, id: string, tail = 200) => + apiFetch<{ logs: string }>(`/docker-ssh/${integrationId}/containers/${encodeURIComponent(id)}/logs?tail=${tail}`), + sshContainerAction: (integrationId: number, id: string, action: 'start' | 'stop' | 'restart' | 'pause' | 'unpause') => + apiFetch<{ ok: boolean }>(`/docker-ssh/${integrationId}/containers/${encodeURIComponent(id)}/${action}`, { method: 'POST' }), + removeSshContainer: (integrationId: number, id: string, force = false) => + apiFetch<{ ok: boolean }>(`/docker-ssh/${integrationId}/containers/${encodeURIComponent(id)}/remove`, { + method: 'POST', + body: JSON.stringify({ force }), + }), + + // Docker monitoring agents (push model). Read-only; agents POST reports to a + // token-gated ingest endpoint that the UI never calls. 
+  listAgentHosts: () => apiFetch<{ hosts: AgentHost[] }>('/agents/docker/hosts'),
+  listAgentContainers: (hostId: string) =>
+    apiFetch<AgentHostContainers>(`/agents/docker/hosts/${encodeURIComponent(hostId)}/containers`),
+  getAgentContainer: (hostId: string, containerId: string) =>
+    apiFetch<{ container: AgentContainer }>(
+      `/agents/docker/hosts/${encodeURIComponent(hostId)}/containers/${encodeURIComponent(containerId)}`,
+    ),
+
   getHostMetrics: (integrationId: number) => apiFetch(`/integrations/${integrationId}/metrics`),
 
   startTransfer: (data: { sourceIntegrationId: number; destIntegrationId: number; sourcePaths: string[]; destPath: string; move?: boolean }) =>
@@ -296,6 +320,72 @@ export interface Container {
   ports: { privatePort: number; publicPort?: number; type: string }[]
 }
 
+export interface SshContainer {
+  id: string
+  name: string
+  image: string
+  state: string
+  status: string
+  /** Raw `docker ps` ports string (e.g. "0.0.0.0:8080->80/tcp"). */
+  ports: string
+}
+
+export interface AgentHost {
+  hostId: string
+  hostname: string | null
+  reportedAt: string | null
+  receivedAt: string
+  containerCount: number
+  stale: boolean
+}
+
+export interface AgentContainerPort {
+  hostIp?: string
+  hostPort?: number | null
+  containerPort: number
+  proto: string
+}
+
+export interface AgentContainerStats {
+  cpuPercent?: number
+  memUsage?: number
+  memLimit?: number
+  netRxBytes?: number
+  netTxBytes?: number
+  blockReadBytes?: number
+  blockWriteBytes?: number
+}
+
+export interface AgentContainer {
+  id: string
+  name: string
+  image: string
+  imageId?: string
+  state: string
+  status: string
+  createdAt?: string
+  startedAt?: string
+  restartCount?: number
+  restartPolicy?: string
+  health?: string
+  ports: AgentContainerPort[]
+  networks: { name: string; ip?: string }[]
+  mounts: { type?: string; source?: string; destination?: string; rw?: boolean }[]
+  env: { key: string; value: string }[]
+  command?: string
+  labels?: Record<string, string>
+  stats?: AgentContainerStats
+}
+
+export
interface AgentHostContainers { + hostId: string + hostname: string | null + reportedAt: string | null + receivedAt: string + stale: boolean + containers: AgentContainer[] +} + export interface ContainerStats { cpuPercent: number memUsage: number diff --git a/src/pages/Containers.tsx b/src/pages/Containers.tsx index 46c765e..8a6f1e9 100644 --- a/src/pages/Containers.tsx +++ b/src/pages/Containers.tsx @@ -14,7 +14,14 @@ import { ScrollText, X, } from 'lucide-react' -import { api, getToken, type Container, type ContainerStats, type Integration } from '../lib/api' +import { + api, + getToken, + type Container, + type SshContainer, + type AgentContainer, + type ContainerStats, +} from '../lib/api' const TEXT_PRIMARY = '#E8E6E0' const TEXT_SECONDARY = '#7A7D85' @@ -27,6 +34,66 @@ const cardBase: React.CSSProperties = { boxShadow: '0 0 20px rgba(200, 164, 52, 0.03)', } +// docker = Engine TCP API; ssh = `docker` CLI over SSH; agent = pushed report. +// docker/ssh support management; agent is read-only monitoring. +type Source = 'docker' | 'ssh' | 'agent' + +/** A selectable container host. For docker/ssh it wraps an integration id; for + * agent it wraps the string hostId of a reporting agent. */ +interface HostOption { + source: Source + /** integration id (docker/ssh) or agent hostId (agent), as a string key. */ + key: string + label: string + /** numeric integration id for docker/ssh sources. */ + integrationId?: number + /** agent hostId for agent sources. */ + agentHostId?: string +} + +/** Unified table row across all three sources. */ +interface Row { + id: string + name: string + image: string + state: string + status: string + ports: string + /** Stats embedded in agent reports (docker/ssh fetch stats separately/none). */ + embeddedStats?: ContainerStats +} + +function toRowFromDocker(c: Container): Row { + return { + id: c.id, + name: c.name, + image: c.image, + state: c.state, + status: c.status, + ports: c.ports.length === 0 ? 
'' : c.ports.map((p) => `${p.publicPort ?? ''}${p.publicPort ? ':' : ''}${p.privatePort}/${p.type}`).join(', '), + } +} + +function toRowFromSsh(c: SshContainer): Row { + return { id: c.id, name: c.name, image: c.image, state: c.state.toLowerCase(), status: c.status, ports: c.ports } +} + +function toRowFromAgent(c: AgentContainer): Row { + const ports = c.ports + .map((p) => `${p.hostPort ? `${p.hostPort}:` : ''}${p.containerPort}/${p.proto}`) + .join(', ') + const embeddedStats: ContainerStats | undefined = c.stats + ? { + cpuPercent: c.stats.cpuPercent ?? 0, + memUsage: c.stats.memUsage ?? 0, + memLimit: c.stats.memLimit ?? 0, + netRx: c.stats.netRxBytes ?? 0, + netTx: c.stats.netTxBytes ?? 0, + } + : undefined + return { id: c.id, name: c.name, image: c.image, state: c.state.toLowerCase(), status: c.status, ports, embeddedStats } +} + function stateColor(state: string): string { if (state === 'running') return '#2ECC71' if (state === 'paused') return '#E0A82E' @@ -46,56 +113,113 @@ function formatBytes(bytes: number): string { return `${v.toFixed(1)} ${units[i]}` } +/** A dynamic detail tab opened by clicking a container name. 
 */
+interface DetailTab {
+  tabId: string
+  source: Source
+  integrationId?: number
+  agentHostId?: string
+  containerId: string
+  containerName: string
+}
+
 export default function Containers() {
-  const [hosts, setHosts] = useState<Integration[]>([])
-  const [integrationId, setIntegrationId] = useState<number | ''>('')
-  const [containers, setContainers] = useState<Container[]>([])
+  const [hostOptions, setHostOptions] = useState<HostOption[]>([])
+  const [selectedKey, setSelectedKey] = useState('')
+  const [rows, setRows] = useState<Row[]>([])
   const [error, setError] = useState<string | null>(null)
   const [loading, setLoading] = useState(false)
   const [busyId, setBusyId] = useState<string | null>(null)
   const [statsById, setStatsById] = useState<Record<string, ContainerStats>>({})
-  const [logsContainer, setLogsContainer] = useState<Container | null>(null)
-  const [execContainer, setExecContainer] = useState<Container | null>(null)
+  const [logsRow, setLogsRow] = useState<Row | null>(null)
+  const [execRow, setExecRow] = useState<Row | null>(null)
+
+  // Intra-page tabs: the containers list plus any opened container-detail tabs.
+  const [detailTabs, setDetailTabs] = useState<DetailTab[]>([])
+  const [activeTab, setActiveTab] = useState('list')
+
+  const selected = hostOptions.find((h) => h.key === selectedKey)
+  const source: Source | null = selected?.source ?? null
+  const canManage = source === 'docker' || source === 'ssh'
+
+  async function loadHosts() {
+    const [{ integrations }, agentRes] = await Promise.all([
+      api.listIntegrations(),
+      api.listAgentHosts().catch(() => ({ hosts: [] })),
+    ])
+    const opts: HostOption[] = []
+    for (const i of integrations) {
+      if (i.type === 'docker') opts.push({ source: 'docker', key: `docker:${i.id}`, label: `${i.name} (Docker API)`, integrationId: i.id })
+      if (i.type === 'ssh') opts.push({ source: 'ssh', key: `ssh:${i.id}`, label: `${i.name} (SSH)`, integrationId: i.id })
+    }
+    for (const h of agentRes.hosts) {
+      const label = `${h.hostname || h.hostId} (Agent${h.stale ?
' — stale' : ''})` + opts.push({ source: 'agent', key: `agent:${h.hostId}`, label, agentHostId: h.hostId }) + } + setHostOptions(opts) + if (opts.length > 0 && !opts.some((o) => o.key === selectedKey)) setSelectedKey(opts[0].key) + } useEffect(() => { - api.listIntegrations().then(({ integrations }) => { - const dockerHosts = integrations.filter((i) => i.type === 'docker') - setHosts(dockerHosts) - if (dockerHosts.length > 0) setIntegrationId(dockerHosts[0].id) - }) + loadHosts() + // eslint-disable-next-line react-hooks/exhaustive-deps }, []) function refresh() { - if (!integrationId) return + if (!selected) return setLoading(true) setError(null) - api - .listContainers(integrationId) - .then(({ containers }) => { - setContainers(containers) - containers.forEach((c) => { - if (c.state !== 'running') return - api - .containerStats(integrationId, c.id) - .then((stats) => setStatsById((prev) => ({ ...prev, [c.id]: stats }))) - .catch(() => {}) + setStatsById({}) + + if (selected.source === 'agent' && selected.agentHostId) { + api + .listAgentContainers(selected.agentHostId) + .then(({ containers }) => setRows(containers.map(toRowFromAgent))) + .catch((err) => setError(err instanceof Error ? err.message : 'Failed to load agent report')) + .finally(() => setLoading(false)) + return + } + + if (selected.source === 'ssh' && selected.integrationId) { + api + .listSshContainers(selected.integrationId) + .then(({ containers }) => setRows(containers.map(toRowFromSsh))) + .catch((err) => setError(err instanceof Error ? 
err.message : 'Failed to list containers')) + .finally(() => setLoading(false)) + return + } + + if (selected.source === 'docker' && selected.integrationId) { + const integrationId = selected.integrationId + api + .listContainers(integrationId) + .then(({ containers }) => { + setRows(containers.map(toRowFromDocker)) + containers.forEach((c) => { + if (c.state !== 'running') return + api + .containerStats(integrationId, c.id) + .then((stats) => setStatsById((prev) => ({ ...prev, [c.id]: stats }))) + .catch(() => {}) + }) }) - }) - .catch((err) => setError(err instanceof Error ? err.message : 'Failed to list containers')) - .finally(() => setLoading(false)) + .catch((err) => setError(err instanceof Error ? err.message : 'Failed to list containers')) + .finally(() => setLoading(false)) + } } useEffect(() => { refresh() // eslint-disable-next-line react-hooks/exhaustive-deps - }, [integrationId]) + }, [selectedKey]) - async function runAction(c: Container, action: 'start' | 'stop' | 'restart' | 'pause' | 'unpause') { - if (!integrationId) return + async function runAction(c: Row, action: 'start' | 'stop' | 'restart' | 'pause' | 'unpause') { + if (!selected?.integrationId) return setBusyId(c.id) setError(null) try { - await api.containerAction(integrationId, c.id, action) + if (selected.source === 'ssh') await api.sshContainerAction(selected.integrationId, c.id, action) + else await api.containerAction(selected.integrationId, c.id, action) refresh() } catch (err) { setError(err instanceof Error ? err.message : `Failed to ${action} container`) @@ -104,13 +228,14 @@ export default function Containers() { } } - async function removeContainer(c: Container) { - if (!integrationId) return + async function removeRow(c: Row) { + if (!selected?.integrationId) return if (!confirm(`Remove container "${c.name}"? 
This cannot be undone.`)) return setBusyId(c.id) setError(null) try { - await api.removeContainer(integrationId, c.id, c.state === 'running') + if (selected.source === 'ssh') await api.removeSshContainer(selected.integrationId, c.id, c.state === 'running') + else await api.removeContainer(selected.integrationId, c.id, c.state === 'running') refresh() } catch (err) { setError(err instanceof Error ? err.message : 'Failed to remove container') @@ -119,6 +244,28 @@ export default function Containers() { } } + function openDetail(c: Row) { + if (!selected) return + const tabId = `${selected.key}::${c.id}` + setDetailTabs((prev) => (prev.some((t) => t.tabId === tabId) ? prev : [ + ...prev, + { + tabId, + source: selected.source, + integrationId: selected.integrationId, + agentHostId: selected.agentHostId, + containerId: c.id, + containerName: c.name, + }, + ])) + setActiveTab(tabId) + } + + function closeDetail(tabId: string) { + setDetailTabs((prev) => prev.filter((t) => t.tabId !== tabId)) + setActiveTab((cur) => (cur === tabId ? 'list' : cur)) + } + return (
@@ -127,34 +274,51 @@ export default function Containers() { Containers

- Manage Docker containers across your configured hosts. + Manage and monitor Docker containers — via the Docker Engine API, the + docker CLI over SSH, or a reporting agent.

-
- - -
+ {activeTab === 'list' && ( +
+ + +
+ )}
- {error && ( + {/* Intra-page tab bar */} +
+ setActiveTab('list')} /> + {detailTabs.map((t) => ( + setActiveTab(t.tabId)} + onClose={() => closeDetail(t.tabId)} + /> + ))} +
+ + {error && activeTab === 'list' && (
{error}
)} -
- - - - - - - - - - - - - - {containers.length === 0 && ( - - + {activeTab === 'list' ? ( +
+
NameImageStateCPUMemoryPortsActions
- {integrationId ? 'No containers found.' : 'Select a Docker integration to view containers.'} -
+ + + + + + + + + - )} - {containers.map((c) => { - const stats = statsById[c.id] - const busy = busyId === c.id - return ( - - - - - - - - + + {rows.length === 0 && ( + + - ) - })} - -
NameImageStateCPUMemoryPortsActions
- {c.name} - - {c.image} - - - - {c.status} - - - {stats ? `${stats.cpuPercent.toFixed(1)}%` : '—'} - - {stats ? `${formatBytes(stats.memUsage)} / ${formatBytes(stats.memLimit)}` : '—'} - - {c.ports.length === 0 ? '—' : c.ports.map((p) => `${p.publicPort ?? ''}${p.publicPort ? ':' : ''}${p.privatePort}/${p.type}`).join(', ')} - -
- {c.state === 'running' ? ( - <> - - - - - - ) : c.state === 'paused' ? ( - - ) : ( - - )} - - -
+
+ {selected ? 'No containers found.' : 'Select a container host.'}
-
- - {logsContainer && integrationId && ( - setLogsContainer(null)} /> + )} + {rows.map((c) => { + const stats = statsById[c.id] ?? c.embeddedStats + const busy = busyId === c.id + return ( + + + + + + {c.image} + + + + + {c.status} + + + + {stats ? `${stats.cpuPercent.toFixed(1)}%` : '—'} + + + {stats ? `${formatBytes(stats.memUsage)} / ${formatBytes(stats.memLimit)}` : '—'} + + + {c.ports || '—'} + + +
+ {canManage ? ( + <> + {c.state === 'running' ? ( + <> + + + + + + ) : c.state === 'paused' ? ( + + ) : ( + + )} + + + + ) : ( + read-only + )} +
+ + + ) + })} + + + + ) : ( + (() => { + const tab = detailTabs.find((t) => t.tabId === activeTab) + if (!tab) return null + return + })() )} - {execContainer && integrationId && ( - setExecContainer(null)} /> + + {logsRow && selected?.integrationId && (source === 'docker' || source === 'ssh') && ( + setLogsRow(null)} /> + )} + {execRow && selected?.integrationId && (source === 'docker' || source === 'ssh') && ( + setExecRow(null)} /> )} ) } -function LogsModal({ integrationId, container, onClose }: { integrationId: number; container: Container; onClose: () => void }) { +function TabButton({ label, active, onClick, onClose }: { label: string; active: boolean; onClick: () => void; onClose?: () => void }) { + return ( +
+ {label} + {onClose && ( + { + e.stopPropagation() + onClose() + }} + /> + )} +
+ ) +} + +function DetailRow({ label, value }: { label: string; value: React.ReactNode }) { + return ( +
+ {label} + {value} +
+ ) +} + +function Section({ title, children }: { title: string; children: React.ReactNode }) { + return ( +
+

{title}

+ {children} +
+ ) +} + +/** + * Container detail tab. Agent reports carry the full inspect+stats payload, so + * for agent hosts we render everything. For docker/ssh sources we currently + * only have the list row data, so we show what we have and note the rest is + * available from an agent — graceful degradation per the design. + */ +function ContainerDetail({ tab }: { tab: DetailTab }) { + const [container, setContainer] = useState(null) + const [loading, setLoading] = useState(tab.source === 'agent') + const [error, setError] = useState(null) + + useEffect(() => { + if (tab.source !== 'agent' || !tab.agentHostId) return + setLoading(true) + api + .getAgentContainer(tab.agentHostId, tab.containerId) + .then(({ container }) => setContainer(container)) + .catch((err) => setError(err instanceof Error ? err.message : 'Failed to load container detail')) + .finally(() => setLoading(false)) + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [tab.tabId]) + + if (tab.source !== 'agent') { + return ( +
+

{tab.containerName}

+

+ Rich container detail (inspect data, mounts, networks, environment) is + provided by the monitoring agent. This host is a{' '} + {tab.source === 'ssh' ? 'Docker-over-SSH' : 'Docker API'} source — use the + list view actions for management, or install the ArchNest agent on this + host for full detail. +

+
+ ) + } + + if (loading) return

Loading…

+ if (error) return

{error}

+ if (!container) return

No data.

+ + const c = container + const masked = (v: string) => v + + return ( +
+
+ + + {c.imageId && {c.imageId.slice(0, 19)}} />} + {c.id.slice(0, 12)}} /> + {c.command && {c.command}} />} + {c.createdAt && } + {c.startedAt && } +
+ +
+ + + {c.status || c.state} + + } + /> + {c.health && c.health !== 'none' && } + + {c.restartPolicy && } +
+ + {c.stats && ( +
+ + + + +
+ )} + +
+ {c.ports.length === 0 ? ( +

None published.

+ ) : ( + c.ports.map((p, i) => ( + + )) + )} +
+ +
+ {c.networks.length === 0 ? ( +

None.

+ ) : ( + c.networks.map((n, i) => ) + )} +
+ +
+ {c.mounts.length === 0 ? ( +

None.

+ ) : ( + c.mounts.map((m, i) => ( + + )) + )} +
+ +
+ {c.env.length === 0 ? ( +

None.

+ ) : ( + c.env.map((e, i) => {masked(e.value)}} />) + )} +
+ + {c.labels && Object.keys(c.labels).length > 0 && ( +
+ {Object.entries(c.labels).map(([k, v]) => ( + {v}} /> + ))} +
+ )} +
+ ) +} + +function LogsModal({ source, integrationId, row, onClose }: { source: 'docker' | 'ssh'; integrationId: number; row: Row; onClose: () => void }) { const [logs, setLogs] = useState('') const [loading, setLoading] = useState(true) const [error, setError] = useState(null) @@ -269,8 +624,8 @@ function LogsModal({ integrationId, container, onClose }: { integrationId: numbe function load() { setLoading(true) setError(null) - api - .containerLogs(integrationId, container.id) + const req = source === 'ssh' ? api.sshContainerLogs(integrationId, row.id) : api.containerLogs(integrationId, row.id) + req .then(({ logs }) => setLogs(logs)) .catch((err) => setError(err instanceof Error ? err.message : 'Failed to fetch logs')) .finally(() => setLoading(false)) @@ -283,7 +638,7 @@ function LogsModal({ integrationId, container, onClose }: { integrationId: numbe

- Logs — {container.name} + Logs — {row.name}