# Mesh Network Prerequisite Gate: Design

Design doc for requiring a **mesh network (NetBird) to be configured, tested, and verified before the rest of ArchNest can be configured**. Written before implementation. The hard problem here is **not locking the admin out**, so this doc leads with that.

> Status: DESIGN, not yet implemented. Decisions marked **[DECIDE]** need the
> user's input before coding.

## Goal

After account setup, an admin must establish a verified mesh connection before they can configure integrations, bookmarks, tunnels, etc. The intent: ArchNest is meant to operate over a private mesh, and other features (e.g. the Docker agent ingest, SSH to mesh hosts) assume mesh reachability. The gate makes that a first-class, enforced prerequisite rather than an operational assumption.

## The lockout problem (read first)

A naive gate that blocks *everything* until mesh is verified is dangerous: if the mesh test fails (wrong token, NetBird down, transient network), the admin could be unable to reach the very settings needed to fix it. The existing codebase already takes lockout seriously (the "last active admin" guards in `auth.ts`). The gate must follow the same principle:

**Invariants (non-negotiable):**

1. The gate **never blocks** `/api/auth/*` (login, logout, sessions, password).
2. The gate **never blocks** the mesh configuration + test endpoints, nor the integration create/update/test routes needed to configure the mesh.
3. Enforcement is primarily **UI-level** (a gate screen that *is itself* the mesh-config UI), so the admin always has a way forward: the gate screen lets them enter/edit/test the mesh right there.
4. There is an explicit, logged **admin override** ("skip / I'll set this up later"); see **[DECIDE A]**. Without an override, a hard outage of the mesh provider could brick configuration access.
5. The mesh config row is always editable even when the gate is unsatisfied.

## What counts as "verified"?
[DECIDE B] Options, from loosest to strictest:

- **(B1) Reachable:** a NetBird integration exists and `testConnection` succeeds (the NetBird API answers `/api/peers` with the token). Proves the control-plane token works, not that *this host* is on the mesh.
- **(B2) On the mesh:** the backend host itself has an interface/IP in the mesh range (e.g. `100.64.0.0/10`), checked server-side. Proves the ArchNest host is actually meshed.
- **(B3) Reachable + peers present:** B1 plus `listResources()` returning ≥1 connected peer.

Recommendation: **B1 as the baseline verification** (it's what the existing NetBird adapter already supports and is deterministic), with **B2 as an additional optional check** surfaced as info ("this host's mesh IP: …"). B3 is nice but a single-peer network is legitimate, so don't require peers. This needs your call; see [DECIDE B] at the end.

## Where state lives

There is **no server-side key-value config store** today; all config is in the `integrations` table. Two options:

- **(S1) Derive from the NetBird integration:** "mesh verified" = there exists a `netbird` integration with `status = 'connected'` (optionally within a freshness window). No new table. Simplest, but conflates "an integration that happens to be NetBird" with "the designated mesh".
- **(S2) New `system_config` key-value table:** explicit keys like `mesh.integrationId`, `mesh.verifiedAt`, `mesh.overrideUntil`. Cleaner, gives us a real home for future system-level settings (and the override flag), at the cost of a new table + endpoints.

Recommendation: **S2: a small `system_config` kv table.** The gate needs to persist an override flag and a "designated mesh integration" pointer that S1 can't cleanly represent, and ArchNest will want a system-config store for other things eventually (this is also where a future "mesh required: on/off" toggle lives).
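To make the S2 recommendation a little more concrete, here is a minimal sketch of how the gate decision might be derived from those keys. It assumes the `system_config` rows have already been loaded into a plain key-to-value map; the `SystemConfig` and `MeshGateState` types and the `evaluateMeshGate` function are hypothetical names, not existing ArchNest code. It also assumes `mesh.required` is the key behind the on/off toggle and that timestamps are stored as ISO strings.

```typescript
// Hypothetical: system_config rows loaded into a key -> value map.
type SystemConfig = Record<string, string | undefined>;

// Hypothetical result shape for the gate decision.
interface MeshGateState {
  required: boolean;
  verified: boolean;
  overridden: boolean;
  gated: boolean; // true when the UI should show the mesh gate
}

function evaluateMeshGate(config: SystemConfig, now: Date = new Date()): MeshGateState {
  // Assumption: the gate is on unless mesh.required is explicitly "false".
  const required = config['mesh.required'] !== 'false';

  // Verified = a designated integration plus a parseable verification timestamp.
  // (No freshness window is applied in this sketch.)
  const verifiedAt = Date.parse(config['mesh.verifiedAt'] ?? '');
  const verified = Boolean(config['mesh.integrationId']) && !Number.isNaN(verifiedAt);

  // A temporary override counts only while its expiry is still in the future.
  const overrideUntil = Date.parse(config['mesh.overrideUntil'] ?? '');
  const overridden = !Number.isNaN(overrideUntil) && overrideUntil > now.getTime();

  return { required, verified, overridden, gated: required && !verified && !overridden };
}
```

With an empty config this returns `gated: true`; setting `mesh.required` to `"false"`, recording a verification, or setting an unexpired `mesh.overrideUntil` each clears the gate. Keeping the decision in one pure function like this would let the status endpoint and any tests share the same logic, though where it actually lives is an implementation choice.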
Proposed schema:

```sql
CREATE TABLE IF NOT EXISTS system_config (
  key        TEXT PRIMARY KEY,
  value      TEXT NOT NULL,
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```

Keys: `mesh.integrationId` (the designated NetBird integration), `mesh.verifiedAt` (ISO timestamp of last successful verify), `mesh.overrideUntil` (optional ISO timestamp for a temporary admin skip), `mesh.required` (`"true"`/`"false"`, default true; lets the whole gate be turned off).

## Frontend flow

### New auth status

Add `'needs-mesh'` to the `AuthStatus` union in `AuthContext.tsx`. `refresh()` currently: token → `api.me()` → `'logged-in'`. New: after a successful `api.me()`, also call `api.getMeshStatus()`; if mesh is **required and not verified and not overridden**, set `'needs-mesh'` instead of `'logged-in'`.

### App routing (`App.tsx`)

Insert a branch **after** `logged-out`/`enrolling` and **before** `Dashboard`:

```
if (status === 'needs-mesh') return <MeshGate />
return <Dashboard />
```

### `MeshGate` page

A focused, full-screen page (styled like Enrollment) that:

- Explains the prerequisite.
- Lets the admin **configure the NetBird mesh** (reuse the integration create/test form: the same `createIntegration` + `testIntegration` calls Enrollment's `ConnectForm` already uses), or pick an existing NetBird integration as the designated mesh.
- Runs **Detect → Test → Verify**: shows the test result, and (B2) the detected mesh IP of the ArchNest host.
- On success, marks `mesh.verifiedAt`, then calls `refresh()` → advances to `Dashboard`.
- Provides the **admin override** control per [DECIDE A].
- **Members (non-admins):** a member who logs in while mesh is unverified can't fix it (only admins configure integrations). They should see a "waiting on an admin to finish mesh setup" message, not a config form. [DECIDE C: do we even allow member login pre-verification, or block all use until verified?]

### Enrollment

Keep Enrollment's account step.
The mesh step can either be folded into Enrollment as a mandatory step before `finishEnrollment()`, or live purely as the post-login `needs-mesh` gate. Recommendation: **gate only** (don't duplicate in Enrollment): one code path, and it also covers existing installs that predate the gate.

## Backend

- `GET /api/system/mesh-status` (mirrors `setup-status`): returns `{ required, verified, overridden, meshIntegrationId, hostMeshIp? }`. Behind `authenticate` (any logged-in user can read).
- `POST /api/system/mesh/verify` (admin): designates a NetBird integration as the mesh, runs its `testConnection`, (B2) checks host mesh IP, persists `mesh.integrationId` + `mesh.verifiedAt` on success. Returns the result.
- `POST /api/system/mesh/override` (admin) **[DECIDE A]**: sets `mesh.overrideUntil` (or a permanent skip). Writes a `logEvent`.
- Optional `PUT /api/system/mesh/required` (admin): toggle `mesh.required`.
- **Lockout safety:** none of the gate enforcement lives in a global request hook that could block auth/integration/system routes. If we add any server-side enforcement at all (beyond the UI gate), it must explicitly exempt `/api/auth/*`, `/api/integrations*`, and `/api/system/*`.

## Decisions needed before coding

- **[DECIDE A] Override:** Should the admin have a "skip for now" escape hatch? Strongly recommend **yes** (lockout safety). If yes: temporary (e.g. 24h, re-prompts) or permanent-until-changed? And does skipping still let them into the Dashboard fully, or into a limited state?
- **[DECIDE B] Verified definition:** B1 (reachable), B2 (host on mesh), or B1+B2? Recommend B1 baseline + B2 as informational.
- **[DECIDE C] Member behavior pre-verification:** block all non-admin login until mesh verified, or let members in with a "setup in progress" notice?
- **[DECIDE D] Existing install / this very deployment:** the live instance has no mesh row yet. Turning the gate on **will immediately gate the running production app** at next login.
  Do we (i) default `mesh.required = false` and let the admin opt in, or (ii) default it on but rely on the override? This is the riskiest part for the deployed instance.

## Explicitly out of scope

- Auto-installing/joining NetBird from ArchNest (we only verify, not provision).
- Supporting non-NetBird meshes (Tailscale, etc.); possible later via the same `mesh.integrationId` indirection.