
RDP Connection Debugging — Handoff Doc

RESOLVED (2026-06-22) — root cause found, proven end-to-end

Root cause: guacd 1.5.5 ships FreeRDP 2.11.5, whose NLA/CredSSP client cannot complete authentication against gnome-remote-desktop, which mandates NLA.

Proven at every layer (not a theory — the EGL/Mesa/Zink hypothesis below was a red herring):

  1. Server mandates NLA. Direct xfreerdp (v3) from the Fedora VM to its own gnome-remote-desktop returns, for both /sec:tls and /sec:rdp: [WARN][com.freerdp.core.nego] Error: HYBRID_REQUIRED_BY_SERVER [0x00000005], followed by Protocol Security Negotiation Failure. grdctl rdp set-auth-methods only offers credentials (NLA) and kerberos; there is no non-NLA / plain-RDP mode, so NLA cannot be turned off.
  2. guacd's FreeRDP 2 can't do NLA against it. Driving the real guacd path (guacd 172.18.0.2:4822 → VM) with security set to nla, tls, rdp, AND any returns the identical Guacamole error in every case: Server refused connection (wrong security type?) (code 519). guacd's own log confirms it used the requested mode: Security mode: NLA … then RDP server closed/refused connection: Server refused connection (wrong security type?). The fact that all four modes fail identically was the tell: it's not a mode mismatch, it's that FreeRDP 2's CredSSP handshake is incompatible with gnome-remote-desktop's.
  3. Bumping guacd does NOT fix it. guacamole/guacd:1.6.0 still ships FreeRDP 2.11.7 (verified by inspecting the image). FreeRDP 3.x is what fixes gnome-remote-desktop NLA interop, and Apache's guacd image doesn't ship FreeRDP 3 yet. So an image bump is wasted.
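The three findings above each hinge on a specific log string. A minimal bash sketch that sorts a captured log by those strings, assuming the text arrives on stdin; `classify_rdp_failure` and its output labels are hypothetical, and the patterns are only the messages quoted in this doc:

```bash
# Hypothetical helper: map log text on stdin to the failure it indicates.
# Patterns are limited to the messages quoted in this handoff doc.
classify_rdp_failure() {
  local log
  log="$(cat)"
  if grep -q 'HYBRID_REQUIRED_BY_SERVER' <<<"$log"; then
    echo "server-requires-nla"      # server only accepts NLA (CredSSP)
  elif grep -q 'wrong security type' <<<"$log"; then
    echo "guacd-generic-refusal"    # FreeRDP 2 catch-all; not proof of a mode mismatch
  elif grep -q 'Security negotiation failed' <<<"$log"; then
    echo "negotiation-failed"
  else
    echo "unknown"
  fi
}
```

For example, pipe `docker logs archnest-guacd 2>&1` or an xfreerdp run's combined `2>&1` output into it. HYBRID_REQUIRED_BY_SERVER is checked first because it states the server's requirement directly, whereas "wrong security type" is guacd's catch-all and, as shown above, is not proof of a mode mismatch.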

Fix / recommendation (general — other ArchNest users will hit this)

gnome-remote-desktop is not a reliable RDP target for guacd-based gateways while guacd ships FreeRDP 2 (this affects Fedora and Ubuntu 22.04+ desktops using GNOME's built-in "Remote Desktop"). The fix applied here, plus the alternative considered:

  • Applied & verified (operational, per-VM): replaced gnome-remote-desktop with xrdp on the test VM. xrdp's default config does not require NLA, so guacd's FreeRDP 2 never reaches the CredSSP step that fails against gnome-remote-desktop. Steps run: sudo dnf install -y xrdp && sudo systemctl enable --now xrdp; then disabled + masked gnome-remote-desktop's user service (systemctl --user mask gnome-remote-desktop.service) and killed the lingering daemon that was still holding port 3389 so xrdp could bind it. Verified end-to-end through the real guacd path: with security=any, guacd authenticates and streams live desktop frames. security MUST be any (or blank, which defaults to any) for xrdp's default config: nla fails (Security negotiation failed) and rdp errors out. Note: xrdp gives a fresh X login session, not a takeover of the existing Wayland session.
  • Alternative (infra, affects everyone): a custom guacd build with FreeRDP 3. Not worth it yet — it's a 30+ min from-source build to maintain in docker-compose.yml, for one upstream gap that Apache will eventually close. Revisit if/when guacamole/guacd ships FreeRDP 3.
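The per-VM swap in the first bullet can be scripted roughly as below. This is a sketch, not the exact sequence that was run: it stops gnome-remote-desktop before starting xrdp so 3389 is already free when xrdp first binds, and the `gnome-remote-desktop-daemon` process name used for the leftover-daemon kill is an assumption. `run` and `swap_grd_for_xrdp` are hypothetical names; set `DRY_RUN=1` to print the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch: replace gnome-remote-desktop with xrdp on a Fedora VM.
# DRY_RUN=1 prints each command instead of running it.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

swap_grd_for_xrdp() {
  run sudo dnf install -y xrdp || return 1
  # Stop GNOME's per-user RDP service and mask it so it cannot be re-activated
  run systemctl --user disable --now gnome-remote-desktop.service
  run systemctl --user mask gnome-remote-desktop.service
  # Kill any daemon that outlived the unit and still holds 3389
  # (ASSUMPTION: the binary is named gnome-remote-desktop-daemon); no match is not an error
  run pkill -u "$(id -un)" -f gnome-remote-desktop-daemon || true
  run sudo systemctl enable --now xrdp || return 1
  # Show what is listening on 3389 now (expected: xrdp)
  run sudo ss -ltnp 'sport = :3389'
}
```

Run it as the desktop user (the `--user` units belong to that user), e.g. `DRY_RUN=1 swap_grd_for_xrdp` first, then `swap_grd_for_xrdp`. Afterwards the ArchNest connection's security field should be any or blank, as noted above.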

No ArchNest code change was required — the security field + ignore-cert handling in backend/src/routes/guacamole.ts (added earlier this debugging arc) are correct and remain useful for other RDP servers. The blocker was purely the guacd↔gnome NLA incompatibility.

The original investigation notes below are kept for history.


Goal

ArchNest is a self-hosted dashboard product. One of its integrations is a "Remote Desktop" connection type that proxies RDP/VNC/Telnet sessions through guacd (Apache Guacamole's proxy daemon) so users can open a remote desktop session in the browser. This needs to work reliably for any user's RDP server, not just this one — so the immediate goal is to get this specific connection working, but treat every root cause found as a potential general fix (config option, docs, code change) since other users will hit the same servers (gnome-remote-desktop, xrdp, Windows RDP, etc).

You have hands-on access to both machines involved. Use that — actively connect to both, run diagnostics on both sides simultaneously, and correlate logs/timestamps. Do not guess from one side alone; multiple times in this debugging session, a theory formed from only one machine's logs turned out to be wrong once the other machine's logs were checked.

The two machines

  1. racknerd-712b73a — the VPS running the ArchNest stack (this repo) in Docker.

    • Container archnest-backend — the Node/Fastify backend. Route of interest: backend/src/routes/guacamole.ts — bridges a browser WebSocket to guacd using guacamole-lite's ClientConnection/Crypt classes. Builds a Guacamole connection token (protocol, hostname, port, username, password, domain, security, ignore-cert) and hands it to guacd.
    • Container archnest-guacd — Apache Guacamole's guacd (v1.5.5), the proxy daemon that actually speaks RDP/VNC/Telnet to the target. Listens on port 4822. On the archnest_default Docker network, internal IP 172.18.0.2, DNS aliases archnest-guacd/guacd. Backend env vars: ARCHNEST_GUACD_HOST=guacd, ARCHNEST_GUACD_PORT=4822.
    • Diagnostic command: docker logs -f archnest-guacd — shows each connection attempt, the security mode negotiated, certificate validation results, and the final success/refusal message from FreeRDP (the RDP client library guacd uses internally).
    • Also useful: docker exec archnest-backend env | grep ARCHNEST_GUACD, docker inspect archnest-guacd (to confirm network/IP), nc -zv 192.168.122.55 3389 (already confirmed reachable from racknerd).
  2. Fedora VM (192.168.122.55) — appears to be a libvirt VM co-located on the same physical host as racknerd (it's in libvirt's default NAT range, and is reachable from racknerd over a private 192.168.x address despite racknerd otherwise looking like a public VPS). Running Fedora 44, GPU is a Red Hat, Inc. Virtio 1.0 GPU (rev 01) (confirmed via lspci). User sam, password happy2026 (test/lab credentials, not a real secret).

    • RDP is served by gnome-remote-desktop (GNOME's built-in RDP/VNC daemon), running as a per-user systemd service: systemctl --user status gnome-remote-desktop, systemctl --user restart gnome-remote-desktop.
    • Configured via the grdctl CLI: grdctl status --show-credentials, grdctl rdp enable, grdctl rdp set-credentials <user> <pass>, grdctl rdp set-tls-cert/set-tls-key, grdctl rdp disable-view-only.
    • Diagnostic command: journalctl --user -u gnome-remote-desktop -f — shows the daemon's own startup/shutdown/error logs.
    • There is a confirmed active, unlocked, real graphical session: loginctl list-sessions showed session 51 (seat0, tty2, class user), and loginctl show-session 51 -p Type -p State -p Active returned Type=wayland, Active=yes, State=active. So gnome-remote-desktop has a real Wayland session to attach to — this is NOT a "no session" problem.
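The session check in the last bullet can be made scriptable. A small bash sketch, assuming the `Key=Value` lines that `loginctl show-session <id> -p Type -p State -p Active` prints; the function name `is_active_graphical_session` is hypothetical:

```bash
# Succeeds only if the loginctl output on stdin describes an active,
# foreground graphical session (Type wayland or x11, State=active, Active=yes).
is_active_graphical_session() {
  local key value type="" state="" active=""
  # `|| [[ -n $key ]]` keeps a final line that lacks a trailing newline
  while IFS='=' read -r key value || [[ -n "$key" ]]; do
    case "$key" in
      Type)   type="$value" ;;
      State)  state="$value" ;;
      Active) active="$value" ;;
    esac
  done
  [[ ( "$type" == wayland || "$type" == x11 ) && "$state" == active && "$active" == yes ]]
}
```

Usage on the VM: `loginctl show-session 51 -p Type -p State -p Active | is_active_graphical_session && echo "usable session"`.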

What's already been fixed (confirmed working, do not re-investigate these)

  1. DNS: an earlier hostname (fedora) didn't resolve from the backend container — resolved by using the IP 192.168.122.55 directly instead.
  2. Self-signed cert rejection: FreeRDP/guacd rejected the target's self-signed RDP cert by default. Fixed in code — backend/src/routes/guacamole.ts now sets settings['ignore-cert'] = 'true' whenever protocol === 'rdp'. Confirmed deployed via docker exec archnest-backend grep -A2 "ignore-cert" /app/dist/routes/guacamole.js.
  3. No way to override RDP security mode: added a security field to the connection token (settings.security = security || 'any') and exposed it in the Settings UI (src/pages/Settings.tsx, field key security, hint text about NLA). User has tried any, nla, tls, and rdp — all fail identically (see below).
  4. GNOME's own RDP TLS cert was corrupt: journalctl showed [ERROR][com.freerdp.crypto] - [x509_utils_from_pem]: BIO_new failed for certificate / RDP server certificate is invalid. Fixed by regenerating the cert/key on the Fedora VM:
    openssl req -new -newkey rsa:4096 -days 365 -nodes -x509 -subj "/CN=fedora" \
      -keyout ~/.local/share/gnome-remote-desktop/rdp-tls.key \
      -out ~/.local/share/gnome-remote-desktop/rdp-tls.crt
    grdctl rdp set-tls-cert ~/.local/share/gnome-remote-desktop/rdp-tls.crt
    grdctl rdp set-tls-key ~/.local/share/gnome-remote-desktop/rdp-tls.key
    systemctl --user restart gnome-remote-desktop
    
    Confirmed fixed — later journal output shows no cert error on startup.
  5. RDP sharing disabled at the gnome-remote-desktop level (grdctl status showed Status: disabled even though the daemon process was running and the port was listening). Fixed via grdctl rdp enable + systemctl --user restart gnome-remote-desktop.
  6. Credentials missing / GNOME Keyring locked: grdctl rdp set-credentials sam happy2026 failed with Cannot create an item in a locked collection because the keyring wasn't unlocked (likely an artifact of an SSH-only login rather than a real unlocked graphical login). Fixed via:
    echo -n 'your-login-password' | gnome-keyring-daemon --unlock
    grdctl rdp set-credentials sam happy2026
    
    grdctl status --show-credentials now consistently shows Unit status: active, RDP: Status: enabled, Username: sam, Password: happy2026.
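Item 4's corrupt-cert failure only surfaced in the journal after a daemon restart. A minimal bash sketch that checks a cert/key pair up front with standard `openssl` subcommands (`x509`, `pkey`); the function name is hypothetical, and "matching" here means the public key embedded in the certificate equals the one derived from the private key:

```bash
# Check that an RDP TLS cert/key pair parses and matches before handing it to grdctl.
# Prints "ok" and returns 0 on success; prints the problem and returns 1 otherwise.
check_rdp_cert_pair() {
  local crt="$1" key="$2"
  openssl x509 -in "$crt" -noout 2>/dev/null || { echo "cert does not parse: $crt"; return 1; }
  openssl pkey -in "$key" -noout 2>/dev/null || { echo "key does not parse: $key"; return 1; }
  # Same public key in both files means the key belongs to this certificate
  if [[ "$(openssl x509 -in "$crt" -noout -pubkey)" != "$(openssl pkey -in "$key" -pubout)" ]]; then
    echo "cert and key do not match"
    return 1
  fi
  echo "ok"
}
```

For example, `check_rdp_cert_pair ~/.local/share/gnome-remote-desktop/rdp-tls.crt ~/.local/share/gnome-remote-desktop/rdp-tls.key` before running the grdctl rdp set-tls-cert/set-tls-key commands.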

UPDATE: connection now succeeds, but screen is blank

The "Server refused connection (wrong security type?)" failure described below has since been resolved (the cause was not fully pinned down before it started working; likely one of the cert/credential/security-mode fixes finally lined up). The ArchNest Guacamole viewer now shows "Connected" in the top-right status, with the session named Fedora-WS, but the viewport is solid black and no desktop content ever renders. Confirmed NOT a lock-screen issue (user confirmed GNOME is unlocked). journalctl --user -u gnome-remote-desktop -n 50, checked at the time, showed the daemon had been running continuously since 10:34:45 with no crash (unlike earlier attempts, where "RDP server started" was immediately followed by "RDP server stopped"). So this is a different failure mode from the original refusal: negotiation and session start now succeed, but no frame data is ever captured/sent.

This is strong evidence for the EGL/Mesa/Zink theory below — the daemon accepts the connection and starts the RDP server but apparently cannot capture real screen content, producing a connected-but-blank session instead of crashing outright. Next diagnostic step (not yet completed): tail journalctl --user -u gnome-remote-desktop -f AND journalctl --user -f | grep -i -E "pipewire|portal|screencast|monitor" live, while the black-screen session is open and the user clicks/moves the mouse in the Guacamole viewport, to catch any PipeWire/portal screencast error that doesn't appear in the regular unit log.

Original unresolved problem (superseded by the above, kept for history)

Connecting through ArchNest (browser → backend → guacd → Fedora VM) used to fail outright with:

Error: Server refused connection (wrong security type?)

This was tried with security set to any, nla, tls, and rdp: identical failure every time, regardless of mode. That's suspicious: if it were a genuine security negotiation mismatch, different modes should fail differently (or some should succeed). The fact that they all failed identically suggested the real failure might be happening after security negotiation succeeds (e.g. at session-start/framebuffer-creation time), and that FreeRDP's client-side error message is a generic/misleading bucket for "the connection didn't complete," not literally a security-type mismatch. This symptom is no longer reproducing (see UPDATE above); leave this section for historical context only.

Open theory (unconfirmed)

journalctl --user -u gnome-remote-desktop shows, on every daemon startup, EGL/Mesa/Zink rendering errors:

libEGL warning: failed to get driver name for fd -1
MESA-LOADER: failed to retrieve device information
MESA: error: ZINK: failed to choose pdev
libEGL warning: egl: failed to create dri2 screen

There was also one observed instance of "RDP server started" immediately followed by "RDP server stopped" with timing consistent with an actual connection attempt. The theory is that gnome-remote-desktop can't create a renderable framebuffer for screen capture (no working GPU/software-render path) and crashes/aborts when a client actually tries to start a session — which a FreeRDP client then reports as "wrong security type" because that's the generic refusal message FreeRDP shows for several different underlying failure modes.

This theory has NOT been confirmed. It's a leading hypothesis based on log timing correlation only — no one has yet proven the EGL/Mesa errors are causal vs. just noise from gnome-remote-desktop probing GPU paths at startup (which may be harmless/expected on a Virtio-GPU VM that falls back to software rendering anyway).

Diagnostic step that was in progress, never completed

A direct xfreerdp test, bypassing guacd entirely, to isolate whether gnome-remote-desktop rejects ANY RDP client (not just guacd/FreeRDP-via-guacd), or whether this is specific to how guacd's embedded FreeRDP negotiates. freerdp/xfreerdp has now been installed on both machines, but the actual test was never run/reported back. This should be your first move:

# From racknerd (mimics guacd's exact network path: container -> VM):
xfreerdp /v:192.168.122.55 /sec:tls /cert-ignore /u:sam /p:happy2026 +auth-only
xfreerdp /v:192.168.122.55 /sec:nla /cert-ignore /u:sam /p:happy2026 +auth-only
xfreerdp /v:192.168.122.55 /sec:rdp /cert-ignore /u:sam /p:happy2026 +auth-only

# From the Fedora VM itself (rules out networking, tests gnome-remote-desktop alone):
xfreerdp /v:localhost /sec:tls /cert-ignore /u:sam /p:happy2026 +auth-only

Run these WHILE simultaneously tailing both:

# on racknerd:
docker logs -f archnest-guacd
# on the Fedora VM:
journalctl --user -u gnome-remote-desktop -f

Correlate the exact moment of failure across both logs. This is the single most valuable piece of evidence currently missing.
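To make that correlation mechanical, each attempt can be stamped with a UTC time. A bash sketch that reuses the xfreerdp flags listed above; `run_sec_matrix` and the per-mode log file names are hypothetical. Since guacd and journald may print local times, tailing with `docker logs -t -f archnest-guacd` and `journalctl --user -u gnome-remote-desktop -f --utc` keeps all three streams in UTC.

```bash
# Run the three /sec modes back to back, stamping each attempt in UTC so it
# can be lined up with `docker logs -t` and `journalctl --utc` output.
run_sec_matrix() {
  local host="$1" user="$2" pass="$3" logdir="${4:-.}" sec rc
  for sec in tls nla rdp; do
    printf '%s sec=%s start\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sec"
    rc=0
    # Same flags as the manual commands; full client output goes to a per-mode file
    xfreerdp "/v:$host" "/sec:$sec" /cert-ignore "/u:$user" "/p:$pass" +auth-only \
      >"$logdir/xfreerdp-$sec.log" 2>&1 || rc=$?
    printf '%s sec=%s exit=%d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$sec" "$rc"
  done
}
```

For example, `run_sec_matrix 192.168.122.55 sam happy2026 /tmp` from racknerd, then read /tmp/xfreerdp-*.log for the raw client errors at each stamped time.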

Instructions

  1. Get hands-on access to both racknerd-712b73a and the Fedora VM (192.168.122.55).
  2. Run the xfreerdp direct tests above, with both logs tailing simultaneously, and read the actual FreeRDP client-side error output (not just "wrong security type" — xfreerdp's raw stderr/exit code will usually have more detail than what bubbles up through guacd/Guacamole's client to the ArchNest UI).
  3. If xfreerdp succeeds where ArchNest's guac connection fails, the bug is in how backend/src/routes/guacamole.ts builds the connection settings/token, or in the guacamole-lite/guacd version compatibility — debug from there, comparing exactly what settings xfreerdp used successfully vs. what ArchNest sends.
  4. If xfreerdp also fails identically, the problem is squarely on the gnome-remote-desktop / Fedora VM side. Investigate the EGL/Mesa/Zink rendering theory directly — check whether software rendering (llvmpipe) is available (glxinfo -B from an actual Wayland session, not an SSH shell — note: an earlier attempt from an SSH shell failed with Error: unable to open display, which is expected and not informative; you need to run it from within session 51 or equivalent), and whether the VM's libvirt XML has virtio-gpu with working 3D/virgl acceleration configured on the hypervisor side.
  5. If gnome-remote-desktop turns out to be fundamentally unable to serve a third-party RDP client (as opposed to GNOME's own screen-sharing / "Remote Login" feature, which is its primary intended use case), consider recommending xrdp as a replacement RDP server on the Fedora VM, and note this in your report as a general product recommendation (since other ArchNest users may hit the same gnome-remote-desktop limitation).
  6. Keep ArchNest's product goal in mind throughout: any fix that's specific to this user's VM is fine for unblocking them, but if you find a root cause that's likely to recur for other users (e.g. a guacd config default, a missing Settings field, a code bug in backend/src/routes/guacamole.ts), make the corresponding code/config fix in this repo, not just a one-off operational fix on this VM.
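For step 4, the renderer line in `glxinfo -B` output is what distinguishes software rendering (llvmpipe) from virgl acceleration. A hypothetical bash helper that reads that output on stdin; run `glxinfo -B` from a terminal inside the graphical session, since over SSH it only prints the display error:

```bash
# Classify the GL path from `glxinfo -B` output on stdin.
# Prints software:<renderer>, virgl:<renderer>, other:<renderer>, or none.
gl_renderer_kind() {
  local renderer
  renderer="$(sed -n 's/^[[:space:]]*OpenGL renderer string:[[:space:]]*//p' | head -n1)"
  case "$renderer" in
    llvmpipe*|softpipe*) echo "software:$renderer" ;;
    virgl*)              echo "virgl:$renderer" ;;
    "")                  echo "none" ;;   # no renderer line, e.g. "unable to open display"
    *)                   echo "other:$renderer" ;;
  esac
}
```

Usage inside session 51: `glxinfo -B | gl_renderer_kind`.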

What to report back when done

Write a concise report (for the engineer/AI who handed this off) covering:

  • The root cause, with the specific log lines/evidence that proved it (not just a theory).
  • The exact fix applied, including any commands run on either machine and any code changes made in this repo (with file paths and diffs).
  • Whether the fix is specific to this VM or represents a general product issue that other ArchNest users could hit — and if general, what was changed in the codebase to address it.
  • Current working/non-working status of the connection after the fix, with the actual test performed to confirm it works end-to-end through ArchNest's UI (not just via direct xfreerdp).