PodWarden

Architecture

System components, data flow, and design decisions

System Overview

PodWarden consists of four main components, each running as a Docker container.

The backend runs with network_mode: host so it can reach Tailscale hosts for SSH provisioning and kubectl operations.
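The compose wiring for this isn't shown in the source; a minimal sketch of the relevant setting (service and image names are illustrative):

```yaml
services:
  backend:
    image: podwarden/backend   # illustrative image name
    network_mode: host         # share the host's network stack so the container
                               # can reach Tailscale peers and cluster APIs directly
```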

Frontend

The Next.js 15 frontend provides:

  • Dashboard with fleet overview
  • Host management (discover, provision, probe, wipe, register cluster)
  • Cluster CRUD with kubeconfig management and live introspection
  • Stack editor with full environment, config file slots, storage, and network configuration
  • Deployment management with deploy/undeploy, config file editing, and pod logs
  • Deployment tracking and rollback
  • Storage connections with NFS and S3 testing
  • Backup policy management with snapshot browsing, one-click backup, and restore
  • Ingress rule management with gateway node configuration, multi-path routing, and health checks (DNS, HTTP, TLS)
  • DDNS configuration for Cloudflare, DuckDNS, Webhook, and Hub-managed subdomains
  • Hub catalog browsing and template import
  • Settings (Tailscale, OIDC, SMTP, registry, Hub, users, secrets, MCP tokens)

Authentication is handled client-side via OIDC or local auth, with JWT tokens passed to the API.

Backend

The FastAPI backend handles:

  • REST API for all CRUD operations (~85 endpoints across 16 routers)
  • JWT authentication middleware with role-based access control
  • MCP token authentication for API automation
  • Background host discovery (every 5 minutes via Tailscale API)
  • Background DDNS update loop (every 5 minutes, updates DNS records when public IP changes)
  • SSH-based provisioning via Ansible playbooks (provision, probe, wipe)
  • Kubernetes operations via kubectl with per-cluster kubeconfig
  • Ingress management — generates Kubernetes Ingress, Service, and Endpoints resources for gateway-based routing
  • Backup orchestration — creates and manages Kubernetes CronJobs and Jobs for Restic-based volume backups
  • Storage connection testing (NFS port checks, S3 bucket verification, speed tests)
  • Hub catalog proxy (browse, import, update checks)
  • Encrypted secrets storage (AES-256)
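As one concrete example from the list above, the NFS connectivity check reduces to a TCP handshake probe. How PodWarden implements it isn't shown; a minimal sketch (function name is illustrative):

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Cheapest possible connectivity probe: can we complete a TCP
    handshake with the target at all? (NFS servers listen on 2049.)"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```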

Database

PostgreSQL 16 stores all fleet data:

| Table | Purpose |
|---|---|
| hosts | Discovered/manual machines with hardware info, GPU detection, network types, gateway role |
| clusters | K3s clusters with kubeconfig, control plane host, protected flag |
| workload_definitions | Reusable templates with images, resources, env schema, config schema, storage, network requirements |
| workload_assignments | Cluster-scoped deployments with placement, replicas, env overrides, config values |
| deployments | CI/CD deployment history with rollback tracking |
| provisioning_jobs | Ansible job records with stdout/stderr capture |
| storage_connections | NFS and S3 backends with connectivity config, network types, and backup target flag |
| ingress_rules | Domain-to-backend routing rules with gateway host, backend type, TLS, path, and deploy status |
| ddns_configs | Dynamic DNS configurations per provider (Cloudflare, DuckDNS, Webhook, Hub) with update status |
| backup_policies | Backup schedules with mode (hot/cold), retention rules, storage target, and pre-backup hooks |
| backup_snapshots | Individual Restic snapshot records with size, file counts, and restore status |
| endpoints | Generic service endpoints (managers, health checks) |
| system_users | Local user management with roles and password hashes |
| system_config | Singleton JSONB row for SMTP, OIDC, Hub configuration |
| app_secrets | Encrypted key-value pairs (SSH keys, registry creds, S3 creds, Tailscale) |
| mcp_tokens | Machine-readable API tokens with role, expiry, audit trail |
| mcp_audit_log | Per-token API call audit (method, path, status) |
| settings | Global registry URL and image tag defaults |

Migrations are applied automatically on startup.
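The migration mechanism isn't detailed in the source; the usual startup pattern is a ledger table plus ordered SQL files, sketched here with sqlite3 for self-containment (PodWarden targets PostgreSQL, but the ordering logic is the same; all names are illustrative):

```python
import pathlib
import sqlite3


def apply_migrations(conn: sqlite3.Connection, migrations_dir: pathlib.Path) -> list[str]:
    """Apply numbered *.sql files exactly once, in filename order,
    recording each applied file in a schema_migrations ledger table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in done:
            continue  # already applied on a previous startup
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (path.name,))
        conn.commit()
        applied.append(path.name)
    return applied
```

Running it twice is safe: the second pass sees every file in the ledger and applies nothing.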

Key Design Decisions

Per-Cluster Kubeconfig

Each cluster stores its own kubeconfig in the database. There is no global kubeconfig. This supports multi-cluster deployments across different environments.
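In practice this means every kubectl invocation is pinned to one cluster's kubeconfig file. A sketch of the wrapper (function names and paths are illustrative):

```python
import subprocess


def kubectl_cmd(kubeconfig_path: str, *args: str) -> list[str]:
    """Build a kubectl command scoped to a single cluster's kubeconfig.
    Nothing ever falls through to a global ~/.kube/config."""
    return ["kubectl", "--kubeconfig", kubeconfig_path, *args]


def kubectl(kubeconfig_path: str, *args: str) -> str:
    """Run the scoped command and return stdout; raises on failure."""
    result = subprocess.run(kubectl_cmd(kubeconfig_path, *args),
                            capture_output=True, text=True, check=True)
    return result.stdout
```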

Protected Hosts

Hosts in a K3s cluster or with detected Kubernetes cannot be wiped or re-provisioned. This prevents accidental destruction of active infrastructure. Delete the cluster first to unprotect member hosts.

Tailscale for Discovery

Tailscale provides a single, secure way to discover and reach machines across environments without opening public ports. The background discovery loop keeps the host list current.
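The discovery loop boils down to polling the Tailscale REST API's device listing and mapping each device onto a host record. A hedged sketch (the endpoint and device fields are from the public Tailscale API; the record shape is illustrative):

```python
import json
import urllib.request

DEVICES_URL = "https://api.tailscale.com/api/v2/tailnet/{tailnet}/devices"


def parse_devices(payload: dict) -> list[dict]:
    """Flatten the Tailscale device listing to the fields a host record needs."""
    return [
        {"hostname": d["hostname"], "ip": d["addresses"][0], "os": d.get("os", "")}
        for d in payload.get("devices", [])
    ]


def fetch_devices(tailnet: str, api_key: str) -> list[dict]:
    """One poll of the tailnet; the background loop would call this every 5 minutes."""
    req = urllib.request.Request(
        DEVICES_URL.format(tailnet=tailnet),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_devices(json.load(resp))
```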

Network-Aware Scheduling

Hosts, storage connections, and workloads are tagged with network types (public, mesh, LAN). Cluster network capability is the intersection of member host types. Pre-flight checks warn when a workload's requirements don't match the target cluster.
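The intersection rule and the pre-flight check can be sketched in a few lines (function names are illustrative):

```python
def cluster_networks(member_networks: list[set[str]]) -> set[str]:
    """A cluster's capability is the intersection of its member hosts' networks."""
    return set.intersection(*member_networks) if member_networks else set()


def preflight_warnings(required: set[str], cluster: set[str]) -> list[str]:
    """One warning per network type the workload needs but the cluster lacks."""
    return [f"cluster has no {net} connectivity" for net in sorted(required - cluster)]
```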

Gateway-Based Ingress

Rather than requiring complex ingress controller configuration, PodWarden generates standard Kubernetes Ingress resources targeting the cluster's built-in Traefik ingress controller (included with K3s). Gateway nodes are designated hosts with public IPs that serve as entry points. This keeps public traffic exposure explicit and auditable.
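The generated Ingress is a plain networking.k8s.io/v1 resource; a sketch of the builder (the manifest schema is standard Kubernetes, the function and defaults are illustrative):

```python
def ingress_manifest(name: str, host: str, service: str, port: int,
                     path: str = "/") -> dict:
    """Build a standard Ingress that the K3s-bundled Traefik controller
    picks up, routing one host/path to one Service port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": path,
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": port}}},
                }]},
            }],
        },
    }
```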

Backup as Kubernetes Jobs

Backups run as Kubernetes Jobs inside the cluster, not as external processes. This means backup jobs have the same network access and storage mounts as the workloads they protect, with no additional infrastructure required. Restic handles deduplication and encryption.
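A sketch of the CronJob such an orchestrator might emit, mounting the workload's PVC and running restic against it (the batch/v1 schema is standard Kubernetes; names, image tag, and the secret-handling shortcut are illustrative):

```python
def backup_cronjob(name: str, schedule: str, pvc: str, repo: str) -> dict:
    """CronJob that mounts a workload's PersistentVolumeClaim read-only-style
    at /data and runs `restic backup` on it. In a real deployment the
    RESTIC_PASSWORD and repository credentials would come from a Secret."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "restic",
                    "image": "restic/restic",
                    "args": ["backup", "/data", "--repo", repo],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                "volumes": [{"name": "data",
                             "persistentVolumeClaim": {"claimName": pvc}}],
            }}}},
        },
    }
```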

Hub as Optional Layer

PodWarden Hub is a convenience layer, not a runtime dependency. Instances work fully offline. Hub adds catalog browsing, template import, DDNS subdomain allocation, and update checking — but all stacks are stored locally once imported.

Docker-First

Everything runs in Docker containers. No host-level Node.js, Python, or database installations required. `make build && make start` is all you need.