Architecture
System components, data flow, and design decisions
System Overview
PodWarden consists of four main components running as Docker containers.
The backend runs with network_mode: host so it can reach Tailscale hosts for SSH provisioning and kubectl operations.
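In Compose terms, a host-networked backend looks roughly like the sketch below. The service name and image tag are illustrative, not taken from the project:

```yaml
services:
  backend:
    image: podwarden-backend:latest   # illustrative tag
    network_mode: host                # share the host's network namespace
    # With host networking the container sees the Tailscale interface
    # directly, so SSH to tailnet IPs and kubectl reach cluster endpoints
    # without extra port mappings.
```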
Frontend
The Next.js 15 frontend provides:
- Dashboard with fleet overview
- Host management (discover, provision, probe, wipe, register cluster)
- Cluster CRUD with kubeconfig management and live introspection
- Stack editor with full environment, config file slots, storage, and network configuration
- Deployment management with deploy/undeploy, config file editing, and pod logs
- Deployment tracking and rollback
- Storage connections with NFS and S3 testing
- Backup policy management with snapshot browsing, one-click backup, and restore
- Ingress rule management with gateway node configuration, multi-path routing, and health checks (DNS, HTTP, TLS)
- DDNS configuration for Cloudflare, DuckDNS, Webhook, and Hub-managed subdomains
- Hub catalog browsing and template import
- Settings (Tailscale, OIDC, SMTP, registry, Hub, users, secrets, MCP tokens)
Authentication is handled client-side via OIDC or local auth, with JWT tokens passed to the API.
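The JWT handoff between frontend and API can be illustrated with a minimal stdlib HS256 sign/verify round trip. This is a sketch of the mechanism, not PodWarden's actual token code; claim names and the secret are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Produce a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check signature and expiry, then return the claims."""
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("expired")
    return claims
```

The frontend would send the resulting token as an `Authorization: Bearer` header on every API call.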
Backend
The FastAPI backend handles:
- REST API for all CRUD operations (~85 endpoints across 16 routers)
- JWT authentication middleware with role-based access control
- MCP token authentication for API automation
- Background host discovery (every 5 minutes via Tailscale API)
- Background DDNS update loop (every 5 minutes, updates DNS records when public IP changes)
- SSH-based provisioning via Ansible playbooks (provision, probe, wipe)
- Kubernetes operations via kubectl with per-cluster kubeconfig
- Ingress management — generates Kubernetes Ingress, Service, and Endpoints resources for gateway-based routing
- Backup orchestration — creates and manages Kubernetes CronJobs and Jobs for Restic-based volume backups
- Storage connection testing (NFS port checks, S3 bucket verification, speed tests)
- Hub catalog proxy (browse, import, update checks)
- Encrypted secrets storage (AES-256)
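The DDNS update loop's core decision ("only call the provider API when the public IP changed") can be sketched as below. The IP resolver URL and config shape are assumptions for illustration:

```python
import urllib.request


def get_public_ip(url: str = "https://api.ipify.org") -> str:
    # The resolver service is an assumption; any "what is my IP" endpoint works.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode().strip()


def ddns_tick(config: dict, current_ip: str) -> bool:
    """Return True (and record the new IP) when a DNS update should fire."""
    if config.get("last_ip") == current_ip:
        return False          # no change: skip the provider API call
    config["last_ip"] = current_ip
    return True
```

A background task would call `ddns_tick` every 5 minutes and, on `True`, dispatch the provider-specific update (Cloudflare, DuckDNS, Webhook, or Hub).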
Database
PostgreSQL 16 stores all fleet data:
| Table | Purpose |
|---|---|
| hosts | Discovered/manual machines with hardware info, GPU detection, network types, gateway role |
| clusters | K3s clusters with kubeconfig, control plane host, protected flag |
| workload_definitions | Reusable templates with images, resources, env schema, config schema, storage, network requirements |
| workload_assignments | Cluster-scoped deployments with placement, replicas, env overrides, config values |
| deployments | CI/CD deployment history with rollback tracking |
| provisioning_jobs | Ansible job records with stdout/stderr capture |
| storage_connections | NFS and S3 backends with connectivity config, network types, and backup target flag |
| ingress_rules | Domain-to-backend routing rules with gateway host, backend type, TLS, path, and deploy status |
| ddns_configs | Dynamic DNS configurations per provider (Cloudflare, DuckDNS, Webhook, Hub) with update status |
| backup_policies | Backup schedules with mode (hot/cold), retention rules, storage target, and pre-backup hooks |
| backup_snapshots | Individual Restic snapshot records with size, file counts, and restore status |
| endpoints | Generic service endpoints (managers, health checks) |
| system_users | Local user management with roles and password hashes |
| system_config | Singleton JSONB row for SMTP, OIDC, Hub configuration |
| app_secrets | Encrypted key-value pairs (SSH keys, registry creds, S3 creds, Tailscale) |
| mcp_tokens | Machine-readable API tokens with role, expiry, audit trail |
| mcp_audit_log | Per-token API call audit (method, path, status) |
| settings | Global registry URL and image tag defaults |
Migrations are applied automatically on startup.
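Startup migration typically follows a "versioned, idempotent, applied in order" pattern. A minimal sketch, using SQLite as a stand-in for Postgres (the migration contents and table names are illustrative):

```python
import sqlite3  # stand-in for Postgres; the pattern is driver-agnostic

MIGRATIONS = [  # ordered (version, SQL) pairs; contents are illustrative
    (1, "CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE hosts ADD COLUMN network_type TEXT"),
]


def migrate(conn) -> int:
    """Apply any pending migrations in order; return how many ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    ran = 0
    for version, sql in MIGRATIONS:
        if version in applied:
            continue          # idempotent: reruns at startup are no-ops
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        ran += 1
    conn.commit()
    return ran
```

Because applied versions are recorded, running `migrate` on every startup is safe.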
Key Design Decisions
Per-Cluster Kubeconfig
Each cluster stores its own kubeconfig in the database. There is no global kubeconfig. This supports multi-cluster deployments across different environments.
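One way to realize this is to materialize the stored kubeconfig to a temp file and pin every kubectl call to it. A sketch under that assumption (the helper name is hypothetical):

```python
import os
import tempfile


def kubectl(cluster_kubeconfig: str, *args: str) -> list[str]:
    """Build a kubectl invocation pinned to one cluster's kubeconfig.

    The kubeconfig text comes from the cluster's database row; writing it
    to a temp file and passing --kubeconfig keeps clusters fully isolated
    from one another and from any ambient ~/.kube/config.
    """
    fd, path = tempfile.mkstemp(suffix=".yaml")
    with os.fdopen(fd, "w") as f:
        f.write(cluster_kubeconfig)
    return ["kubectl", "--kubeconfig", path, *args]
```

The returned argument list would be handed to `subprocess.run`; the temp file should be deleted once the command finishes.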
Protected Hosts
Hosts in a K3s cluster or with detected Kubernetes cannot be wiped or re-provisioned. This prevents accidental destruction of active infrastructure. Delete the cluster first to unprotect member hosts.
Tailscale for Discovery
Tailscale provides a single, secure way to discover and reach machines across environments without opening public ports. The background discovery loop keeps the host list current.
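The discovery step reduces a Tailscale device-list response to host records. A sketch of that parsing; field names (`devices`, `hostname`, `addresses`, `os`) follow the public Tailscale API and should be treated as assumptions if your API version differs:

```python
import json


def parse_devices(payload: str) -> list[dict]:
    """Reduce a Tailscale device-list JSON response to host records."""
    devices = json.loads(payload).get("devices", [])
    return [
        {
            "hostname": d.get("hostname", ""),
            "ip": (d.get("addresses") or [""])[0],  # first address is the tailnet IP
            "os": d.get("os", ""),
        }
        for d in devices
    ]
```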
Network-Aware Scheduling
Hosts, storage connections, and workloads are tagged with network types (public, mesh, LAN). Cluster network capability is the intersection of member host types. Pre-flight checks warn when a workload's requirements don't match the target cluster.
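The capability math described above (intersection of member types, then a requirements check) can be sketched directly; function names are illustrative:

```python
def cluster_networks(member_host_networks: list[set[str]]) -> set[str]:
    """A cluster can serve a network type only if every member host has it."""
    if not member_host_networks:
        return set()
    return set.intersection(*member_host_networks)


def preflight(workload_requires: set[str], cluster: set[str]) -> list[str]:
    """Return warnings for required network types the cluster cannot provide."""
    return [
        f"cluster lacks required network type: {n}"
        for n in sorted(workload_requires - cluster)
    ]
```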
Gateway-Based Ingress
Rather than requiring complex ingress controller configuration, PodWarden generates standard Kubernetes Ingress resources targeting the cluster's built-in Traefik ingress controller (included with K3s). Gateway nodes are designated hosts with public IPs that serve as entry points. This keeps public traffic exposure explicit and auditable.
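Generating a standard `networking.k8s.io/v1` Ingress for Traefik is essentially templating a manifest from the rule's fields. A sketch with illustrative names (the multi-path shape mirrors the frontend's multi-path routing):

```python
def ingress_manifest(name: str, host: str, paths: list[tuple[str, str, int]]) -> dict:
    """Render a networking.k8s.io/v1 Ingress manifest.

    `paths` is a list of (url_path, service_name, port) triples;
    all names here are illustrative.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [
                    {
                        "path": p,
                        "pathType": "Prefix",
                        "backend": {"service": {"name": svc, "port": {"number": port}}},
                    }
                    for p, svc, port in paths
                ]},
            }],
        },
    }
```

The resulting dict would be serialized and applied via kubectl against the target cluster's kubeconfig.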
Backup as Kubernetes Jobs
Backups run as Kubernetes Jobs inside the cluster, not as external processes. This means backup jobs have the same network access and storage mounts as the workloads they protect, with no additional infrastructure required. Restic handles deduplication and encryption.
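A scheduled backup then reduces to templating a `batch/v1` CronJob that mounts the workload's volume and runs restic against it. A sketch; the image tag and args are illustrative, and a real invocation also needs `RESTIC_PASSWORD` plus backend credentials in env:

```python
def backup_cronjob(name: str, schedule: str, pvc: str, repo: str) -> dict:
    """Render a CronJob that runs restic against a mounted volume."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "restic",
                    "image": "restic/restic:latest",   # illustrative tag
                    "args": ["backup", "/data", "--repo", repo],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": pvc},
                }],
            }}}},
        },
    }
```

One-click backups reuse the same template as a one-off Job instead of a CronJob.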
Hub as Optional Layer
PodWarden Hub is a convenience layer, not a runtime dependency. Instances work fully offline. Hub adds catalog browsing, template import, DDNS subdomain allocation, and update checking — but all stacks are stored locally once imported.
Docker-First
Everything runs in Docker containers. No host-level Node.js, Python, or database installations are required. `make build && make start` is all you need.