Understanding Resource Requirements
What CPU, RAM, GPU, VRAM, volumes, and network values mean on every template — and how to choose the right ones
Every stack template in PodWarden specifies resource requirements — how much computing power, memory, storage, and network connectivity the application needs. These values appear on template cards in the Hub catalog and in the stack editor.
This guide explains what each value means, how the notation works, and how to translate it into real-world hardware terms.
CPU
CPU is measured in cores or millicores. One core equals 1,000 millicores.
| Value | Meaning | Rough equivalent |
|---|---|---|
| 100m | 0.1 cores (100 millicores) | A tenth of a single CPU core — enough for lightweight sidecars, exporters, and agents that mostly sit idle |
| 250m | 0.25 cores | A quarter core — suitable for small helper services, log forwarders, metrics collectors |
| 500m | 0.5 cores | Half a core — typical for lightweight web servers, caches, single-purpose APIs |
| 1 | 1 full core | One complete CPU core — enough for most standard applications (databases, web apps, CI runners) |
| 2 | 2 cores | Two cores — heavier workloads like build servers, media processing, application servers under load |
| 4 | 4 cores | Four cores — compute-intensive tasks like video transcoding, large database instances |
| 8 | 8 cores | Eight cores — high-performance workloads, multi-threaded batch processing |
What "CPU" actually means
A CPU request is a guaranteed minimum. When you set cpu_request: 500m, the cluster guarantees your workload will always have access to at least half a core. If the node has spare capacity, the workload can burst above that — but 500m is what's reserved and always available.
This is not the same as clock speed (GHz). A "core" here means one hardware thread on whatever CPU the node has. A workload requesting 1 core on a node with an AMD EPYC 7763 gets one thread of that processor. The same request on a Raspberry Pi gets one ARM core. The amount of work that core can do varies by hardware, but PodWarden ensures the workload gets the core time it asked for.
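As a concrete illustration, a template's CPU field follows the cpu_request notation described above. The surrounding structure below is a hypothetical sketch of how such a field might appear, not a guaranteed template schema:

```yaml
# Hypothetical template excerpt; only the cpu_request notation is from this guide,
# the surrounding key names are illustrative
resources:
  cpu_request: 500m   # guaranteed minimum: half a core reserved, can burst above if the node has spare capacity
```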
How to choose
For most applications, start with the template's default and adjust based on observed usage. PodWarden shows actual CPU consumption in workload logs — if a workload consistently uses less than its request, you can safely lower it. If it's hitting the ceiling, increase it.
Memory (RAM)
Memory is measured in bytes using binary (power-of-two) units. The two you'll see most often are Mi (mebibytes) and Gi (gibibytes).
| Value | Meaning | In familiar terms |
|---|---|---|
| 64Mi | 64 mebibytes (~67 MB) | Minimal — enough for tiny agents and exporters |
| 128Mi | 128 mebibytes (~134 MB) | Small utilities, metrics collectors, lightweight proxies |
| 256Mi | 256 mebibytes (~268 MB) | Small web servers, simple APIs |
| 512Mi | 512 mebibytes (~537 MB) | Moderate applications, light databases |
| 1Gi | 1 gibibyte (~1.07 GB) | Standard applications, small to mid-size databases |
| 2Gi | 2 gibibytes (~2.15 GB) | Application servers, medium databases, build tools |
| 4Gi | 4 gibibytes (~4.29 GB) | Larger databases, Java applications, content management systems |
| 8Gi | 8 gibibytes (~8.59 GB) | Memory-intensive workloads, search engines (Elasticsearch), ML inference |
| 16Gi | 16 gibibytes (~17.18 GB) | Large databases, in-memory caches, heavy ML workloads |
| 32Gi | 32 gibibytes (~34.36 GB) | Large-scale data processing, model training |
Mi vs. Gi vs. MB vs. GB
Kubernetes (and PodWarden) uses binary units (Mi, Gi), not decimal units (MB, GB):
- 1 Mi (mebibyte) = 1,048,576 bytes = 1024 × 1024
- 1 Gi (gibibyte) = 1,073,741,824 bytes = 1024 × 1024 × 1024
- 1 MB (megabyte) = 1,000,000 bytes = 1000 × 1000
- 1 GB (gigabyte) = 1,000,000,000 bytes = 1000 × 1000 × 1000
The difference is small — 1 Gi is about 7% more than 1 GB. In practice, you can think of 1Gi as "roughly 1 GB" and 512Mi as "roughly half a GB."
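The arithmetic is easy to verify in a few lines of Python (a quick sketch for checking the conversions, not PodWarden code):

```python
# Binary units (what PodWarden uses) vs. decimal units (what drive vendors use)
MI = 1024 ** 2   # 1 Mi = 1,048,576 bytes
GI = 1024 ** 3   # 1 Gi = 1,073,741,824 bytes
MB = 1000 ** 2   # 1 MB = 1,000,000 bytes
GB = 1000 ** 3   # 1 GB = 1,000,000,000 bytes

print(f"1Gi   = {GI / GB:.2f} GB")        # ~1.07 GB
print(f"512Mi = {512 * MI / GB:.2f} GB")  # ~0.54 GB, i.e. "roughly half a GB"
print(f"Gi overhead over GB: {(GI - GB) / GB:.1%}")  # ~7.4%
```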
What a memory request means
Like CPU, the memory request is a guaranteed minimum. The cluster ensures your workload always has that much RAM available. If the workload tries to use significantly more than its request and the node is under memory pressure, the cluster may kill (OOMKill) and restart the container.
For memory-sensitive workloads (databases, caches, ML models), set the request close to the workload's actual peak usage. Undersizing memory causes restarts; oversizing wastes cluster capacity.
GPU
The gpu_count field specifies how many NVIDIA GPUs the workload needs. Most GPU workloads request 1; multi-GPU setups (distributed training, large model inference) may request 2, 4, or 8.
| Value | Meaning |
|---|---|
| 0 | No GPU needed — CPU-only workload |
| 1 | One GPU — standard for inference, single-GPU training, transcoding |
| 2 | Two GPUs — larger models, faster training with data parallelism |
| 4+ | Four or more — distributed training, very large models |
How GPU scheduling works
When a workload requests a GPU, the cluster places it on a node that has an available GPU. GPUs are exclusive — one workload gets the entire GPU; they aren't shared or time-sliced (unless you've configured GPU sharing/MIG at the cluster level).
If no node has enough free GPUs, the workload stays pending until one becomes available.
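The placement rule can be sketched in a few lines of Python (illustrative logic only, not PodWarden's actual scheduler; the node data is made up):

```python
def place(gpu_count, nodes):
    """Return the first node with enough free GPUs, or None (workload stays pending)."""
    for node in nodes:
        # GPUs are exclusive: an allocated GPU is fully unavailable to other workloads
        free = node["gpus_total"] - node["gpus_allocated"]
        if free >= gpu_count:
            return node["name"]
    return None  # no capacity anywhere: pending until a GPU frees up

nodes = [
    {"name": "node-a", "gpus_total": 2, "gpus_allocated": 2},
    {"name": "node-b", "gpus_total": 4, "gpus_allocated": 1},
]
print(place(1, nodes))  # node-b
print(place(4, nodes))  # None: stays pending
```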
VRAM (GPU Memory)
The vram_request field specifies how much GPU video memory the workload needs. This is separate from system RAM — VRAM is the memory on the GPU card itself.
| Value | Meaning | Typical use |
|---|---|---|
| 4Gi | 4 GB VRAM | Small inference models, basic GPU-accelerated tasks |
| 8Gi | 8 GB VRAM | Mid-size models (7B parameter LLMs), hardware transcoding |
| 16Gi | 16 GB VRAM | Larger models (13B parameter LLMs), Stable Diffusion XL |
| 24Gi | 24 GB VRAM | Large models (30B parameters), training smaller models |
| 48Gi | 48 GB VRAM | Very large models (70B parameters), multi-model serving |
| 80Gi | 80 GB VRAM | Cutting-edge models, distributed training shards |
VRAM uses the same binary units as system memory (Gi = gibibytes). PodWarden uses the VRAM request to match workloads to nodes with GPUs that have enough video memory. A workload requesting 24Gi VRAM won't be scheduled on a node with an RTX 3060 (12 GB) — it needs at least an RTX 3090, RTX 4090, A5000, or similar.
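The matching idea reduces to a capacity filter. This Python sketch is illustrative, not PodWarden internals; the card list mirrors a few entries from this guide:

```python
# VRAM in GiB for a few common cards (values as commonly advertised in GB)
CARDS = {"RTX 3060": 12, "RTX 3090": 24, "RTX 4090": 24, "A5000": 24, "A6000": 48}

def cards_that_fit(vram_request_gi):
    """Cards with at least the requested VRAM; the workload can only land on these."""
    return sorted(name for name, vram in CARDS.items() if vram >= vram_request_gi)

print(cards_that_fit(24))  # excludes the 12 GB RTX 3060
```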
Common GPU cards and their VRAM
| GPU | VRAM | Typical VRAM request match |
|---|---|---|
| RTX 3060 | 12 GB | up to 12Gi |
| RTX 3090 / 4090 | 24 GB | up to 24Gi |
| A4000 | 16 GB | up to 16Gi |
| A5000 | 24 GB | up to 24Gi |
| A6000 | 48 GB | up to 48Gi |
| A100 (40 GB) | 40 GB | up to 40Gi |
| A100 (80 GB) | 80 GB | up to 80Gi |
| H100 (80 GB) | 80 GB | up to 80Gi |
| Jetson Orin NX 8 GB | 8 GB (shared) | up to 8Gi |
| Jetson Orin NX 16 GB | 16 GB (shared) | up to 16Gi |
Volumes
Volumes define where an application stores persistent data — files that survive container restarts. Each volume mount has a name and a mount path (the folder inside the container).
```
models → /models
data   → /var/lib/app/data
```

This means the application expects persistent storage at /models and /var/lib/app/data. When you deploy the workload, PodWarden creates persistent volume claims (PVCs) or connects to storage backends (NFS, S3) depending on your cluster's storage configuration.
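In a template, mounts like these might be declared as follows. This is a hypothetical sketch of the shape; the exact key names are assumptions, only the name-to-path pairs come from this guide:

```yaml
# Hypothetical volume declaration; key names are illustrative
volumes:
  - name: models
    mount_path: /models            # survives container restarts
  - name: data
    mount_path: /var/lib/app/data  # backed by a PVC or storage backend at deploy time
```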
What happens without volumes
If a workload writes data to a folder that isn't backed by a volume, that data is lost when the container restarts. This is fine for stateless applications (web servers, proxies), but databases, model stores, and media libraries must have volumes configured.
Volume sizing
PodWarden templates don't specify volume sizes — the size depends on your storage class and cluster configuration. When deploying, you can set the PVC size based on your data needs. As a rough guide:
| Use case | Typical storage needed |
|---|---|
| Application config and logs | 1–5 GB |
| Small database (Postgres, SQLite) | 10–50 GB |
| Container registry (Harbor, Gitea) | 50–500 GB |
| Media library (Jellyfin, Plex) | 500 GB – several TB |
| ML model storage (Ollama, vLLM) | 50–500 GB depending on model count |
| Render output (Blender, FFmpeg) | 100 GB – several TB |
See the Storage guide for details on configuring NFS, S3, and other storage backends.
Network Requirements
The required_network_types field declares what kind of network connectivity the workload needs. This isn't about bandwidth or speed — it's about reachability.
| Type | What it means | When it's needed |
|---|---|---|
| public | The workload must be reachable from the public internet | Web-facing services, public APIs, content delivery, anything end-users access directly |
| mesh | The workload must be reachable via VPN (Tailscale, WireGuard) | Internal tools, dashboards, databases, services that should not be publicly exposed |
| lan | The workload must be reachable on the local network | Services that depend on local hardware (NAS, printers, IoT devices) or need low-latency LAN access |
A workload can require multiple types. For example, a web application might require public (to serve users) and mesh (to reach an internal database on the VPN).
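That combination is declared through the required_network_types field. The list values come from the table above; the exact formatting is a sketch:

```yaml
# Web app that serves end users and reaches an internal database over the VPN
required_network_types:
  - public   # reachable from the public internet
  - mesh     # reachable over Tailscale/WireGuard
```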
What PodWarden checks
Before deploying, PodWarden compares the workload's required network types against the target cluster's available networks. If the cluster doesn't support a required type, PodWarden shows a warning. The deploy isn't blocked — the cluster might still work depending on your network setup — but the warning tells you something may need attention.
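The comparison amounts to set difference (illustrative Python, not PodWarden source; note that it produces warnings rather than blocking the deploy, as described above):

```python
def network_warnings(required, available):
    """Return one warning per required network type the target cluster doesn't offer."""
    missing = set(required) - set(available)
    return [f"cluster has no '{t}' network; connectivity may need attention"
            for t in sorted(missing)]

# Cluster offers mesh and lan, workload requires public and mesh:
print(network_warnings(["public", "mesh"], ["mesh", "lan"]))
# one warning about 'public'; the deploy still proceeds
```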
See the Networking guide for details on configuring network types and troubleshooting connectivity.
Putting It All Together
Here's a real example — the DCGM Exporter template from the Hub catalog:
| Field | Value | What it means |
|---|---|---|
| CPU | 100m | Needs just a tenth of a core — it's a lightweight metrics exporter that polls GPU stats periodically |
| RAM | 128Mi | About 134 MB of memory — minimal footprint, it collects and serves metrics without heavy processing |
| GPU | 1 | Needs access to one GPU — not to run computations, but to read GPU metrics via NVIDIA's DCGM library |
| VRAM | — | No VRAM request — it reads GPU stats but doesn't allocate GPU memory for computation |
| Ports | 9400/TCP | Exposes a Prometheus metrics endpoint on port 9400 |
| Network | mesh | Only needs to be reachable by your monitoring stack (Prometheus/Grafana), not the public internet |
Compare that to a heavier workload like Ollama (LLM inference server):
| Field | Value | What it means |
|---|---|---|
| CPU | 2 | Two full cores — needs real processing power for tokenization, prompt handling, and model orchestration |
| RAM | 8Gi | About 8.6 GB — the model weights load into GPU memory, but the server itself needs substantial RAM for context handling |
| GPU | 1 | One GPU for model inference |
| VRAM | 8Gi | At least 8 GB of GPU memory — enough for 7B parameter models; larger models need more |
| Volumes | models → /models | Persistent storage for downloaded model files (can be tens of GB per model) |
| Ports | 11434/TCP | Ollama's API endpoint |
| Network | mesh | Typically accessed by other internal services or a local UI, not directly from the internet |
Quick Reference Card
| Resource | Unit | "Small" | "Medium" | "Large" |
|---|---|---|---|---|
| CPU | millicores / cores | 100m–250m | 500m–1 | 2–8 |
| RAM | Mi / Gi | 64Mi–256Mi | 512Mi–2Gi | 4Gi–32Gi |
| GPU | count | 0 | 1 | 2–8 |
| VRAM | Gi | — | 8Gi–16Gi | 24Gi–80Gi |
Rules of thumb:
- Monitoring agents and exporters (Prometheus exporters, log forwarders): 100m–250m CPU, 64Mi–256Mi RAM, no GPU
- Web applications and APIs (Nginx, Node.js, Rails): 250m–1 CPU, 256Mi–2Gi RAM, no GPU
- Databases (Postgres, MySQL, Redis): 500m–2 CPU, 1Gi–8Gi RAM, no GPU, volumes required
- AI inference (Ollama, vLLM, TGI): 1–4 CPU, 4Gi–16Gi RAM, 1 GPU, 8Gi–80Gi VRAM
- AI training (PyTorch, JAX): 4–8 CPU, 16Gi–64Gi RAM, 1–8 GPUs, 24Gi–80Gi VRAM
- Media processing (FFmpeg, Blender): 2–8 CPU, 4Gi–16Gi RAM, 0–1 GPU