
Understanding Resource Requirements

What CPU, RAM, GPU, VRAM, volumes, and network values mean on every template — and how to choose the right ones

Every stack template in PodWarden specifies resource requirements — how much computing power, memory, storage, and network connectivity the application needs. These values appear on template cards in the Hub catalog and in the stack editor.

This guide explains what each value means, how the notation works, and how to translate it into real-world hardware terms.


CPU

CPU is measured in cores or millicores. One core equals 1,000 millicores.

| Value | Meaning | Rough equivalent |
|---|---|---|
| 100m | 0.1 cores (100 millicores) | A tenth of a single CPU core — enough for lightweight sidecars, exporters, and agents that mostly sit idle |
| 250m | 0.25 cores | A quarter core — suitable for small helper services, log forwarders, metrics collectors |
| 500m | 0.5 cores | Half a core — typical for lightweight web servers, caches, single-purpose APIs |
| 1 | 1 full core | One complete CPU core — enough for most standard applications (databases, web apps, CI runners) |
| 2 | 2 cores | Two cores — heavier workloads like build servers, media processing, application servers under load |
| 4 | 4 cores | Four cores — compute-intensive tasks like video transcoding, large database instances |
| 8 | 8 cores | Eight cores — high-performance workloads, multi-threaded batch processing |

What "CPU" actually means

A CPU request is a guaranteed minimum. When you set cpu_request: 500m, the cluster guarantees your workload will always have access to at least half a core. If the node has spare capacity, the workload can burst above that — but 500m is what's reserved and always available.

This is not the same as clock speed (GHz). A "core" here means one hardware thread on whatever CPU the node has. A workload requesting 1 core on a node with an AMD EPYC 7763 gets one thread of that processor. The same request on a Raspberry Pi gets one ARM core. The amount of work that core can do varies by hardware, but PodWarden ensures the workload gets the core time it asked for.

How to choose

For most applications, start with the template's default and adjust based on observed usage. PodWarden shows actual CPU consumption in workload logs — if a workload consistently uses less than its request, you can safely lower it. If it's hitting the ceiling, increase it.
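As a quick sanity check on the notation, millicore values convert to core counts like this (a small illustrative helper, not part of PodWarden):

```python
def parse_cpu(value: str) -> float:
    """Convert a CPU value like '500m' or '2' into a core count.

    Millicore values end in 'm'; 1000m equals one full core.
    """
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)

print(parse_cpu("100m"))  # 0.1
print(parse_cpu("500m"))  # 0.5
print(parse_cpu("2"))     # 2.0
```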


Memory (RAM)

Memory is measured in bytes using binary (power-of-two) units. The two you'll see most often are Mi (mebibytes) and Gi (gibibytes).

| Value | Meaning | In familiar terms |
|---|---|---|
| 64Mi | 64 mebibytes (~67 MB) | Minimal — enough for tiny agents and exporters |
| 128Mi | 128 mebibytes (~134 MB) | Small utilities, metrics collectors, lightweight proxies |
| 256Mi | 256 mebibytes (~268 MB) | Small web servers, simple APIs |
| 512Mi | 512 mebibytes (~537 MB) | Moderate applications, light databases |
| 1Gi | 1 gibibyte (~1.07 GB) | Standard applications, small to mid-size databases |
| 2Gi | 2 gibibytes (~2.15 GB) | Application servers, medium databases, build tools |
| 4Gi | 4 gibibytes (~4.29 GB) | Larger databases, Java applications, content management systems |
| 8Gi | 8 gibibytes (~8.59 GB) | Memory-intensive workloads, search engines (Elasticsearch), ML inference |
| 16Gi | 16 gibibytes (~17.18 GB) | Large databases, in-memory caches, heavy ML workloads |
| 32Gi | 32 gibibytes (~34.36 GB) | Large-scale data processing, model training |

Mi vs. Gi vs. MB vs. GB

Kubernetes (and PodWarden) uses binary units (Mi, Gi), not decimal units (MB, GB):

  • 1 Mi (mebibyte) = 1,048,576 bytes = 1024 × 1024
  • 1 Gi (gibibyte) = 1,073,741,824 bytes = 1024 × 1024 × 1024
  • 1 MB (megabyte) = 1,000,000 bytes = 1000 × 1000
  • 1 GB (gigabyte) = 1,000,000,000 bytes = 1000 × 1000 × 1000

The difference is small — 1 Gi is about 7% more than 1 GB. In practice, you can think of 1Gi as "roughly 1 GB" and 512Mi as "roughly half a GB."
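The arithmetic is easy to verify (an illustrative snippet, not PodWarden code):

```python
MI = 1024 ** 2   # 1 Mi = 1,048,576 bytes
GI = 1024 ** 3   # 1 Gi = 1,073,741,824 bytes
MB = 1000 ** 2   # 1 MB = 1,000,000 bytes
GB = 1000 ** 3   # 1 GB = 1,000,000,000 bytes

# 1 Gi is about 7.4% larger than 1 GB
print(GI / GB)        # ~1.074
# 512Mi is "roughly half a GB"
print(512 * MI / GB)  # ~0.537
```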

What a memory request means

Like CPU, the memory request is a guaranteed minimum. The cluster ensures your workload always has that much RAM available. If the workload tries to use significantly more than its request and the node is under memory pressure, the cluster may kill (OOMKill) and restart the container.

For memory-sensitive workloads (databases, caches, ML models), set the request close to the workload's actual peak usage. Undersizing memory causes restarts; oversizing wastes cluster capacity.


GPU

The gpu_count field specifies how many NVIDIA GPUs the workload needs. Most GPU workloads request 1; multi-GPU setups (distributed training, large model inference) may request 2, 4, or 8.

| Value | Meaning |
|---|---|
| 0 | No GPU needed — CPU-only workload |
| 1 | One GPU — standard for inference, single-GPU training, transcoding |
| 2 | Two GPUs — larger models, faster training with data parallelism |
| 4+ | Four or more — distributed training, very large models |

How GPU scheduling works

When a workload requests a GPU, the cluster places it on a node that has an available GPU. GPUs are exclusive — one workload gets the entire GPU; they aren't shared or time-sliced (unless you've configured GPU sharing/MIG at the cluster level).

If no node has enough free GPUs, the workload stays pending until one becomes available.


VRAM (GPU Memory)

The vram_request field specifies how much GPU video memory the workload needs. This is separate from system RAM — VRAM is the memory on the GPU card itself.

| Value | Meaning | Typical use |
|---|---|---|
| 4Gi | 4 GB VRAM | Small inference models, basic GPU-accelerated tasks |
| 8Gi | 8 GB VRAM | Mid-size models (7B parameter LLMs), hardware transcoding |
| 16Gi | 16 GB VRAM | Larger models (13B parameter LLMs), Stable Diffusion XL |
| 24Gi | 24 GB VRAM | Large models (30B parameters), training smaller models |
| 48Gi | 48 GB VRAM | Very large models (70B parameters), multi-model serving |
| 80Gi | 80 GB VRAM | Cutting-edge models, distributed training shards |

VRAM uses the same binary units as system memory (Gi = gibibytes). PodWarden uses the VRAM request to match workloads to nodes with GPUs that have enough video memory. A workload requesting 24Gi VRAM won't be scheduled on a node with an RTX 3060 (12 GB) — it needs at least an RTX 3090, RTX 4090, A5000, or similar.
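The matching logic amounts to a simple comparison between the request and each node's card. A sketch of that idea (the card list, node names, and function are illustrative, not PodWarden internals):

```python
# GPU model -> usable VRAM in GiB (illustrative values)
GPU_VRAM_GIB = {"RTX 3060": 12, "RTX 4090": 24, "A6000": 48, "A100-80": 80}

def nodes_with_enough_vram(vram_request_gib: int, node_gpus: dict) -> list:
    """Return the nodes whose GPU has at least the requested VRAM."""
    return [node for node, gpu in node_gpus.items()
            if GPU_VRAM_GIB[gpu] >= vram_request_gib]

nodes = {"node-a": "RTX 3060", "node-b": "RTX 4090", "node-c": "A100-80"}
print(nodes_with_enough_vram(24, nodes))  # ['node-b', 'node-c']
```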

Common GPU cards and their VRAM

| GPU | VRAM | Typical VRAM request match |
|---|---|---|
| RTX 3060 | 12 GB | up to 12Gi |
| RTX 3090 / 4090 | 24 GB | up to 24Gi |
| A4000 | 16 GB | up to 16Gi |
| A5000 | 24 GB | up to 24Gi |
| A6000 | 48 GB | up to 48Gi |
| A100 (40 GB) | 40 GB | up to 40Gi |
| A100 (80 GB) | 80 GB | up to 80Gi |
| H100 (80 GB) | 80 GB | up to 80Gi |
| Jetson Orin NX 8 GB | 8 GB (shared) | up to 8Gi |
| Jetson Orin NX 16 GB | 16 GB (shared) | up to 16Gi |

Volumes

Volumes define where an application stores persistent data — files that survive container restarts. Each volume mount has a name and a mount path (the folder inside the container).

models → /models
data   → /var/lib/app/data

This means the application expects persistent storage at /models and /var/lib/app/data. When you deploy the workload, PodWarden creates persistent volume claims (PVCs) or connects to storage backends (NFS, S3), depending on your cluster's storage configuration.
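In a Kubernetes-backed cluster, the two mounts above would typically translate into something like the following (a hypothetical sketch; the claim names are made up, and the actual manifests depend on your storage configuration):

```yaml
volumeMounts:
  - name: models
    mountPath: /models
  - name: data
    mountPath: /var/lib/app/data
volumes:
  - name: models
    persistentVolumeClaim:
      claimName: myapp-models   # hypothetical PVC name
  - name: data
    persistentVolumeClaim:
      claimName: myapp-data     # hypothetical PVC name
```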

What happens without volumes

If a workload writes data to a folder that isn't backed by a volume, that data is lost when the container restarts. This is fine for stateless applications (web servers, proxies), but databases, model stores, and media libraries must have volumes configured.

Volume sizing

PodWarden templates don't specify volume sizes — the size depends on your storage class and cluster configuration. When deploying, you can set the PVC size based on your data needs. As a rough guide:

| Use case | Typical storage needed |
|---|---|
| Application config and logs | 1–5 GB |
| Small database (Postgres, SQLite) | 10–50 GB |
| Container registry (Harbor, Gitea) | 50–500 GB |
| Media library (Jellyfin, Plex) | 500 GB – several TB |
| ML model storage (Ollama, vLLM) | 50–500 GB depending on model count |
| Render output (Blender, FFmpeg) | 100 GB – several TB |

See the Storage guide for details on configuring NFS, S3, and other storage backends.


Network Requirements

The required_network_types field declares what kind of network connectivity the workload needs. This isn't about bandwidth or speed — it's about reachability.

| Type | What it means | When it's needed |
|---|---|---|
| public | The workload must be reachable from the public internet | Web-facing services, public APIs, content delivery, anything end-users access directly |
| mesh | The workload must be reachable via VPN (Tailscale, WireGuard) | Internal tools, dashboards, databases, services that should not be publicly exposed |
| lan | The workload must be reachable on the local network | Services that depend on local hardware (NAS, printers, IoT devices) or need low-latency LAN access |

A workload can require multiple types. For example, a web application might require public (to serve users) and mesh (to reach an internal database on the VPN).

What PodWarden checks

Before deploying, PodWarden compares the workload's required network types against the target cluster's available networks. If the cluster doesn't support a required type, PodWarden shows a warning. The deploy isn't blocked — the cluster might still work depending on your network setup — but the warning tells you something may need attention.
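The check itself is essentially a set difference: whatever the workload requires but the cluster lacks becomes a warning. A sketch of that logic (illustrative, not PodWarden's actual implementation):

```python
def network_warnings(required: set, available: set) -> set:
    """Return the required network types the target cluster does not offer."""
    return required - available

# A workload needing public + mesh, deployed to a cluster with only mesh + lan:
missing = network_warnings({"public", "mesh"}, {"mesh", "lan"})
print(missing)  # {'public'}  -> PodWarden would warn about 'public'
```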

See the Networking guide for details on configuring network types and troubleshooting connectivity.


Putting It All Together

Here's a real example — the DCGM Exporter template from the Hub catalog:

| Field | Value | What it means |
|---|---|---|
| CPU | 100m | Needs just a tenth of a core — it's a lightweight metrics exporter that polls GPU stats periodically |
| RAM | 128Mi | About 134 MB of memory — minimal footprint, it collects and serves metrics without heavy processing |
| GPU | 1 | Needs access to one GPU — not to run computations, but to read GPU metrics via NVIDIA's DCGM library |
| VRAM | (none) | No VRAM request — it reads GPU stats but doesn't allocate GPU memory for computation |
| Ports | 9400/TCP | Exposes a Prometheus metrics endpoint on port 9400 |
| Network | mesh | Only needs to be reachable by your monitoring stack (Prometheus/Grafana), not the public internet |

Compare that to a heavier workload like Ollama (LLM inference server):

| Field | Value | What it means |
|---|---|---|
| CPU | 2 | Two full cores — needs real processing power for tokenization, prompt handling, and model orchestration |
| RAM | 8Gi | About 8.6 GB — the model weights load into GPU memory, but the server itself needs substantial RAM for context handling |
| GPU | 1 | One GPU for model inference |
| VRAM | 8Gi | At least 8 GB of GPU memory — enough for 7B parameter models; larger models need more |
| Volumes | models → /models | Persistent storage for downloaded model files (can be tens of GB per model) |
| Ports | 11434/TCP | Ollama's API endpoint |
| Network | mesh | Typically accessed by other internal services or a local UI, not directly from the internet |
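Pulled together, the Ollama values above might look like this in stack-definition form (a hypothetical sketch: cpu_request, gpu_count, vram_request, and required_network_types are fields described in this guide, while ram_request and the volumes/ports structure are assumed names, not PodWarden's confirmed schema):

```yaml
# Hypothetical stack-template fields, based on the Ollama example above
cpu_request: "2"
ram_request: 8Gi          # assumed field name for the RAM request
gpu_count: 1
vram_request: 8Gi
volumes:
  - name: models
    mount_path: /models
ports:
  - 11434/TCP
required_network_types:
  - mesh
```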

Quick Reference Card

| Resource | Unit | "Small" | "Medium" | "Large" |
|---|---|---|---|---|
| CPU | millicores / cores | 100m–250m | 500m–1 | 2–8 |
| RAM | Mi / Gi | 64Mi–256Mi | 512Mi–2Gi | 4Gi–32Gi |
| GPU | count | 0 | 1 | 2–8 |
| VRAM | Gi | 8Gi | 16Gi–24Gi | 80Gi |

Rules of thumb:

  • Monitoring agents and exporters (Prometheus exporters, log forwarders): 100m–250m CPU, 64Mi–256Mi RAM, no GPU
  • Web applications and APIs (Nginx, Node.js, Rails): 250m–1 CPU, 256Mi–2Gi RAM, no GPU
  • Databases (Postgres, MySQL, Redis): 500m–2 CPU, 1Gi–8Gi RAM, no GPU, volumes required
  • AI inference (Ollama, vLLM, TGI): 1–4 CPU, 4Gi–16Gi RAM, 1 GPU, 8Gi–80Gi VRAM
  • AI training (PyTorch, JAX): 4–8 CPU, 16Gi–64Gi RAM, 1–8 GPUs, 24Gi–80Gi VRAM
  • Media processing (FFmpeg, Blender): 2–8 CPU, 4Gi–16Gi RAM, 0–1 GPU