Understanding Resource Requirements
What CPU, RAM, GPU, VRAM, volumes, and network values mean on every template — and how to choose the right ones
Every stack template in PodWarden specifies resource requirements — how much computing power, memory, storage, and network connectivity the application needs. These values appear on template cards in the Hub catalog and in the stack editor.
This guide explains what each value means, how the notation works, and how to translate it into real-world hardware terms.
CPU
CPU is measured in cores or millicores. One core equals 1,000 millicores.
| Value | Meaning | Rough equivalent |
|---|---|---|
| 100m | 0.1 cores (100 millicores) | A tenth of a single CPU core — enough for lightweight sidecars, exporters, and agents that mostly sit idle |
| 250m | 0.25 cores | A quarter core — suitable for small helper services, log forwarders, metrics collectors |
| 500m | 0.5 cores | Half a core — typical for lightweight web servers, caches, single-purpose APIs |
| 1 | 1 full core | One complete CPU core — enough for most standard applications (databases, web apps, CI runners) |
| 2 | 2 cores | Two cores — heavier workloads like build servers, media processing, application servers under load |
| 4 | 4 cores | Four cores — compute-intensive tasks like video transcoding, large database instances |
| 8 | 8 cores | Eight cores — high-performance workloads, multi-threaded batch processing |
What "CPU" actually means
A CPU request is a guaranteed minimum. When you set cpu_request: 500m, the cluster guarantees your workload will always have access to at least half a core. If the node has spare capacity, the workload can burst above that — but 500m is what's reserved and always available.
This is not the same as clock speed (GHz). A "core" here means one hardware thread on whatever CPU the node has. A workload requesting 1 core on a node with an AMD EPYC 7763 gets one thread of that processor. The same request on a Raspberry Pi gets one ARM core. The amount of work that core can do varies by hardware, but PodWarden ensures the workload gets the core time it asked for.
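As a concrete illustration, a template's CPU field follows the cpu_request notation described above. The surrounding structure below is a hypothetical sketch of how such a field might appear, not a guaranteed template schema:

```yaml
# Hypothetical template excerpt; only the cpu_request notation is from this guide,
# the surrounding key names are illustrative
resources:
  cpu_request: 500m   # guaranteed minimum: half a core reserved, can burst above if the node has spare capacity
```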
How to choose
For most applications, start with the template's default and adjust based on observed usage. PodWarden shows actual CPU consumption in workload logs — if a workload consistently uses less than its request, you can safely lower it. If it's hitting the ceiling, increase it.
Memory (RAM)
Memory is measured in bytes using binary (power-of-two) units. The two you'll see most often are Mi (mebibytes) and Gi (gibibytes).
| Value | Meaning | In familiar terms |
|---|---|---|
| 64Mi | 64 mebibytes (~67 MB) | Minimal — enough for tiny agents and exporters |
| 128Mi | 128 mebibytes (~134 MB) | Small utilities, metrics collectors, lightweight proxies |
| 256Mi | 256 mebibytes (~268 MB) | Small web servers, simple APIs |
| 512Mi | 512 mebibytes (~537 MB) | Moderate applications, light databases |
| 1Gi | 1 gibibyte (~1.07 GB) | Standard applications, small to mid-size databases |
| 2Gi | 2 gibibytes (~2.15 GB) | Application servers, medium databases, build tools |
| 4Gi | 4 gibibytes (~4.29 GB) | Larger databases, Java applications, content management systems |
| 8Gi | 8 gibibytes (~8.59 GB) | Memory-intensive workloads, search engines (Elasticsearch), ML inference |
| 16Gi | 16 gibibytes (~17.18 GB) | Large databases, in-memory caches, heavy ML workloads |
| 32Gi | 32 gibibytes (~34.36 GB) | Large-scale data processing, model training |
Mi vs. Gi vs. MB vs. GB
Kubernetes (and PodWarden) uses binary units (Mi, Gi), not decimal units (MB, GB):
- 1 Mi (mebibyte) = 1,048,576 bytes = 1024 × 1024
- 1 Gi (gibibyte) = 1,073,741,824 bytes = 1024 × 1024 × 1024
- 1 MB (megabyte) = 1,000,000 bytes = 1000 × 1000
- 1 GB (gigabyte) = 1,000,000,000 bytes = 1000 × 1000 × 1000
The difference is small — 1 Gi is about 7% more than 1 GB. In practice, you can think of 1Gi as "roughly 1 GB" and 512Mi as "roughly half a GB."
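The arithmetic is easy to verify in a few lines of Python (a quick sketch for checking the conversions, not PodWarden code):

```python
# Binary units (what PodWarden uses) vs. decimal units (what drive vendors use)
MI = 1024 ** 2   # 1 Mi = 1,048,576 bytes
GI = 1024 ** 3   # 1 Gi = 1,073,741,824 bytes
MB = 1000 ** 2   # 1 MB = 1,000,000 bytes
GB = 1000 ** 3   # 1 GB = 1,000,000,000 bytes

print(f"1Gi   = {GI / GB:.2f} GB")        # ~1.07 GB
print(f"512Mi = {512 * MI / GB:.2f} GB")  # ~0.54 GB, i.e. "roughly half a GB"
print(f"Gi overhead over GB: {(GI - GB) / GB:.1%}")  # ~7.4%
```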
What a memory request means
Like CPU, the memory request is a guaranteed minimum. The cluster ensures your workload always has that much RAM available. If the workload tries to use significantly more than its request and the node is under memory pressure, the cluster may kill (OOMKill) and restart the container.
For memory-sensitive workloads (databases, caches, ML models), set the request close to the workload's actual peak usage. Undersizing memory causes restarts; oversizing wastes cluster capacity.
GPU
The gpu_count field specifies how many NVIDIA GPUs the workload needs. Most GPU workloads request 1; multi-GPU setups (distributed training, large model inference) may request 2, 4, or 8.
| Value | Meaning |
|---|---|
| 0 | No GPU needed — CPU-only workload |
| 1 | One GPU — standard for inference, single-GPU training, transcoding |
| 2 | Two GPUs — larger models, faster training with data parallelism |
| 4+ | Four or more — distributed training, very large models |
How GPU scheduling works
When a workload requests a GPU, the cluster places it on a node that has an available GPU. GPUs are exclusive — one workload gets the entire GPU; they aren't shared or time-sliced (unless you've configured GPU sharing/MIG at the cluster level).
If no node has enough free GPUs, the workload stays pending until one becomes available.
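The placement rule can be sketched in a few lines of Python (illustrative logic only, not PodWarden's actual scheduler; the node data is made up):

```python
def place(gpu_count, nodes):
    """Return the first node with enough free GPUs, or None (workload stays pending)."""
    for node in nodes:
        # GPUs are exclusive: an allocated GPU is fully unavailable to other workloads
        free = node["gpus_total"] - node["gpus_allocated"]
        if free >= gpu_count:
            return node["name"]
    return None  # no capacity anywhere: pending until a GPU frees up

nodes = [
    {"name": "node-a", "gpus_total": 2, "gpus_allocated": 2},
    {"name": "node-b", "gpus_total": 4, "gpus_allocated": 1},
]
print(place(1, nodes))  # node-b
print(place(4, nodes))  # None: stays pending
```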
VRAM (GPU Memory)
The vram_request field specifies how much GPU video memory the workload needs. This is separate from system RAM — VRAM is the memory on the GPU card itself.
| Value | Meaning | Typical use |
|---|---|---|
| 4Gi | 4 GB VRAM | Small inference models, basic GPU-accelerated tasks |
| 8Gi | 8 GB VRAM | Mid-size models (7B parameter LLMs), hardware transcoding |
| 16Gi | 16 GB VRAM | Larger models (13B parameter LLMs), Stable Diffusion XL |
| 24Gi | 24 GB VRAM | Large models (30B parameters), training smaller models |
| 48Gi | 48 GB VRAM | Very large models (70B parameters), multi-model serving |
| 80Gi | 80 GB VRAM | Cutting-edge models, distributed training shards |
VRAM uses the same binary units as system memory (Gi = gibibytes). PodWarden uses the VRAM request to match workloads to nodes with GPUs that have enough video memory. A workload requesting 24Gi VRAM won't be scheduled on a node with an RTX 3060 (12 GB) — it needs at least an RTX 3090, RTX 4090, A5000, or similar.
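The matching idea reduces to a capacity filter. This Python sketch is illustrative, not PodWarden internals; the card list mirrors a few entries from this guide:

```python
# VRAM in GiB for a few common cards (values as commonly advertised in GB)
CARDS = {"RTX 3060": 12, "RTX 3090": 24, "RTX 4090": 24, "A5000": 24, "A6000": 48}

def cards_that_fit(vram_request_gi):
    """Cards with at least the requested VRAM; the workload can only land on these."""
    return sorted(name for name, vram in CARDS.items() if vram >= vram_request_gi)

print(cards_that_fit(24))  # excludes the 12 GB RTX 3060
```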
Common GPU cards and their VRAM
| GPU | VRAM | Typical VRAM request match |
|---|---|---|
| RTX 3060 | 12 GB | up to 12Gi |
| RTX 3090 / 4090 | 24 GB | up to 24Gi |
| A4000 | 16 GB | up to 16Gi |
| A5000 | 24 GB | up to 24Gi |
| A6000 | 48 GB | up to 48Gi |
| A100 (40 GB) | 40 GB | up to 40Gi |
| A100 (80 GB) | 80 GB | up to 80Gi |
| H100 (80 GB) | 80 GB | up to 80Gi |
| Jetson Orin NX 8 GB | 8 GB (shared) | up to 8Gi |
| Jetson Orin NX 16 GB | 16 GB (shared) | up to 16Gi |
Volumes
Volumes define where an application stores persistent data — files that survive container restarts. Each volume mount has a name and a mount path (the folder inside the container).
```
models → /models
data   → /var/lib/app/data
```

This means the application expects persistent storage at /models and /var/lib/app/data. When you deploy the workload, PodWarden creates persistent volume claims (PVCs) or connects to storage backends (NFS, S3) depending on your cluster's storage configuration.
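In a template, mounts like these might be declared as follows. This is a hypothetical sketch of the shape; the exact key names are assumptions, only the name-to-path pairs come from this guide:

```yaml
# Hypothetical volume declaration; key names are illustrative
volumes:
  - name: models
    mount_path: /models            # survives container restarts
  - name: data
    mount_path: /var/lib/app/data  # backed by a PVC or storage backend at deploy time
```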
What happens without volumes
If a workload writes data to a folder that isn't backed by a volume, that data is lost when the container restarts. This is fine for stateless applications (web servers, proxies), but databases, model stores, and media libraries must have volumes configured.
Volume sizing
PodWarden templates don't specify volume sizes — the size depends on your storage class and cluster configuration. When deploying, you can set the PVC size based on your data needs. As a rough guide:
| Use case | Typical storage needed |
|---|---|
| Application config and logs | 1–5 GB |
| Small database (Postgres, SQLite) | 10–50 GB |
| Container registry (Harbor, Gitea) | 50–500 GB |
| Media library (Jellyfin, Plex) | 500 GB – several TB |
| ML model storage (Ollama, vLLM) | 50–500 GB depending on model count |
| Render output (Blender, FFmpeg) | 100 GB – several TB |
See the Storage guide for details on configuring NFS, S3, and other storage backends.
Network Requirements
The required_network_types field declares what kind of network connectivity the workload needs. This isn't about bandwidth or speed — it's about reachability.
| Type | What it means | When it's needed |
|---|---|---|
| public | The workload must be reachable from the public internet | Web-facing services, public APIs, content delivery, anything end-users access directly |
| mesh | The workload must be reachable via VPN (Tailscale, WireGuard) | Internal tools, dashboards, databases, services that should not be publicly exposed |
| lan | The workload must be reachable on the local network | Services that depend on local hardware (NAS, printers, IoT devices) or need low-latency LAN access |
A workload can require multiple types. For example, a web application might require public (to serve users) and mesh (to reach an internal database on the VPN).
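That combination is declared through the required_network_types field. The list values come from the table above; the exact formatting is a sketch:

```yaml
# Web app that serves end users and reaches an internal database over the VPN
required_network_types:
  - public   # reachable from the public internet
  - mesh     # reachable over Tailscale/WireGuard
```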
What PodWarden checks
Before deploying, PodWarden compares the workload's required network types against the target cluster's available networks. If the cluster doesn't support a required type, PodWarden shows a warning. The deploy isn't blocked — the cluster might still work depending on your network setup — but the warning tells you something may need attention.
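The comparison amounts to set difference (illustrative Python, not PodWarden source; note that it produces warnings rather than blocking the deploy, as described above):

```python
def network_warnings(required, available):
    """Return one warning per required network type the target cluster doesn't offer."""
    missing = set(required) - set(available)
    return [f"cluster has no '{t}' network; connectivity may need attention"
            for t in sorted(missing)]

# Cluster offers mesh and lan, workload requires public and mesh:
print(network_warnings(["public", "mesh"], ["mesh", "lan"]))
# one warning about 'public'; the deploy still proceeds
```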
See the Networking guide for details on configuring network types and troubleshooting connectivity.
Putting It All Together
Here's a real example — the DCGM Exporter template from the Hub catalog:
| Field | Value | What it means |
|---|---|---|
| CPU | 100m | Needs just a tenth of a core — it's a lightweight metrics exporter that polls GPU stats periodically |
| RAM | 128Mi | About 134 MB of memory — minimal footprint, it collects and serves metrics without heavy processing |
| GPU | 1 | Needs access to one GPU — not to run computations, but to read GPU metrics via NVIDIA's DCGM library |
| VRAM | — | No VRAM request — it reads GPU stats but doesn't allocate GPU memory for computation |
| Ports | 9400/TCP | Exposes a Prometheus metrics endpoint on port 9400 |
| Network | mesh | Only needs to be reachable by your monitoring stack (Prometheus/Grafana), not the public internet |
Compare that to a heavier workload like Ollama (LLM inference server):
| Field | Value | What it means |
|---|---|---|
| CPU | 2 | Two full cores — needs real processing power for tokenization, prompt handling, and model orchestration |
| RAM | 8Gi | About 8.6 GB — the model weights load into GPU memory, but the server itself needs substantial RAM for context handling |
| GPU | 1 | One GPU for model inference |
| VRAM | 8Gi | At least 8 GB of GPU memory — enough for 7B parameter models; larger models need more |
| Volumes | models → /models | Persistent storage for downloaded model files (can be tens of GB per model) |
| Ports | 11434/TCP | Ollama's API endpoint |
| Network | mesh | Typically accessed by other internal services or a local UI, not directly from the internet |
Quick Reference Card
| Resource | Unit | "Small" | "Medium" | "Large" |
|---|---|---|---|---|
| CPU | millicores / cores | 100m–250m | 500m–1 | 2–8 |
| RAM | Mi / Gi | 64Mi–256Mi | 512Mi–2Gi | 4Gi–32Gi |
| GPU | count | 0 | 1 | 2–8 |
| VRAM | Gi | — | 8Gi–16Gi | 24Gi–80Gi |
Rules of thumb:
- Monitoring agents and exporters (Prometheus exporters, log forwarders): 100m–250m CPU, 64Mi–256Mi RAM, no GPU
- Web applications and APIs (Nginx, Node.js, Rails): 250m–1 CPU, 256Mi–2Gi RAM, no GPU
- Databases (Postgres, MySQL, Redis): 500m–2 CPU, 1Gi–8Gi RAM, no GPU, volumes required
- AI inference (Ollama, vLLM, TGI): 1–4 CPU, 4Gi–16Gi RAM, 1 GPU, 8Gi–80Gi VRAM
- AI training (PyTorch, JAX): 4–8 CPU, 16Gi–64Gi RAM, 1–8 GPUs, 24Gi–80Gi VRAM
- Media processing (FFmpeg, Blender): 2–8 CPU, 4Gi–16Gi RAM, 0–1 GPU