# Agent Best Practices

Patterns, conventions, and tips for AI agents managing PodWarden infrastructure via MCP.
This page documents the intended usage patterns for AI agents using PodWarden's MCP tools. Follow these guidelines to avoid common mistakes and make the most of the available tools.
## Deployment Workflow
The standard flow for deploying an application from the Hub catalog:
1. `list_hub_templates` or `get_hub_template` → find the template
2. `import_hub_template` → creates a local stack
3. `check_cluster_capacity(cluster_id)` → verify resources are available
4. `create_deployment` → bind the stack to a cluster
5. `deploy_workload` → trigger the K8s deployment
6. `get_workload_logs` → verify pods are Ready
7. `proxy_to_service` / `run_in_pod` → post-deploy configuration

Do NOT skip step 6. Always verify pods are Ready after deploying. Many issues (PVC permissions, missing env vars, image pull errors) only surface at runtime.
## Pre-Deploy Capacity Check
Before creating a deployment, use `check_cluster_capacity` to verify the target cluster has enough CPU and memory:
```
check_cluster_capacity(cluster_id="...")
```

This shows per-node hardware, total vs. requested resources, and the largest single-pod capacity. Hub templates often specify production-grade resource requests (e.g. 4Gi memory) that may exceed what's available on small clusters — check first to avoid Unschedulable errors.
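The "largest single-pod capacity" figure matters because a pod must fit on one node; free memory spread across several nodes doesn't help. A minimal Python sketch of that constraint (the node figures and the `fits_on_cluster` helper are illustrative, not PodWarden internals):

```python
# Illustrative only: mimics the scheduling constraint that
# check_cluster_capacity reports on. All numbers here are made up.

def fits_on_cluster(nodes, request_mib):
    """A pod must fit within the free capacity of a SINGLE node."""
    largest_free = max(n["allocatable_mib"] - n["requested_mib"] for n in nodes)
    return request_mib <= largest_free

nodes = [
    {"name": "node-a", "allocatable_mib": 3800, "requested_mib": 1200},  # 2600 free
    {"name": "node-b", "allocatable_mib": 3800, "requested_mib": 900},   # 2900 free
]

# A 4Gi (4096 MiB) request exceeds the largest single-node headroom (2900 MiB),
# so it would be Unschedulable even though 5500 MiB is free cluster-wide.
print(fits_on_cluster(nodes, 4096))  # False
print(fits_on_cluster(nodes, 2048))  # True
```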
## Env Schema and Auto-Generation
Stacks define an env_schema — a list of environment variables with metadata. When creating a deployment:
- **Required vars without `generate`**: you must provide values in `env_values`.
- **Required vars with `generate`**: PodWarden auto-generates them. Do NOT manually generate secrets — just omit them from `env_values` and PodWarden fills them in.
- **Optional vars with `default_value`**: used if no explicit value is provided.
- **App secrets**: if a var name matches an app secret key, the secret value is used.
Supported `generate` types: `hex8` (16 chars), `hex16` (32 chars), `hex32` (64 chars), `password` (URL-safe, 32 chars).
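The resolution rules above can be pictured with a short Python sketch. This is an assumption-laden illustration (the function name, schema shape, and exact precedence between app secrets and generated values are guesses, not PodWarden's code); only the generator lengths are taken from the table:

```python
import secrets

# Generator lengths match the documented generate types.
GENERATORS = {
    "hex8":  lambda: secrets.token_hex(8),          # 16 hex chars
    "hex16": lambda: secrets.token_hex(16),         # 32 hex chars
    "hex32": lambda: secrets.token_hex(32),         # 64 hex chars
    "password": lambda: secrets.token_urlsafe(24),  # 32 URL-safe chars
}

def resolve_env(schema, env_values, app_secrets):
    """Hypothetical resolution of env_schema entries into final values."""
    resolved = {}
    for var in schema:
        name = var["name"]
        if name in env_values:                  # explicit value wins
            resolved[name] = env_values[name]
        elif name in app_secrets:               # matching app secret key
            resolved[name] = app_secrets[name]
        elif var.get("generate"):               # auto-generated secret
            resolved[name] = GENERATORS[var["generate"]]()
        elif "default_value" in var:            # optional with default
            resolved[name] = var["default_value"]
        elif var.get("required"):
            raise ValueError(f"missing required env var: {name}")
    return resolved
```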
## Config Templates
Compose stacks can define config templates in the `x-podwarden` extension. After deployment, use `update_config_template` to modify individual config files without resending all values:
```
update_config_template(
    deployment_id="...",
    key="harbor.yml",
    content="...new content..."
)
```

Then redeploy with `deploy_workload` for the changes to take effect.
## Post-Deploy API Configuration
Many applications (Flussonic, Harbor, Grafana, etc.) require configuration via their own REST API after deployment. Use `proxy_to_service` to make HTTP requests to the app's ClusterIP without needing NodePort, SSH access, or port-forwarding:
```
# GET request to check app health
proxy_to_service(deployment_id="...", path="/api/v1/health")

# POST to configure a Flussonic stream
proxy_to_service(
    deployment_id="...",
    method="POST",
    path="/flussonic/api/v3/streams/my-stream",
    body='{"input": "rtmp://source:1935/live"}',
    headers={"Authorization": "Bearer token123"}
)

# PUT to update Harbor OIDC config
proxy_to_service(
    deployment_id="...",
    method="PUT",
    path="/api/v2.0/configurations",
    port=8080,
    body='{"auth_mode": "oidc_auth"}'
)
```

This creates a temporary curl pod inside the cluster, sends the request, and returns the response. The curl pod is auto-cleaned after use. For commands that aren't HTTP (e.g. `redis-cli`, `psql`), use `run_in_pod` instead.
## Compose Stacks

### Key Rules
- **Always declare `ports:` on services that receive connections.** Without ports, PodWarden creates no K8s Service — the container is unreachable by other pods.
- **Named volumes shared between services become a single shared PVC.** This is intentional — it allows services to share data through a common volume.
- **Env var values referencing compose service names are auto-rewritten.** Both exact matches (`DB_HOST=postgres`) and embedded URLs (`REDIS_URL=redis://cache:6379`) are rewritten to Kubernetes FQDNs.
- **`command:` maps to Kubernetes `args`, not `command`.** This preserves the image's ENTRYPOINT.
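The service-name rewriting rule can be illustrated with a small sketch. The `<service>.<namespace>.svc.cluster.local` form is the standard Kubernetes Service FQDN; the regex approach and namespace handling here are assumptions, not PodWarden's implementation:

```python
import re

def rewrite_value(value, service_names, namespace):
    """Rewrite bare service names (exact or embedded in URLs) to K8s FQDNs."""
    pattern = r"\b(" + "|".join(map(re.escape, service_names)) + r")\b"
    return re.sub(
        pattern,
        lambda m: f"{m.group(1)}.{namespace}.svc.cluster.local",
        value,
    )

services = ["postgres", "cache"]
print(rewrite_value("postgres", services, "myapp"))
# -> postgres.myapp.svc.cluster.local
print(rewrite_value("redis://cache:6379", services, "myapp"))
# -> redis://cache.myapp.svc.cluster.local:6379
```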
### Modifying Compose Source
Use `update_stack` with the `compose_source` parameter to modify a stack's Docker Compose YAML:
```
update_stack(
    definition_id="...",
    compose_source="...new compose YAML..."
)
```

This is useful for adding `user:` directives, adjusting ports, or modifying service configuration after import from the Hub.
### fsGroup and PVC Permissions
PodWarden auto-detects the correct `fsGroup` for well-known container images when a compose service has volumes but no `user:` directive. This prevents the common issue where volumes mount as `root:root` while the container runs as a non-root user.
Supported images include: `postgres`, `redis`, `mysql`, `mariadb`, `mongo`, `elasticsearch`, `grafana`, `prometheus`, `minio`, `harbor` (all `goharbor/*` images), `gitea`, `nextcloud`, `jenkins`, `keycloak`, `vault`, `consul`, and more.
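The detection can be thought of as a lookup table keyed by image name, consulted only when a service mounts volumes and has no `user:` directive. A hedged sketch (the UIDs shown are the well-known defaults of those official images; the table and matching logic are illustrative, not PodWarden's):

```python
# Illustrative image -> fsGroup table; only a few entries shown.
KNOWN_FSGROUPS = {
    "postgres": 999,         # official postgres image runs as UID 999
    "redis": 999,            # official redis image runs as UID 999
    "grafana/grafana": 472,  # official grafana image runs as UID 472
}

def detect_fsgroup(image, has_volumes, user_directive):
    """Return an fsGroup for known images, or None if not applicable."""
    if user_directive or not has_volumes:
        return None  # an explicit user: wins; no volumes means nothing to fix
    # Strip the tag; note this naive split mishandles registry:port prefixes.
    name = image.rsplit(":", 1)[0]
    return KNOWN_FSGROUPS.get(name) or KNOWN_FSGROUPS.get(name.split("/")[-1])
```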
For custom or unlisted images, add `user: "UID:GID"` to the compose service:
```yaml
services:
  myapp:
    image: custom/app:latest
    user: "1000:1000"
    volumes:
      - data:/app/data
```

## Storage
### NFS StorageClasses
Use `create_nfs_storage_class` to deploy an NFS provisioner to a cluster from a PodWarden storage connection:
```
create_nfs_storage_class(
    connection_id="...",       # NFS storage connection UUID
    cluster_id="...",          # target cluster
    storage_class_name="nfs"
)
```

This creates the provisioner deployment, RBAC, and StorageClass. Workloads can then use `storage_class: "nfs"` in their volume mounts.
### Storage Connection Testing
Always use `test_storage_connection` before deploying workloads that depend on storage. For NFS, this checks TCP reachability and RPC exports, and runs a mount + read/write speed test via SSH. For S3, it tests authentication, bucket access, and upload/download speed.
## Pod Exec

### When to Use
Use `run_in_pod` for tasks inside application containers — no SSH access needed:
- Fixing PVC permissions: `chown -R 999:999 /data`
- Running app-specific CLI tools: `psql`, `redis-cli`, `harbor-cli`
- Checking config files mounted in the container: `cat /etc/app/config.yml`
- Debugging connectivity from inside the pod: `wget -qO- http://other-service:8080/health`
- Inspecting data directories: `ls -la /data`, `du -sh /data/*`
### Safety Model
`run_in_pod` has a layered safety model:
- **Scope verification** — only pods with the `app=` label matching a PodWarden-managed deployment can be targeted. You cannot exec into arbitrary pods.
- **Command blocklist** — destructive commands are blocked before execution:
  - `rm -rf /`, `mkfs`, `dd` — filesystem destruction
  - `sudo`, `su` — privilege escalation
  - `mount`, `umount`, `nsenter` — host escape
  - `curl|sh`, `wget|sh` — remote code execution
  - `nc -l`, `ncat`, `socat` — reverse shells
- **Audit logging** — every command is logged with deployment ID, pod name, command text, and exit code.
- **No interactive sessions** — commands run via `/bin/sh -c` and return output. No shell allocation.
- **Read-only default** — most use cases are read-only inspection. The blocklist prevents the most dangerous write operations.
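The blocklist layer can be sketched as a simple substring check over the normalized command. This is a deliberately naive illustration of the idea (patterns copied from the list above, matching logic assumed); a real validator would parse more carefully to avoid false positives:

```python
# Illustrative only — not PodWarden's actual blocklist implementation.
BLOCKED_PATTERNS = [
    "rm -rf /", "mkfs", "dd if=",   # filesystem destruction
    "sudo ", "su ",                 # privilege escalation
    "mount", "nsenter",             # host escape ("mount" also catches umount)
    "curl|sh", "wget|sh",           # remote code execution
    "nc -l", "ncat", "socat",       # reverse shells
]

def is_blocked(command):
    # Collapse whitespace; trailing space lets "su " match a bare "su".
    normalized = " ".join(command.split()) + " "
    return any(p in normalized for p in BLOCKED_PATTERNS)
```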
### Examples
```
# Fix PVC permissions for a Harbor database
run_in_pod(deployment_id="...", command="chown -R 999:999 /var/lib/postgresql/data", container="harbor-db")

# Check Redis connectivity
run_in_pod(deployment_id="...", command="redis-cli ping")

# Inspect a config file
run_in_pod(deployment_id="...", command="cat /etc/harbor/harbor.yml")

# Check disk usage
run_in_pod(deployment_id="...", command="df -h")

# Target a specific pod in a multi-replica deployment
run_in_pod(deployment_id="...", command="whoami", pod_name="harbor-core-abc123")
```

### run_in_pod vs Node SSH Access
| | `run_in_pod` | `request_node_access` + `run_node_command` |
|---|---|---|
| Approval required | No (operator role sufficient) | Yes (human must approve each session) |
| Scope | Inside a specific container | Full node access |
| Use case | App-level debugging, PVC fixes, CLI tools | System-level: `journalctl`, `systemctl`, `ip`, `kubectl` |
| Risk level | Low (container-isolated, blocklisted) | Higher (node-level access) |
Prefer `run_in_pod` whenever the task can be done from inside the container. Only use node SSH access for node-level operations.
## Node SSH Access

### When to Use
Use `request_node_access` + `run_node_command` for tasks that require node-level access and can't be done with `run_in_pod`:
- Checking `journalctl` logs for K3s or system services
- Inspecting node-level networking (`ip addr`, `ss -tlnp`)
- Running `kubectl` commands with complex arguments
- Checking disk usage, memory, or process status
### Best Practices
- **Request short durations.** Start with 5 minutes. Request more if needed.
- **Single-quoted arguments are safe.** The command validator strips single-quoted content before checking for blocked operators. Use this for JSON or jsonpath arguments: `kubectl get configmap myapp-cfg -o jsonpath='{.data.config\.yml}'`
- **Pipes are allowed** to: `grep`, `egrep`, `head`, `tail`, `wc`, `sort`, `awk`, `sed`, `jq`, `cut`, `uniq`, `tr`, `column`, `less`, `more`, `cat`, `xargs`.
- **Grants expire.** If a grant expires, request a new one. Don't retry commands on an expired grant.
- **One grant per host.** Each grant is for a specific host. If you need to run commands on multiple hosts, request separate grants.
### What Gets Blocked
- Command chaining: `;`, `&&`, `||`
- Command substitution: `$(...)`, `` `...` ``
- Redirection: `>`, `<`, `>>`
- Destructive commands: `rm -r`, `reboot`, `mkfs`, `dd if=`
- Service disruption: `systemctl stop k3s`, `iptables -F`
- Code execution: `curl|sh`, `python -c`, `eval`
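Putting the two rules together — single-quoted spans are stripped first, then the remainder is scanned for blocked operators — the validator's behavior can be sketched like this (illustrative only; PodWarden's real checks cover more patterns than shown):

```python
import re

# A subset of the blocked operators listed above.
BLOCKED_OPERATORS = [";", "&&", "||", "$(", "`", ">", "<"]

def validate_node_command(command):
    """Strip single-quoted content, then reject blocked shell operators."""
    stripped = re.sub(r"'[^']*'", "''", command)
    return not any(op in stripped for op in BLOCKED_OPERATORS)

# Braces and dots inside single quotes survive validation:
print(validate_node_command(
    "kubectl get configmap myapp-cfg -o jsonpath='{.data.config\\.yml}'"))  # True
# Redirection outside quotes is rejected:
print(validate_node_command("cat /etc/hosts > /tmp/x"))  # False
```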
## Troubleshooting

### First Step: `troubleshoot_workload`
Always start with `troubleshoot_workload` when diagnosing issues. It aggregates:
- Deployment configuration and status
- Stack definition details
- Cluster status and node health
- Pod events and container logs
- Network compatibility results
This is faster and more reliable than calling multiple tools individually.
### Common Patterns
| Symptom | Likely Cause | Action |
|---|---|---|
| Pods stuck in `Pending` | Insufficient resources or node selector mismatch | Check cluster node capacity with `get_cluster_extended` |
| `CrashLoopBackOff` | Container starts and crashes | Check `get_workload_logs` for error output |
| `ImagePullBackOff` | Wrong image or missing registry credentials | Verify image name/tag, check `registry_credentials` on the stack |
| PVC `PermissionDenied` | Container runs as non-root but volume is root-owned | Add a `user:` directive to compose, or verify the image is in the known-UID map |
| Services can't reach each other | Missing `ports:` in compose definition | Add `ports:` to every service that should be reachable |
| Env var not substituted | Variable not in `env_schema` or not in `env_values` | Check `env_schema` on the stack, ensure the value is provided or has a default |
### Redeployment
After fixing configuration issues (env values, config templates, compose source), always redeploy:
```
deploy_workload(assignment_id="...")
```

Changes to `env_values`, `config_values`, or the stack definition only take effect after redeployment.
## Tool Reference Quick Links
- Read the full tool list: Available Tools
- Understand the concepts: read the `podwarden://docs/concepts` resource
- Follow the deployment guide: read the `podwarden://docs/deployment-workflow` resource