# Agent Best Practices

Patterns, conventions, and tips for AI agents managing PodWarden infrastructure via MCP.
This page documents the intended usage patterns for AI agents using PodWarden's MCP tools. Follow these guidelines to avoid common mistakes and make the most of the available tools.
## Deployment Workflow
The standard flow for deploying an application from the Hub catalog:
1. `list_hub_templates` or `get_hub_template` → find the template
2. `import_hub_template` → creates a local stack
3. `check_cluster_capacity(cluster_id)` → verify resources are available
4. `create_deployment` → bind the stack to a cluster
5. `deploy_workload` → trigger the K8s deployment
6. `get_workload_logs` → verify pods are Ready
7. `proxy_to_service` / `run_in_pod` → post-deploy configuration

Do NOT skip step 6. Always verify pods are Ready after deploying. Many issues (PVC permissions, missing env vars, image pull errors) only surface at runtime.
## Pre-Deploy Capacity Check
Before creating a deployment, use `check_cluster_capacity` to verify the target cluster has enough CPU and memory:
```
check_cluster_capacity(cluster_id="...")
```

This shows per-node hardware, total vs. requested resources, and the largest single-pod capacity. Hub templates often specify production-grade resource requests (e.g. 4Gi memory) that may exceed what's available on small clusters — check first to avoid Unschedulable errors.
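The "largest single-pod capacity" figure matters because a pod must fit on one node; free memory spread across several nodes doesn't help. A minimal Python sketch of that constraint (the node figures and the `fits_on_cluster` helper are illustrative, not PodWarden internals):

```python
# Illustrative only: mimics the scheduling constraint that
# check_cluster_capacity reports on. All numbers here are made up.

def fits_on_cluster(nodes, request_mib):
    """A pod must fit within the free capacity of a SINGLE node."""
    largest_free = max(n["allocatable_mib"] - n["requested_mib"] for n in nodes)
    return request_mib <= largest_free

nodes = [
    {"name": "node-a", "allocatable_mib": 3800, "requested_mib": 1200},  # 2600 free
    {"name": "node-b", "allocatable_mib": 3800, "requested_mib": 900},   # 2900 free
]

# A 4Gi (4096 MiB) request exceeds the largest single-node headroom (2900 MiB),
# so it would be Unschedulable even though 5500 MiB is free cluster-wide.
print(fits_on_cluster(nodes, 4096))  # False
print(fits_on_cluster(nodes, 2048))  # True
```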
## Env Schema and Auto-Generation
Stacks define an env_schema — a list of environment variables with metadata. When creating a deployment:
- **Required vars without `generate`**: you must provide values in `env_values`.
- **Required vars with `generate`**: PodWarden auto-generates them. Do NOT manually generate secrets — just omit them from `env_values` and PodWarden fills them in.
- **Optional vars with `default_value`**: used if no explicit value is provided.
- **App secrets**: if a var name matches an app secret key, the secret value is used.
Supported `generate` types: `hex8` (16 chars), `hex16` (32 chars), `hex32` (64 chars), `password` (URL-safe, 32 chars).
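The resolution rules above can be pictured with a short Python sketch. This is an assumption-laden illustration (the function name, schema shape, and exact precedence between app secrets and generated values are guesses, not PodWarden's code); only the generator lengths are taken from the table:

```python
import secrets

# Generator lengths match the documented generate types.
GENERATORS = {
    "hex8":  lambda: secrets.token_hex(8),          # 16 hex chars
    "hex16": lambda: secrets.token_hex(16),         # 32 hex chars
    "hex32": lambda: secrets.token_hex(32),         # 64 hex chars
    "password": lambda: secrets.token_urlsafe(24),  # 32 URL-safe chars
}

def resolve_env(schema, env_values, app_secrets):
    """Hypothetical resolution of env_schema entries into final values."""
    resolved = {}
    for var in schema:
        name = var["name"]
        if name in env_values:                  # explicit value wins
            resolved[name] = env_values[name]
        elif name in app_secrets:               # matching app secret key
            resolved[name] = app_secrets[name]
        elif var.get("generate"):               # auto-generated secret
            resolved[name] = GENERATORS[var["generate"]]()
        elif "default_value" in var:            # optional with default
            resolved[name] = var["default_value"]
        elif var.get("required"):
            raise ValueError(f"missing required env var: {name}")
    return resolved
```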
## Config Templates
Compose stacks can define config templates in the `x-podwarden` extension. After deployment, use `update_config_template` to modify individual config files without resending all values:
```
update_config_template(
    deployment_id="...",
    key="harbor.yml",
    content="...new content..."
)
```

Then redeploy with `deploy_workload` for the changes to take effect.
## Post-Deploy API Configuration
Many applications (Flussonic, Harbor, Grafana, etc.) require configuration via their own REST API after deployment. Use `proxy_to_service` to make HTTP requests to the app's ClusterIP without needing NodePort, SSH access, or port-forwarding:
```
# GET request to check app health
proxy_to_service(deployment_id="...", path="/api/v1/health")

# POST to configure a Flussonic stream
proxy_to_service(
    deployment_id="...",
    method="POST",
    path="/flussonic/api/v3/streams/my-stream",
    body='{"input": "rtmp://source:1935/live"}',
    headers={"Authorization": "Bearer token123"}
)

# PUT to update Harbor OIDC config
proxy_to_service(
    deployment_id="...",
    method="PUT",
    path="/api/v2.0/configurations",
    port=8080,
    body='{"auth_mode": "oidc_auth"}'
)
```

This creates a temporary curl pod inside the cluster, sends the request, and returns the response. The curl pod is auto-cleaned after use. For commands that aren't HTTP (e.g. `redis-cli`, `psql`), use `run_in_pod` instead.
## Compose Stacks

### Key Rules
- **Always declare `ports:` on services that receive connections.** Without ports, PodWarden creates no K8s Service — the container is unreachable by other pods.
- **Named volumes shared between services become a single shared PVC.** This is intentional — it allows services to share data through a common volume.
- **Env var values referencing compose service names are auto-rewritten.** Both exact matches (`DB_HOST=postgres`) and embedded URLs (`REDIS_URL=redis://cache:6379`) are rewritten to Kubernetes FQDNs.
- **`command:` maps to Kubernetes `args`, not `command`.** This preserves the image's ENTRYPOINT.
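The service-name rewriting rule can be illustrated with a small sketch. The `<service>.<namespace>.svc.cluster.local` form is the standard Kubernetes Service FQDN; the regex approach and namespace handling here are assumptions, not PodWarden's implementation:

```python
import re

def rewrite_value(value, service_names, namespace):
    """Rewrite bare service names (exact or embedded in URLs) to K8s FQDNs."""
    pattern = r"\b(" + "|".join(map(re.escape, service_names)) + r")\b"
    return re.sub(
        pattern,
        lambda m: f"{m.group(1)}.{namespace}.svc.cluster.local",
        value,
    )

services = ["postgres", "cache"]
print(rewrite_value("postgres", services, "myapp"))
# -> postgres.myapp.svc.cluster.local
print(rewrite_value("redis://cache:6379", services, "myapp"))
# -> redis://cache.myapp.svc.cluster.local:6379
```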
### Modifying Compose Source
Use `update_stack` with the `compose_source` parameter to modify a stack's Docker Compose YAML:
```
update_stack(
    definition_id="...",
    compose_source="...new compose YAML..."
)
```

This is useful for adding `user:` directives, adjusting ports, or modifying service configuration after import from the Hub.
### fsGroup and PVC Permissions
PodWarden auto-detects the correct `fsGroup` for well-known container images when a compose service has volumes but no `user:` directive. This prevents the common issue where volumes mount as `root:root` while the container runs as a non-root user.
Supported images include: `postgres`, `redis`, `mysql`, `mariadb`, `mongo`, `elasticsearch`, `grafana`, `prometheus`, `minio`, `harbor` (all `goharbor/*` images), `gitea`, `nextcloud`, `jenkins`, `keycloak`, `vault`, `consul`, and more.
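The detection can be thought of as a lookup table keyed by image name, consulted only when a service mounts volumes and has no `user:` directive. A hedged sketch (the UIDs shown are the well-known defaults of those official images; the table and matching logic are illustrative, not PodWarden's):

```python
# Illustrative image -> fsGroup table; only a few entries shown.
KNOWN_FSGROUPS = {
    "postgres": 999,         # official postgres image runs as UID 999
    "redis": 999,            # official redis image runs as UID 999
    "grafana/grafana": 472,  # official grafana image runs as UID 472
}

def detect_fsgroup(image, has_volumes, user_directive):
    """Return an fsGroup for known images, or None if not applicable."""
    if user_directive or not has_volumes:
        return None  # an explicit user: wins; no volumes means nothing to fix
    # Strip the tag; note this naive split mishandles registry:port prefixes.
    name = image.rsplit(":", 1)[0]
    return KNOWN_FSGROUPS.get(name) or KNOWN_FSGROUPS.get(name.split("/")[-1])
```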
For custom or unlisted images, add `user: "UID:GID"` to the compose service:
```yaml
services:
  myapp:
    image: custom/app:latest
    user: "1000:1000"
    volumes:
      - data:/app/data
```

## Storage
### NFS StorageClasses
Use `create_nfs_storage_class` to deploy an NFS provisioner to a cluster from a PodWarden storage connection:
```
create_nfs_storage_class(
    connection_id="...",       # NFS storage connection UUID
    cluster_id="...",          # target cluster
    storage_class_name="nfs"
)
```

This creates the provisioner deployment, RBAC, and StorageClass. Workloads can then use `storage_class: "nfs"` in their volume mounts.
### Storage Connection Testing
Always use `test_storage_connection` before deploying workloads that depend on storage. For NFS, this checks TCP reachability and RPC exports, and runs a mount + read/write speed test via SSH. For S3, it tests authentication, bucket access, and upload/download speed.
## Pod Exec

### When to Use
Use `run_in_pod` for tasks inside application containers — no SSH access needed:
- Fixing PVC permissions: `chown -R 999:999 /data`
- Running app-specific CLI tools: `psql`, `redis-cli`, `harbor-cli`
- Checking config files mounted in the container: `cat /etc/app/config.yml`
- Debugging connectivity from inside the pod: `wget -qO- http://other-service:8080/health`
- Inspecting data directories: `ls -la /data`, `du -sh /data/*`
### Safety Model
`run_in_pod` has a layered safety model:
- **Scope verification** — only pods with the `app=` label matching a PodWarden-managed deployment can be targeted. You cannot exec into arbitrary pods.
- **Command blocklist** — destructive commands are blocked before execution:
  - `rm -rf /`, `mkfs`, `dd` — filesystem destruction
  - `sudo`, `su` — privilege escalation
  - `mount`, `umount`, `nsenter` — host escape
  - `curl|sh`, `wget|sh` — remote code execution
  - `nc -l`, `ncat`, `socat` — reverse shells
- **Audit logging** — every command is logged with deployment ID, pod name, command text, and exit code.
- **No interactive sessions** — commands run via `/bin/sh -c` and return output. No shell allocation.
- **Read-only default** — most use cases are read-only inspection. The blocklist prevents the most dangerous write operations.
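The blocklist layer can be sketched as a simple substring check over the normalized command. This is a deliberately naive illustration of the idea (patterns copied from the list above, matching logic assumed); a real validator would parse more carefully to avoid false positives:

```python
# Illustrative only — not PodWarden's actual blocklist implementation.
BLOCKED_PATTERNS = [
    "rm -rf /", "mkfs", "dd if=",   # filesystem destruction
    "sudo ", "su ",                 # privilege escalation
    "mount", "nsenter",             # host escape ("mount" also catches umount)
    "curl|sh", "wget|sh",           # remote code execution
    "nc -l", "ncat", "socat",       # reverse shells
]

def is_blocked(command):
    # Collapse whitespace; trailing space lets "su " match a bare "su".
    normalized = " ".join(command.split()) + " "
    return any(p in normalized for p in BLOCKED_PATTERNS)
```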
### Examples
```
# Fix PVC permissions for a Harbor database
run_in_pod(deployment_id="...", command="chown -R 999:999 /var/lib/postgresql/data", container="harbor-db")

# Check Redis connectivity
run_in_pod(deployment_id="...", command="redis-cli ping")

# Inspect a config file
run_in_pod(deployment_id="...", command="cat /etc/harbor/harbor.yml")

# Check disk usage
run_in_pod(deployment_id="...", command="df -h")

# Target a specific pod in a multi-replica deployment
run_in_pod(deployment_id="...", command="whoami", pod_name="harbor-core-abc123")
```

### run_in_pod vs Node SSH Access
| | `run_in_pod` | `request_node_access` + `run_node_command` |
|---|---|---|
| Approval required | No (operator role sufficient) | Yes (human must approve each session) |
| Scope | Inside a specific container | Full node access |
| Use case | App-level debugging, PVC fixes, CLI tools | System-level: `journalctl`, `systemctl`, `ip`, `kubectl` |
| Risk level | Low (container-isolated, blocklisted) | Higher (node-level access) |
Prefer `run_in_pod` whenever the task can be done from inside the container. Only use node SSH access for node-level operations.
## Node SSH Access

### When to Use
Use `request_node_access` + `run_node_command` for tasks that require node-level access and can't be done with `run_in_pod`:
- Checking `journalctl` logs for K3s or system services
- Inspecting node-level networking (`ip addr`, `ss -tlnp`)
- Running `kubectl` commands with complex arguments
- Checking disk usage, memory, or process status
### Best Practices
- **Request short durations.** Start with 5 minutes. Request more if needed.
- **Single-quoted arguments are safe.** The command validator strips single-quoted content before checking for blocked operators. Use this for JSON or jsonpath arguments: `kubectl get configmap myapp-cfg -o jsonpath='{.data.config\.yml}'`
- **Pipes are allowed** to: `grep`, `egrep`, `head`, `tail`, `wc`, `sort`, `awk`, `sed`, `jq`, `cut`, `uniq`, `tr`, `column`, `less`, `more`, `cat`, `xargs`.
- **Grants expire.** If a grant expires, request a new one. Don't retry commands on an expired grant.
- **One grant per host.** Each grant is for a specific host. If you need to run commands on multiple hosts, request separate grants.
### What Gets Blocked
- Command chaining: `;`, `&&`, `||`
- Command substitution: `$(...)`, `` `...` ``
- Redirection: `>`, `<`, `>>`
- Destructive commands: `rm -r`, `reboot`, `mkfs`, `dd if=`
- Service disruption: `systemctl stop k3s`, `iptables -F`
- Code execution: `curl|sh`, `python -c`, `eval`
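Putting the two rules together — single-quoted spans are stripped first, then the remainder is scanned for blocked operators — the validator's behavior can be sketched like this (illustrative only; PodWarden's real checks cover more patterns than shown):

```python
import re

# A subset of the blocked operators listed above.
BLOCKED_OPERATORS = [";", "&&", "||", "$(", "`", ">", "<"]

def validate_node_command(command):
    """Strip single-quoted content, then reject blocked shell operators."""
    stripped = re.sub(r"'[^']*'", "''", command)
    return not any(op in stripped for op in BLOCKED_OPERATORS)

# Braces and dots inside single quotes survive validation:
print(validate_node_command(
    "kubectl get configmap myapp-cfg -o jsonpath='{.data.config\\.yml}'"))  # True
# Redirection outside quotes is rejected:
print(validate_node_command("cat /etc/hosts > /tmp/x"))  # False
```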
## Troubleshooting

### First Step: `troubleshoot_workload`
Always start with `troubleshoot_workload` when diagnosing issues. It aggregates:
- Deployment configuration and status
- Stack definition details
- Cluster status and node health
- Pod events and container logs
- Network compatibility results
This is faster and more reliable than calling multiple tools individually.
### Common Patterns
| Symptom | Likely Cause | Action |
|---|---|---|
| Pods stuck in `Pending` | Insufficient resources or node selector mismatch | Check cluster node capacity with `get_cluster_extended` |
| `CrashLoopBackOff` | Container starts and crashes | Check `get_workload_logs` for error output |
| `ImagePullBackOff` | Wrong image or missing registry credentials | Verify image name/tag, check `registry_credentials` on the stack |
| PVC `PermissionDenied` | Container runs as non-root but volume is root-owned | Add a `user:` directive to compose, or verify the image is in the known-UID map |
| Services can't reach each other | Missing `ports:` in compose definition | Add `ports:` to every service that should be reachable |
| Env var not substituted | Variable not in `env_schema` or not in `env_values` | Check `env_schema` on the stack, ensure the value is provided or has a default |
### Redeployment
After fixing configuration issues (env values, config templates, compose source), always redeploy:
```
deploy_workload(assignment_id="...")
```

Changes to `env_values`, `config_values`, or the stack definition only take effect after redeployment.
## Tool Reference Quick Links
- Read the full tool list: Available Tools
- Understand the concepts: read the `podwarden://docs/concepts` resource
- Follow the deployment guide: read the `podwarden://docs/deployment-workflow` resource