Kubernetes Deployment
This guide covers everything you need to deploy Ithil to Kubernetes — building the image, configuring health probes, graceful shutdown timing, Redis durability, and the Redis failure policy decision.
Building the image
A multi-stage Dockerfile is included at the repository root. It restores, builds, and publishes the gateway, then downloads the ONNX embedding model from HuggingFace (~23 MB) so the final image is fully self-contained.
docker build -t ithil-gateway:latest .

The image listens on port 8080 by default (ASPNETCORE_URLS=http://+:8080). Override with the ASPNETCORE_URLS environment variable if you need HTTPS termination at the container level (most Kubernetes setups terminate TLS at the ingress instead).
# Quick local smoke test
docker run --rm -p 8080:8080 \
-e ConnectionStrings__Redis=host.docker.internal:6379 \
-e Ithil__Jwt__SigningKey=your-dev-key-here \
-e Ithil__Jwt__Issuer=ithil-dev \
-e Ithil__Jwt__Audience=ithil-gateway \
ithil-gateway:latest

The ONNX model is baked into the image at build time. If you want to supply your own model (e.g. a fine-tuned variant), mount it over /app/models at runtime and set Ithil__SemanticCache__ModelPath and Ithil__SemanticCache__VocabPath accordingly.
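As a rough illustration of the custom-model mount described above, the helper below prints a docker command as a dry run instead of executing it. The function name custom_model_cmd and the file names model.onnx and vocab.txt are assumptions made for this sketch; substitute the names your model actually ships with.

```shell
# Print (dry run) a docker command that mounts a custom model directory
# read-only over /app/models.
# model.onnx and vocab.txt are hypothetical file names, not confirmed by Ithil.
custom_model_cmd() {
  model_dir="$1"   # absolute host path that contains the model files
  echo "docker run --rm -p 8080:8080" \
    "-v ${model_dir}:/app/models:ro" \
    "-e Ithil__SemanticCache__ModelPath=/app/models/model.onnx" \
    "-e Ithil__SemanticCache__VocabPath=/app/models/vocab.txt" \
    "ithil-gateway:latest"
}

# Review the printed command before running it.
custom_model_cmd "$PWD/my-model"
```

The printed command omits the Redis and JWT variables from the smoke test; add them before running it for real.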
Health probes
Ithil exposes three health endpoints. Configure Kubernetes probes to use the right one for each purpose.
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3

Why two separate probes?
| Probe | Endpoint | What it checks | Failure means |
|---|---|---|---|
| Liveness | /health/live | Process is alive and responding | Pod is restarted |
| Readiness | /health/ready | Redis reachable, ONNX model loaded | Traffic diverted, pod not restarted |
Never point the liveness probe at /health/ready. If Redis goes down, liveness would fail, Kubernetes would restart the pod — which can’t reach Redis either — and you’d get an infinite restart loop during an outage. The liveness probe intentionally checks nothing external.
What the readiness probe checks
- Redis: a PING against the configured Redis connection. Fails if Redis is unreachable.
- Embedding model: verifies the ONNX model finished loading at startup. The model is loaded eagerly when the gateway starts; if it fails to load, the process exits immediately rather than serving traffic with a broken cache.
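To see what each probe would observe, you can query both endpoints by hand. The sketch below assumes curl is installed and that the pod is reachable on localhost:8080 (for example through kubectl port-forward); the helper names probe_ok and health_status are made up for this example. Kubernetes counts an httpGet probe as successful for any status code from 200 to 399.

```shell
# Succeeds when the status code is one Kubernetes would accept for an
# httpGet probe (200 <= code < 400).
probe_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Print only the HTTP status code for base URL $1 and path $2.
# curl prints 000 when the connection itself fails.
health_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1$2"
}

# Usage against a port-forwarded pod (the base URL is an assumption):
#   for path in /health/live /health/ready; do
#     if probe_ok "$(health_status http://localhost:8080 "$path")"; then
#       echo "$path ok"
#     else
#       echo "$path failing"
#     fi
#   done
```

During a Redis outage you would expect /health/live to keep passing while /health/ready fails, which matches the table above.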
Graceful shutdown
When Kubernetes terminates a pod (rolling deploy, scale-down, node drain), it sends SIGTERM and waits terminationGracePeriodSeconds before force-killing the process with SIGKILL.
Ithil’s shutdown timeout must be less than terminationGracePeriodSeconds so .NET finishes draining before Kubernetes loses patience.
Recommended values
# pod spec
terminationGracePeriodSeconds: 30

// appsettings.json
{
  "Ithil": {
    "Shutdown": {
      "TimeoutSeconds": 25
    }
  }
}

The 5-second gap gives the OS time to route the signal and the container runtime time to clean up.
What drains during shutdown
- In-flight HTTP requests: YARP stops accepting new connections and waits for active requests to complete.
- Audit records: AuditBackgroundWorker flushes any records queued at shutdown time before the process exits, so the audit trail is not truncated mid-deploy.
If Ithil:Shutdown:TimeoutSeconds exceeds terminationGracePeriodSeconds, Kubernetes will SIGKILL the process before .NET finishes draining. In-flight requests and queued audit records will be lost.
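The timing rule can be checked mechanically, for instance in a CI step before a rollout. This is a minimal sketch: check_shutdown_budget is a hypothetical helper, not part of Ithil, and it takes the two values as plain arguments instead of reading them from your manifests.

```shell
# Succeeds only when the application timeout is strictly below the pod's
# grace period, so .NET finishes draining before SIGKILL arrives.
check_shutdown_budget() {
  timeout="$1"   # Ithil:Shutdown:TimeoutSeconds
  grace="$2"     # terminationGracePeriodSeconds
  if [ "$timeout" -lt "$grace" ]; then
    echo "ok: $((grace - timeout))s of headroom"
  else
    echo "error: TimeoutSeconds ($timeout) must be less than terminationGracePeriodSeconds ($grace)"
    return 1
  fi
}

check_shutdown_budget 25 30   # prints "ok: 5s of headroom"
```

With the recommended values the check passes; equal values are rejected because they leave no headroom at all.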
Redis
Connection string
{
"ConnectionStrings": {
"Redis": "your-redis-host:6379,password=secret,ssl=true"
}
}

Ithil uses StackExchange.Redis, so any connection string format it supports works here.
Redis is always required, even when Ithil:AgentStore:UseInMemory is true. The in-memory flag only swaps the agent config and API key stores — budget enforcement and semantic caching always use Redis.
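In Kubernetes you will usually supply settings such as the connection string through environment variables instead of appsettings.json. The smoke test earlier already does this with ConnectionStrings__Redis, which follows the standard ASP.NET Core convention of replacing each ":" in a configuration key with "__". The sketch below assumes Ithil uses the default ASP.NET Core configuration providers; to_env_name is a hypothetical helper for illustration.

```shell
# Convert a colon-separated configuration key to its environment-variable form.
to_env_name() {
  echo "$1" | sed 's/:/__/g'
}

to_env_name "ConnectionStrings:Redis"   # prints ConnectionStrings__Redis
```

Keeping the connection string in a Kubernetes Secret and exposing it under that variable name avoids baking the Redis password into the image.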
Persistence tiers
Ithil stores two categories of data in Redis:
| Data | Key pattern | Sensitivity |
|---|---|---|
| Daily token usage | budget:{agentId}:{date} | Low — resets daily, loss just resets the counter |
| Semantic cache entries | cache:{hash} | Low — cache misses degrade performance, not correctness |
Both categories have automatic expiry (budget keys: 48 hours; cache entries: configurable TTL). Neither requires strong durability guarantees.
Recommended persistence mode: RDB snapshots (the Redis default). AOF (append-only file) is not necessary for Ithil’s workload and adds write amplification without meaningful benefit.
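If you want to confirm the key patterns and expiry behaviour on a live instance, redis-cli is enough. The sketch below only builds a key name from the budget:{agentId}:{date} pattern; the agent ID and the date format in the usage comments are assumptions, since the exact date encoding is not specified here.

```shell
# Build a budget key following the budget:{agentId}:{date} pattern.
budget_key() {
  echo "budget:$1:$2"
}

# Budget keys expire after 48 hours, so a reported TTL should not exceed this.
BUDGET_TTL_MAX=$((48 * 60 * 60))   # 172800 seconds

# Usage (agent ID and date format are hypothetical):
#   redis-cli TTL "$(budget_key agent-42 2025-01-15)"   # seconds left; -2 if the key does not exist
#   redis-cli --scan --pattern 'cache:*' | head         # sample a few cache entries
```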
For managed Redis services:
- Azure Cache for Redis: Basic or Standard tier is sufficient. Premium is not needed.
- AWS ElastiCache: cache.t3.micro or larger. Enable automatic backups if budget data matters to you.
- Redis Cloud: Free tier is adequate for development; Essentials for production.
Redis failure policy
This is a security policy decision. The default (FailOpen) silently bypasses budget governance when Redis is unavailable. Enterprise deployments should evaluate FailClosed.
When Redis is unavailable, Ithil can behave in one of two ways:
| Policy | Budget enforcement | Semantic cache | When to use |
|---|---|---|---|
| FailOpen (default) | Bypassed: all agents spend freely | All requests miss | Availability over governance |
| FailClosed | Requests rejected with 503 | Requests rejected with 503 | Governance must never be bypassed |
The budget engine and the cache each have their own policy setting, so configure both:
{
"Ithil": {
"Budget": {
"FailurePolicy": "FailClosed"
},
"SemanticCache": {
"FailurePolicy": "FailClosed"
}
}
}

Choosing a policy
Use FailOpen when:
- Agents must be able to work even during Redis maintenance windows
- Budget overruns during outages are acceptable (you can audit after the fact)
- You have a highly available Redis setup that rarely goes down
Use FailClosed when:
- Your security or compliance team requires that budget limits are always enforced
- You can tolerate a 503 during a Redis outage rather than risk uncontrolled spend
- Agents are performing sensitive or costly operations
Cache write failures are non-fatal under both policies. A failed cache write never causes a request to fail — it just means the next identical request won’t get a cache hit.
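To switch policy on a running deployment without editing appsettings.json, the two settings can be supplied as environment variables. The sketch below assumes the same "__" naming convention the smoke test uses for Ithil__Jwt__SigningKey; policy_env_args is a hypothetical helper and the Deployment name ithil-gateway is a placeholder.

```shell
# Print the two environment assignments for a policy, rejecting unknown values.
policy_env_args() {
  case "$1" in
    FailOpen|FailClosed)
      echo "Ithil__Budget__FailurePolicy=$1 Ithil__SemanticCache__FailurePolicy=$1"
      ;;
    *)
      echo "unknown policy: $1 (expected FailOpen or FailClosed)" >&2
      return 1
      ;;
  esac
}

policy_env_args FailClosed

# Apply to a running Deployment (the name ithil-gateway is an assumption):
#   kubectl set env deployment/ithil-gateway $(policy_env_args FailClosed)
```

The helper sets both components to the same policy for simplicity. Because the two settings are independent, you could instead keep the cache on FailOpen and set only the budget engine to FailClosed.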