Kubernetes Deployment

This guide covers everything you need to deploy Ithil to Kubernetes — building the image, configuring health probes, graceful shutdown timing, Redis durability, and the Redis failure policy decision.


Building the image

A multi-stage Dockerfile is included at the repository root. It restores, builds, and publishes the gateway, then downloads the ONNX embedding model from HuggingFace (~23 MB) so the final image is fully self-contained.

docker build -t ithil-gateway:latest .

The image listens on port 8080 by default (ASPNETCORE_URLS=http://+:8080). Override with the ASPNETCORE_URLS environment variable if you need HTTPS termination at the container level (most Kubernetes setups terminate TLS at the ingress instead).

# Quick local smoke test
docker run --rm -p 8080:8080 \
  -e ConnectionStrings__Redis=host.docker.internal:6379 \
  -e Ithil__Jwt__SigningKey=your-dev-key-here \
  -e Ithil__Jwt__Issuer=ithil-dev \
  -e Ithil__Jwt__Audience=ithil-gateway \
  ithil-gateway:latest

The ONNX model is baked into the image at build time. If you want to supply your own model (e.g. a fine-tuned variant), mount it over /app/models at runtime and set Ithil__SemanticCache__ModelPath and Ithil__SemanticCache__VocabPath accordingly.
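The environment variable names used in this section (Ithil__Jwt__SigningKey, Ithil__SemanticCache__ModelPath) follow the standard .NET configuration convention in which a double underscore stands for the ":" section separator, so they address the same settings that later sections write as Ithil:Section:Key or as nested JSON. The helper below is a small illustrative sketch of that mapping; the function name is hypothetical and not part of Ithil.

```python
def env_to_config_key(env_name: str) -> str:
    """Translate a .NET-style environment variable name into its
    colon-separated configuration key (hypothetical helper)."""
    return env_name.replace("__", ":")


def config_key_to_env(config_key: str) -> str:
    """Inverse mapping: build the environment variable name to pass
    with `docker run -e` or in a pod spec for a given configuration key."""
    return config_key.replace(":", "__")


signing_key = env_to_config_key("Ithil__Jwt__SigningKey")
timeout_env = config_key_to_env("Ithil:Shutdown:TimeoutSeconds")
```

This means any setting shown as JSON in this guide can also be supplied through the container environment without editing appsettings.json.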


Health probes

Ithil exposes three health endpoints. Configure Kubernetes probes to use the right one for each purpose.

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3

Why two separate probes?

| Probe | Endpoint | What it checks | Failure means |
| --- | --- | --- | --- |
| Liveness | /health/live | Process is alive and responding | Pod is restarted |
| Readiness | /health/ready | Redis reachable, ONNX model loaded | Traffic diverted, pod not restarted |

Never point the liveness probe at /health/ready. If Redis goes down, liveness would fail, Kubernetes would restart the pod — which can’t reach Redis either — and you’d get an infinite restart loop during an outage. The liveness probe intentionally checks nothing external.

What the readiness probe checks

  • Redis — a PING against the configured Redis connection. Fails if Redis is unreachable.
  • Embedding model — verifies the ONNX model finished loading at startup. The model is loaded eagerly when the gateway starts; if it fails to load, the process exits immediately rather than serving traffic with a broken cache.
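To make the difference between the two probes concrete, here is a minimal Python model of a Redis outage. It is a sketch of the behavior described above, not Ithil's implementation: GatewayState, liveness_ok, readiness_ok, and probe_trips are hypothetical names, and probe_trips only models the Kubernetes rule that a probe takes effect after failureThreshold consecutive failures.

```python
from dataclasses import dataclass


@dataclass
class GatewayState:
    redis_reachable: bool
    model_loaded: bool


def liveness_ok(state: GatewayState) -> bool:
    # Models /health/live: nothing external is consulted, so an outage
    # in a dependency can never fail this check.
    return True


def readiness_ok(state: GatewayState) -> bool:
    # Models /health/ready: both the Redis check and the model check must pass.
    return state.redis_reachable and state.model_loaded


def probe_trips(results: list[bool], failure_threshold: int = 3) -> bool:
    """Return True once `failure_threshold` consecutive results have failed."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= failure_threshold:
            return True
    return False


# A Redis outage that lasts five probe periods.
outage = [GatewayState(redis_reachable=False, model_loaded=True)] * 5
restart_pod = probe_trips([liveness_ok(s) for s in outage])
divert_traffic = probe_trips([readiness_ok(s) for s in outage])
```

In this model the outage diverts traffic (readiness trips) but never restarts the pod (liveness does not), which is the behavior the probe split is meant to guarantee.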

Graceful shutdown

When Kubernetes terminates a pod (rolling deploy, scale-down, node drain), it sends SIGTERM and waits terminationGracePeriodSeconds before force-killing the process with SIGKILL.

Ithil’s shutdown timeout must be less than terminationGracePeriodSeconds so .NET finishes draining before Kubernetes loses patience.

# pod spec
terminationGracePeriodSeconds: 30

// appsettings.json
{
  "Ithil": {
    "Shutdown": {
      "TimeoutSeconds": 25
    }
  }
}

The 5-second gap gives the OS time to route the signal and the container runtime time to clean up.

What drains during shutdown

  • In-flight HTTP requests: YARP stops accepting new connections and waits for active requests to complete.
  • Audit records: AuditBackgroundWorker flushes any records queued at shutdown time before the process exits, so the audit trail is not truncated mid-deploy.

If Ithil:Shutdown:TimeoutSeconds exceeds terminationGracePeriodSeconds, Kubernetes will SIGKILL the process before .NET finishes draining. In-flight requests and queued audit records will be lost.
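Because the two values live in different files, the mismatch is easy to introduce by accident. The following is a minimal sketch of a pre-deploy check, assuming the appsettings content is available as plain JSON (without comments) and the grace period is known from the pod spec. The function names are hypothetical and not part of Ithil.

```python
import json

# Example settings matching the values used in this guide.
APPSETTINGS = """
{ "Ithil": { "Shutdown": { "TimeoutSeconds": 25 } } }
"""


def shutdown_gap_seconds(appsettings_text: str, grace_period_seconds: int) -> int:
    """Seconds left between Ithil's shutdown timeout and the SIGKILL deadline."""
    settings = json.loads(appsettings_text)
    timeout = settings["Ithil"]["Shutdown"]["TimeoutSeconds"]
    return grace_period_seconds - timeout


def shutdown_config_ok(appsettings_text: str, grace_period_seconds: int) -> bool:
    # The timeout must be strictly less than terminationGracePeriodSeconds.
    return shutdown_gap_seconds(appsettings_text, grace_period_seconds) > 0


gap = shutdown_gap_seconds(APPSETTINGS, grace_period_seconds=30)
```

A check like this could run in CI before a rollout; a non-positive gap means Kubernetes may kill the process before draining completes.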


Redis

Connection string

{
  "ConnectionStrings": {
    "Redis": "your-redis-host:6379,password=secret,ssl=true"
  }
}

Ithil uses StackExchange.Redis, so any connection string format that library supports works here.

Redis is always required, even when Ithil:AgentStore:UseInMemory is true. The in-memory flag only swaps the agent config and API key stores — budget enforcement and semantic caching always use Redis.

Persistence tiers

Ithil stores two categories of data in Redis:

| Data | Key pattern | Sensitivity |
| --- | --- | --- |
| Daily token usage | budget:{agentId}:{date} | Low: resets daily, so loss just resets the counter |
| Semantic cache entries | cache:{hash} | Low: cache misses degrade performance, not correctness |

Both categories have automatic expiry (budget keys: 48 hours; cache entries: configurable TTL). Neither requires strong durability guarantees.
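As an illustration of the budget key pattern and its expiry, the sketch below builds a key for one agent and one day. The ISO date format (YYYY-MM-DD) is an assumption made for this example; the guide does not specify how {date} is rendered. The helper name is hypothetical.

```python
from datetime import date, timedelta

# Budget keys expire 48 hours after they are written, per the text above.
BUDGET_KEY_TTL = timedelta(hours=48)


def budget_key(agent_id: str, day: date) -> str:
    """Build a key following the budget:{agentId}:{date} pattern.

    Assumption: the date is rendered as ISO 8601 (YYYY-MM-DD).
    """
    return f"budget:{agent_id}:{day.isoformat()}"


example_key = budget_key("agent-42", date(2025, 1, 31))
ttl_seconds = int(BUDGET_KEY_TTL.total_seconds())
```

Since each day gets its own key and old keys expire on their own, no explicit reset job is needed, and losing a key only affects the counter for that one day.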

Recommended persistence mode: RDB snapshots (the Redis default). AOF (append-only file) is not necessary for Ithil’s workload and adds write amplification without meaningful benefit.

For managed Redis services:

  • Azure Cache for Redis: Basic or Standard tier is sufficient. Premium is not needed.
  • AWS ElastiCache: cache.t3.micro or larger. Enable automatic backups if budget data matters to you.
  • Redis Cloud: Free tier is adequate for development; Essentials for production.

Redis failure policy

This is a security policy decision. The default (FailOpen) silently bypasses budget governance when Redis is unavailable. Enterprise deployments should evaluate FailClosed.

When Redis is unavailable, Ithil can behave in one of two ways:

| Policy | Budget enforcement | Semantic cache | When to use |
| --- | --- | --- | --- |
| FailOpen (default) | Bypassed; all agents spend freely | All requests miss | Availability over governance |
| FailClosed | Requests rejected with 503 | Requests rejected with 503 | Governance must never be bypassed |

Configure both the budget engine and the cache independently:

{
  "Ithil": {
    "Budget": {
      "FailurePolicy": "FailClosed"
    },
    "SemanticCache": {
      "FailurePolicy": "FailClosed"
    }
  }
}
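The effect of the budget policy on a single request can be sketched as follows. This is an illustrative model only, not Ithil's code: RedisUnavailable, budget_decision, and the returned labels are hypothetical names, and the response for an over-budget request is left abstract because this guide does not specify it.

```python
from enum import Enum
from typing import Callable


class FailurePolicy(Enum):
    FAIL_OPEN = "FailOpen"
    FAIL_CLOSED = "FailClosed"


class RedisUnavailable(Exception):
    """Hypothetical error raised when Redis cannot be reached."""


def budget_decision(within_budget: Callable[[], bool], policy: FailurePolicy) -> str:
    """Decide what happens to a request under the configured policy."""
    try:
        return "forward" if within_budget() else "reject_over_budget"
    except RedisUnavailable:
        # Redis is down: the policy alone decides the outcome.
        if policy is FailurePolicy.FAIL_OPEN:
            return "forward"  # budget governance is bypassed
        return "reject_503"  # request is refused rather than left unmetered


def redis_down() -> bool:
    # Stand-in for a budget lookup during an outage.
    raise RedisUnavailable()


open_result = budget_decision(redis_down, FailurePolicy.FAIL_OPEN)
closed_result = budget_decision(redis_down, FailurePolicy.FAIL_CLOSED)
```

Note that the policy only matters on the error path; while Redis is reachable, both settings behave identically.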

Choosing a policy

Use FailOpen when:

  • Agents must be able to work even during Redis maintenance windows
  • Budget overruns during outages are acceptable (you can audit after the fact)
  • You have a highly available Redis setup that rarely goes down

Use FailClosed when:

  • Your security or compliance team requires that budget limits are always enforced
  • You can tolerate a 503 during a Redis outage rather than risk uncontrolled spend
  • Agents are performing sensitive or costly operations

Cache write failures are non-fatal under both policies. A failed cache write never causes a request to fail — it just means the next identical request won’t get a cache hit.
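The non-fatal write behavior amounts to treating the cache write as best effort. A minimal sketch, with hypothetical function names and a plain string standing in for the upstream response:

```python
from typing import Callable


def finish_request(response: str, write_to_cache: Callable[[str], None]) -> str:
    """Return the response to the caller even if caching it fails."""
    try:
        write_to_cache(response)
    except Exception:
        # Best effort: the failure is swallowed here (a real system would
        # likely log it), and the next identical request simply misses.
        pass
    return response


def failing_write(_: str) -> None:
    # Stand-in for a cache write attempted while Redis is unavailable.
    raise ConnectionError("Redis unavailable")


result = finish_request("upstream response", failing_write)
```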
