Kubernetes Deployment

This guide covers deploying Ithil to Kubernetes: building the image, configuring health probes, timing graceful shutdown, choosing a Redis persistence mode, and deciding on a Redis failure policy.

Building the image

A multi-stage Dockerfile is included at the repository root. It restores, builds, and publishes the gateway, then downloads the ONNX embedding model (~23 MB) from Hugging Face so the final image is fully self-contained.


docker build -t ithil-gateway:latest .

The image listens on port 8080 by default (ASPNETCORE_URLS=http://+:8080). Override with the ASPNETCORE_URLS environment variable if you need HTTPS termination at the container level (most Kubernetes setups terminate TLS at the ingress instead).


# Quick local smoke test
docker run --rm -p 8080:8080 \
  -e ConnectionStrings__Redis=host.docker.internal:6379 \
  -e Ithil__Jwt__SigningKey=your-dev-key-here \
  -e Ithil__Jwt__Issuer=ithil-dev \
  -e Ithil__Jwt__Audience=ithil-gateway \
  ithil-gateway:latest

The ONNX model is baked into the image at build time. If you want to supply your own model (e.g. a fine-tuned variant), mount it over /app/models at runtime and set Ithil__SemanticCache__ModelPath and Ithil__SemanticCache__VocabPath accordingly.
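As a rough local sketch of the custom-model override described above, the function below wraps the smoke-test command with a read-only bind mount. The file names model.onnx and vocab.txt and the function name are placeholders chosen for illustration, and treating ModelPath and VocabPath as file paths (rather than directories) is an assumption; adjust them to whatever your model actually ships.

```shell
# Sketch: run the gateway with a custom embedding model mounted over /app/models.
# model.onnx and vocab.txt are placeholder file names, not Ithil defaults.
run_with_custom_model() {
  model_dir="$1"   # absolute host path that contains the model and vocab files
  docker run --rm -p 8080:8080 \
    -v "${model_dir}:/app/models:ro" \
    -e Ithil__SemanticCache__ModelPath=/app/models/model.onnx \
    -e Ithil__SemanticCache__VocabPath=/app/models/vocab.txt \
    -e ConnectionStrings__Redis=host.docker.internal:6379 \
    -e Ithil__Jwt__SigningKey=your-dev-key-here \
    -e Ithil__Jwt__Issuer=ithil-dev \
    -e Ithil__Jwt__Audience=ithil-gateway \
    ithil-gateway:latest
}

# Usage: run_with_custom_model "$PWD/my-models"
```

In a cluster the same idea would use a volume mounted at /app/models plus the two environment variables on the container.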

Health probes

Ithil exposes three health endpoints. Kubernetes needs two of them, /health/live and /health/ready, and each probe must point at the right one.


livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
 
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3

Why two separate probes?

Probe	Endpoint	What it checks	Failure means
Liveness	`/health/live`	Process is alive and responding	Container is restarted
Readiness	`/health/ready`	Redis reachable, ONNX model loaded	Traffic diverted, pod not restarted

Never point the liveness probe at /health/ready. If Redis goes down, liveness would fail and Kubernetes would restart the container, which still can't reach Redis, so the pod would sit in a restart loop for the whole outage. The liveness probe intentionally checks nothing external.

What the readiness probe checks

Redis — a PING against the configured Redis connection. Fails if Redis is unreachable.
Embedding model — verifies the ONNX model finished loading at startup. The model is loaded eagerly when the gateway starts; if it fails to load, the process exits immediately rather than serving traffic with a broken cache.
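Before wiring the probes into a manifest, it can help to hit both endpoints by hand, for example against the local container from the smoke test. The sketch below assumes a healthy endpoint answers with HTTP 200, which the guide does not state explicitly; the function names are illustrative.

```shell
# Print the HTTP status code returned by one health endpoint.
# curl prints 000 if the connection itself fails.
probe_status() {
  base_url="$1"
  endpoint="$2"
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base_url}${endpoint}"
}

# Check liveness and readiness; succeed only if both return 200 (assumed healthy code).
check_probes() {
  base_url="$1"
  rc=0
  for endpoint in /health/live /health/ready; do
    code=$(probe_status "$base_url" "$endpoint")
    echo "${endpoint} -> ${code}"
    [ "$code" = "200" ] || rc=1
  done
  return "$rc"
}

# Usage: check_probes http://localhost:8080
```

With Redis stopped, the expectation from the table above is that /health/live keeps passing while /health/ready fails.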

Graceful shutdown

When Kubernetes terminates a pod (rolling deploy, scale-down, node drain), it sends SIGTERM and waits up to terminationGracePeriodSeconds before force-killing the process with SIGKILL.

Ithil’s shutdown timeout must be less than terminationGracePeriodSeconds so .NET finishes draining before Kubernetes loses patience.

Recommended values


# pod spec
terminationGracePeriodSeconds: 30


// appsettings.json
{
  "Ithil": {
    "Shutdown": {
      "TimeoutSeconds": 25
    }
  }
}

The 5-second gap leaves time for the signal to reach the process and for the container runtime to clean up.

What drains during shutdown

In-flight HTTP requests — YARP stops accepting new connections and waits for active requests to complete.
Audit records — AuditBackgroundWorker flushes any records queued at shutdown time before the process exits, so the audit trail is not truncated mid-deploy.

If Ithil:Shutdown:TimeoutSeconds is equal to or greater than terminationGracePeriodSeconds, Kubernetes can SIGKILL the process before .NET finishes draining, and in-flight requests and queued audit records will be lost.
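The timing rule is easy to check mechanically, for instance in a CI step that reads both values from your manifests. A minimal sketch follows; the function name is made up for illustration and the two numbers are passed in by hand rather than parsed from any file.

```shell
# Verify that the application shutdown timeout is strictly below the
# Kubernetes grace period. Both arguments are whole seconds.
check_shutdown_budget() {
  timeout_s="$1"   # value of Ithil:Shutdown:TimeoutSeconds
  grace_s="$2"     # value of terminationGracePeriodSeconds
  if [ "$timeout_s" -lt "$grace_s" ]; then
    echo "ok: $((grace_s - timeout_s))s of headroom before SIGKILL"
  else
    echo "error: shutdown timeout (${timeout_s}s) must be less than terminationGracePeriodSeconds (${grace_s}s)" >&2
    return 1
  fi
}

# Usage: check_shutdown_budget 25 30
```

With the recommended values the check reports 5 seconds of headroom; with equal values it fails, because the drain could still be running when SIGKILL arrives.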

Redis

Connection string


{
  "ConnectionStrings": {
    "Redis": "your-redis-host:6379,password=secret,ssl=true"
  }
}

Ithil uses StackExchange.Redis — any connection string format it supports works here.

Redis is always required, even when Ithil:AgentStore:UseInMemory is true. The in-memory flag only swaps the agent config and API key stores — budget enforcement and semantic caching always use Redis.
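Because the connection string usually carries a password, one common approach is to keep it in a Kubernetes Secret instead of appsettings.json. The sketch below is one way to create such a Secret; the Secret name ithil-redis and the function name are hypothetical, and using ConnectionStrings__Redis as the key mirrors the environment variable from the smoke test above.

```shell
# Store the Redis connection string in a Secret whose key matches the
# environment variable name the gateway reads.
# "ithil-redis" is a placeholder Secret name.
create_redis_secret() {
  namespace="$1"
  conn="$2"   # e.g. your-redis-host:6379,password=secret,ssl=true
  kubectl create secret generic ithil-redis \
    --namespace "$namespace" \
    --from-literal=ConnectionStrings__Redis="$conn"
}

# Usage: create_redis_secret my-namespace 'your-redis-host:6379,password=secret,ssl=true'
```

The pod spec would then need to expose that key to the container as an environment variable, for example through envFrom with a secretRef.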

Persistence tiers

Ithil stores two categories of data in Redis:

Data	Key pattern	Sensitivity
Daily token usage	`budget:{agentId}:{date}`	Low — resets daily, loss just resets the counter
Semantic cache entries	`cache:{hash}`	Low — cache misses degrade performance, not correctness

Both categories have automatic expiry (budget keys: 48 hours; cache entries: configurable TTL). Neither requires strong durability guarantees.
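To confirm that keys really do carry an expiry in your environment, you can list them with their remaining TTL. This is a sketch that assumes redis-cli can reach the instance with its default connection settings (add host, port, or auth flags as needed); the function name is illustrative.

```shell
# List keys matching a pattern together with their remaining TTL in seconds.
# Redis reports -1 for a key with no expiry and -2 for a key that no longer exists.
show_ttls() {
  pattern="$1"
  redis-cli --scan --pattern "$pattern" | while IFS= read -r key; do
    # </dev/null keeps the inner call from consuming the key list on stdin
    printf '%s\t%s\n' "$key" "$(redis-cli TTL "$key" </dev/null)"
  done
}

# Usage:
#   show_ttls 'budget:*'
#   show_ttls 'cache:*'
```

Any key that reports -1 would be worth investigating, since both categories are expected to expire on their own.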

Recommended persistence mode: RDB snapshots (the Redis default). AOF (append-only file) is not necessary for Ithil’s workload and adds write amplification without meaningful benefit.

For managed Redis services:

Azure Cache for Redis — Basic or Standard tier is sufficient. Premium is not needed.
AWS ElastiCache — cache.t3.micro or larger. Enable automatic backups if budget data matters to you.
Redis Cloud — Free tier is adequate for development; Essentials for production.

Redis failure policy

This is a security policy decision. The default (FailOpen) silently bypasses budget governance when Redis is unavailable. Enterprise deployments should evaluate FailClosed.

When Redis is unavailable, Ithil can behave in one of two ways:

Policy	Budget enforcement	Semantic cache	When to use
`FailOpen` (default)	Bypassed — all agents spend freely	All requests miss	Availability over governance
`FailClosed`	Requests rejected with 503	Requests rejected with 503	Governance must never be bypassed

The budget engine and the semantic cache each take their own FailurePolicy setting:


{
  "Ithil": {
    "Budget": {
      "FailurePolicy": "FailClosed"
    },
    "SemanticCache": {
      "FailurePolicy": "FailClosed"
    }
  }
}
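If you would rather set the policy on a running Deployment than rebuild appsettings.json, the same keys can likely be supplied as environment variables. The sketch below assumes the double-underscore mapping already used for the other settings in this guide also applies to these two keys; the Deployment name is whatever you pass in, and the function name is illustrative.

```shell
# Apply one failure policy to both the budget engine and the semantic cache.
# Assumes Ithil:Budget:FailurePolicy maps to Ithil__Budget__FailurePolicy, etc.
set_failure_policy() {
  deployment="$1"
  policy="$2"
  case "$policy" in
    FailOpen|FailClosed) ;;
    *) echo "error: policy must be FailOpen or FailClosed" >&2; return 1 ;;
  esac
  kubectl set env "deployment/${deployment}" \
    "Ithil__Budget__FailurePolicy=${policy}" \
    "Ithil__SemanticCache__FailurePolicy=${policy}"
}

# Usage: set_failure_policy ithil-gateway FailClosed
```

Changing the environment triggers a rolling restart of the Deployment, so the graceful shutdown settings above apply here too.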

Choosing a policy

Use FailOpen when:

Agents must be able to work even during Redis maintenance windows
Budget overruns during outages are acceptable (you can audit after the fact)
You have a highly available Redis setup that rarely goes down

Use FailClosed when:

Your security or compliance team requires that budget limits are always enforced
You can tolerate a 503 during a Redis outage rather than risk uncontrolled spend
Agents are performing sensitive or costly operations

Cache write failures are non-fatal under both policies. A failed cache write never causes a request to fail — it just means the next identical request won’t get a cache hit.