
Semantic Cache

Ithil’s semantic cache returns cached tool responses when an incoming tool call is semantically equivalent to a previous one — even if the arguments are worded differently. This reduces downstream API load and speeds up agent responses.

How it works

  1. The incoming tool call arguments are serialized to a string
  2. The string is embedded using the ONNX model (all-MiniLM-L6-v2) running in-process
  3. The resulting vector is compared against cached vectors stored in Redis using cosine similarity
  4. If the highest similarity score exceeds the configured threshold, the cached response is returned
  5. If there is no cache hit, the call proceeds normally. The response is then cached in Redis together with the embedding of the call's arguments, so that future requests can match against it.
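The steps above can be sketched in a few lines of Python. This is an illustration only, not Ithil's implementation. The `embed` function is a toy character-count stand-in for the all-MiniLM-L6-v2 ONNX model, a plain list stands in for Redis, and the names `SemanticCache` and `call_tool` are hypothetical.

```python
import json
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def embed(text):
    """Toy stand-in for the ONNX model: a character-count vector (assumption)."""
    lowered = text.lower()
    return [lowered.count(ch) for ch in ALPHABET]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # (vector, response) pairs; a list stands in for Redis

    def lookup(self, arguments):
        # Step 1: serialize the arguments. Step 2: embed the string.
        vec = embed(json.dumps(arguments, sort_keys=True))
        # Step 3: find the most similar cached vector.
        best_score, best_response = -1.0, None
        for cached_vec, response in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_response = score, response
        # Step 4: a hit only if the best score exceeds the threshold.
        hit = best_score > self.threshold
        return hit, (best_response if hit else None), vec

    def store(self, vec, response):
        self.entries.append((vec, response))

def call_tool(cache, arguments, downstream):
    hit, response, vec = cache.lookup(arguments)
    if hit:
        return response
    # Step 5: on a miss, call the tool and cache its response for next time.
    response = downstream(arguments)
    cache.store(vec, response)
    return response
```

In this sketch the linear scan over `entries` keeps the logic visible. How the comparison against vectors stored in Redis is actually performed is not specified here.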

ONNX model setup

The semantic cache requires the all-MiniLM-L6-v2 model in ONNX format and its vocabulary file.

Download:

The model files are bundled in the Docker image. For manual installs, download them from the Ithil releases page:

  • all-MiniLM-L6-v2.onnx
  • vocab.txt

Place them in a directory accessible to the gateway, then configure the paths:

{
  "Ithil": {
    "SemanticCache": {
      "ModelPath": "models/all-MiniLM-L6-v2.onnx",
      "VocabPath": "models/vocab.txt"
    }
  }
}

Paths can be relative to the gateway’s working directory or absolute.
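A quick pre-flight check can catch a misplaced model file before the gateway starts. The sketch below assumes relative paths resolve against the working directory, as described above. The helper names `resolve_model_path` and `missing_files` are hypothetical and not part of Ithil.

```python
from pathlib import Path

def resolve_model_path(configured, working_dir):
    """Keep absolute paths as-is; resolve relative ones against working_dir."""
    path = Path(configured)
    return path if path.is_absolute() else Path(working_dir) / path

def missing_files(config, working_dir):
    """Return the configured model/vocab paths that do not exist on disk."""
    section = config["Ithil"]["SemanticCache"]
    resolved = [
        resolve_model_path(section[key], working_dir)
        for key in ("ModelPath", "VocabPath")
    ]
    return [str(p) for p in resolved if not p.is_file()]
```

Calling `missing_files(config, Path.cwd())` with the parsed configuration returns an empty list when both files are in place.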

Similarity threshold

The threshold controls how similar two queries must be before the cached response is returned.

| Setting | Effect |
| --- | --- |
| Too low (e.g., 0.70) | Many cache hits, including semantically different queries; risk of wrong responses |
| Too high (e.g., 0.99) | Almost no cache hits; near-identical wording required |
| Recommended (0.92) | Catches rephrased but semantically identical queries; avoids false positives |

{
  "Ithil": {
    "SemanticCache": {
      "SimilarityThreshold": 0.92
    }
  }
}

Tune this value based on the diversity of your tool arguments and the acceptable risk of an incorrect cache hit, that is, a cached response served for a query that only looks similar.
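One way to tune the threshold is to score a small set of argument pairs that you have labeled by hand, then count the errors each candidate threshold would produce. The sketch below assumes such a labeled sample exists. The scores shown are made-up illustrative values, not measurements from Ithil or the model, and `evaluate_threshold` is a hypothetical helper.

```python
def evaluate_threshold(pairs, threshold):
    """Count errors for one threshold.

    pairs: (similarity_score, should_match) tuples from a hand-labeled sample.
    Returns (false_hits, missed_hits).
    """
    false_hits = sum(1 for score, same in pairs if score > threshold and not same)
    missed_hits = sum(1 for score, same in pairs if score <= threshold and same)
    return false_hits, missed_hits

# Illustrative, invented scores: True means the two calls really are equivalent.
sample = [
    (0.97, True), (0.94, True), (0.93, True),
    (0.90, False), (0.81, False), (0.74, False),
]

for candidate in (0.70, 0.92, 0.99):
    print(candidate, evaluate_threshold(sample, candidate))
```

On this invented sample, 0.70 produces three false hits, 0.99 misses all three genuine matches, and 0.92 produces neither. Real data will rarely separate this cleanly, so the choice becomes a trade-off between the two error counts.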

Cache invalidation

Cache entries are stored in Redis with a configurable TTL:

{
  "Ithil": {
    "SemanticCache": {
      "TtlSeconds": 300
    }
  }
}

Default: 300 seconds (5 minutes). After expiry, the next matching query will call the downstream API and refresh the cache.
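The expiry behavior can be modeled with a small in-memory store. This is a sketch of the semantics only: Redis expires keys itself, so the gateway does not need logic like this, and the `TtlStore` class and its injectable `clock` parameter are hypothetical.

```python
import time

class TtlStore:
    """In-memory model of TTL expiry, a stand-in for Redis key expiration."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated without waiting
        self.items = {}     # key -> (expires_at, value)

    def set(self, key, value):
        self.items[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller falls through to the downstream API.
            del self.items[key]
            return None
        return value
```

A `None` result after expiry corresponds to the cache miss described above: the next matching query calls the downstream API and a fresh entry is stored with a new TTL.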

Manual invalidation is not currently supported. To clear the entire cache, flush the Redis database or restart with a fresh Redis instance.

When to disable the cache

Do not cache tools that return real-time or user-specific data. A cached “current stock price” or “logged-in user’s account balance” is almost certainly wrong by the time it’s served.

Tools that should not be cached:

  • Real-time data (prices, inventory, sensor readings)
  • User-specific data (account details, session state)
  • Write operations: the cache is checked before write tools execute, so a cache hit returns the stored response and the write never runs. Ensure tools with AllowWrite = true are not inadvertently cached.

Per-tool cache opt-out is planned for a future release. Until then, disable the feature globally if any of your tools fall into the categories above:

{
  "Ithil": {
    "SemanticCache": {
      "Enabled": false
    }
  }
}
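Because the switch is global, one unsafe tool is enough to turn the cache off. The sketch below derives the `Enabled` value from a tool inventory. The inventory format and its `realtime`, `user_specific`, and `allow_write` fields are assumptions made for illustration, not an Ithil schema.

```python
import json

def semantic_cache_config(tools):
    """Return (config, unsafe_tool_names) for a hypothetical tool inventory."""
    unsafe = [
        tool["name"]
        for tool in tools
        if tool.get("realtime") or tool.get("user_specific") or tool.get("allow_write")
    ]
    # Enable the cache only when no tool is unsafe to cache.
    config = {"Ithil": {"SemanticCache": {"Enabled": not unsafe}}}
    return config, unsafe

# Hypothetical inventory: one cache-safe lookup tool and one real-time tool.
tools = [
    {"name": "lookup_docs"},
    {"name": "stock_price", "realtime": True},
]
config, unsafe = semantic_cache_config(tools)
print(json.dumps(config, indent=2))
print("Unsafe to cache:", unsafe)
```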