Container Connectivity Troubleshooting

This guide summarizes common connectivity issues we hit when running the router with Docker Compose or Kubernetes and how we fixed them. It also covers the “No data” problem in Grafana and how to validate the full metrics chain.

1. Use IPv4 addresses for backend endpoints

Symptoms

  • Router/Envoy timeouts, 5xx, or “up/down” flapping in Prometheus. Curl from inside containers/pods fails.

Root causes

  • Backend bound only to 127.0.0.1 (not reachable from containers/pods).
  • Using IPv6 or hostnames that resolve to IPv6 where IPv6 is disabled/blocked.
  • Using localhost/127.0.0.1 in the router config, which refers to the container itself, not the host.

Fixes

  • Ensure backends bind to all interfaces: 0.0.0.0.
  • In Docker Compose, configure the router to call the host via a reachable IPv4 address.
    • On macOS, host.docker.internal usually works; if not, use the host’s LAN IPv4 address.
    • On Linux or custom networks, use the Docker host gateway IPv4 for your network (or map host.docker.internal to the host gateway; see the Compose sketch below).
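
On Linux, one hedged alternative (assuming Docker Engine 20.10+ for the host-gateway value) is to map host.docker.internal to the host gateway via extra_hosts, so the same hostname works as on macOS:

# docker-compose.yml (sketch)
services:
  semantic-router:
    # ...
    extra_hosts:
      # "host-gateway" resolves to the Docker host's gateway IP when the container starts
      - "host.docker.internal:host-gateway"

The router address then becomes http://host.docker.internal:11434.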

Example: start vLLM on the host

# Make vLLM listen on all interfaces
python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 --port 11434 \
  --served-model-name phi4
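
A quick check from the host that the listener actually answers on the chosen port (11434 as above):

# From the host
curl -sS http://localhost:11434/v1/models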

Router config example (Docker Compose)

# config/config.yaml (snippet)
llm_backends:
  - name: phi4
    # Use a reachable IPv4; replace with your host’s IP
    address: http://172.28.0.1:11434

Kubernetes recommended pattern: use a Service

apiVersion: v1
kind: Service
metadata:
  name: my-vllm
spec:
  selector:
    app: my-vllm
  ports:
    - name: http
      port: 8000
      targetPort: 8000

The router config then uses: http://my-vllm.default.svc.cluster.local:8000
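
To verify the Service DNS name resolves and the backend answers from inside the cluster, a throwaway curl pod works (curlimages/curl is just an example image):

# Run a one-off pod and hit the Service DNS name
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sS http://my-vllm.default.svc.cluster.local:8000/v1/models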

Tip: discover the host gateway from inside a container (mainly relevant on Linux)

# Inside the container/pod
ip route | awk '/default/ {print $3}'
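
From the host side, the same gateway can usually be read from the Docker network’s IPAM config (the network name is a placeholder; Compose networks are typically named <project>_default):

# On the host: print the gateway IPv4 of a Docker network
docker network inspect <network-name> --format '{{ (index .IPAM.Config 0).Gateway }}'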

2. Host firewall blocking container/pod traffic

Symptoms

  • Host can curl the backend, but containers/pods time out until the firewall is opened.

Fixes

  • macOS: System Settings → Network → Firewall. Allow incoming connections for the backend process (e.g., Python/uvicorn) or temporarily disable the firewall to test.
  • Linux examples:
# UFW (Ubuntu/Debian)
sudo ufw allow 11434/tcp
sudo ufw allow 11435/tcp

# firewalld (RHEL/CentOS/Fedora)
sudo firewall-cmd --add-port=11434/tcp --permanent
sudo firewall-cmd --add-port=11435/tcp --permanent
sudo firewall-cmd --reload
  • Cloud hosts: also open security group/ACL rules.

Validate from the container/pod:

docker compose exec semantic-router curl -sS http://<IPv4>:11434/v1/models
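
If that still times out after opening the firewall, confirm on the host that the backend is bound to 0.0.0.0 rather than 127.0.0.1 (a loopback-only bind looks exactly like a blocked port from inside a container):

# On the host: the local address should be 0.0.0.0:11434, not 127.0.0.1:11434
ss -ltnp | grep -E ':11434|:11435'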

3. Docker Compose: publish the router’s ports (not just expose)

Symptoms

  • Can’t access /metrics or the API from the host; docker ps shows no published ports.

Root cause

  • Using expose only keeps ports internal to the Compose network; it doesn’t publish to the host.

Fix

  • Map the needed ports with ports:.

Example docker-compose.yml snippet

services:
  semantic-router:
    # ...
    ports:
      - "9190:9190"   # Prometheus /metrics
      - "50051:50051" # gRPC/HTTP API (use your actual service port)

Validate from the host:

curl -sS http://localhost:9190/metrics | head -n 5
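
If that curl fails, confirm the ports were really published; with expose: only, nothing shows under PORTS:

# Published ports appear as 0.0.0.0:9190->9190/tcp in the PORTS column
docker compose ps semantic-router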

4. Grafana dashboard shows “No data”

Common causes and fixes

  • Metrics not emitted yet

    • Some panels are empty until code paths are hit. Examples:
      • Cost: llm_model_cost_total{currency="USD"} grows only when cost is recorded.
      • Refusals: llm_request_errors_total{reason=~"pii_policy_denied|jailbreak_block"} grows only when policies block requests.
    • Generate relevant traffic or enable filters/policies to see data.
  • Panel query nuances

    • The classification bar gauge typically needs an instant query.
    • Quantile panels (e.g., p95 latency) require the histogram _bucket series via histogram_quantile.

Useful PromQL examples (for Explore)

# Category classification (instant)
sum by (category) (llm_category_classifications_count)

# Cost rate (USD/sec)
sum by (model) (rate(llm_model_cost_total{currency="USD"}[5m]))

# Refusals per model
sum by (model) (rate(llm_request_errors_total{reason=~"pii_policy_denied|jailbreak_block"}[5m]))

# Refusal rate percentage
100 * sum by (model) (rate(llm_request_errors_total{reason=~"pii_policy_denied|jailbreak_block"}[5m]))
/ sum by (model) (rate(llm_model_requests_total[5m]))

# Latency p95
histogram_quantile(0.95, sum by (le) (rate(llm_model_completion_latency_seconds_bucket[5m])))
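
If a query returns nothing in Explore, it can help to first confirm the raw series exist at the router’s /metrics endpoint (metric names as used above):

# From the host, with the metrics port published as 9190
curl -sS http://localhost:9190/metrics | grep -E 'llm_category_classifications_count|llm_model_cost_total|llm_request_errors_total'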

Prometheus scrape config (verify targets are UP)

scrape_configs:
  - job_name: semantic-router
    static_configs:
      - targets: ["semantic-router:9190"]

  - job_name: envoy
    metrics_path: /stats/prometheus
    static_configs:
      - targets: ["envoy-proxy:19000"]

Time range & refresh

  • Select a window that includes your recent traffic (Last 5–15 minutes) and refresh the dashboard after sending test requests.

Quick checklist

  • Backends listen on 0.0.0.0; the router uses a reachable IPv4 address (or a Kubernetes Service DNS name that resolves to IPv4).
  • Host firewall allows the backend ports; cloud SG/ACL opened if applicable.
  • In Docker Compose, router ports are published (e.g., 9190 for /metrics, service port for API).
  • Prometheus targets for semantic-router:9190 and envoy-proxy:19000 are UP.
  • Send traffic that triggers the metrics you expect (cost/refusals) and adjust the panel query mode (instant vs. range) where needed; a sample request is sketched below.
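
As a hedged example of generating that traffic (the listener address and port below are assumptions; substitute your Envoy/router endpoint and a model name from your config), one OpenAI-style chat completion is usually enough to move the request, latency, and classification counters:

# Replace localhost:8801 with your Envoy/router listener
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi4", "messages": [{"role": "user", "content": "What is 2 + 2?"}]}'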