Running Hermes Agent on EKS: from docker-compose to Helm

During a two-day internal workshop we set out to evaluate Hermes Agent as a self-hosted AI agent platform. What started as a quick docker-compose spin-up ended with a custom Helm chart, an ECR-backed deployment on EKS, and some genuine appreciation for how much thought Nous Research have put into the agent loop design. Here is the full arc from local container to production-ish Kubernetes.

What is Hermes?

Hermes is an open-source AI agent framework developed by Nous Research, an independent AI research organization. The headline feature is a closed learning loop: the agent creates skills from experience, refines them during subsequent use, maintains a persistent memory of who you are across sessions, and searches its own past conversations to recall relevant context. It is not tied to a specific model. Point it at OpenRouter, AWS Bedrock, or any OpenAI-compatible endpoint and swap models on the fly with hermes model.

Something worth clarifying upfront, because the “self-improving” framing tends to trigger a mental model of reinforcement learning: Hermes does not fine-tune or retrain the underlying model. There is no gradient update happening anywhere. The improvement is entirely at the context engineering layer: skills are stored as structured prompts, memory is curated text that gets injected into the system context, and user profiles accumulate facts that shape how the agent frames responses. You could describe it as a semi-supervised inference approach: a human is still in the loop validating and pruning what persists, but the agent nudges itself to save useful things and improves its own skill definitions over time. The intelligence comes from the model; the continuity and growth come from what you put in front of it.

Architecturally, Hermes runs as two processes:

Gateway: the main agent loop plus an optional OpenAI-compatible API server on port 8642. This is where conversations happen, tools run, skills execute, and the messaging bridge (Telegram, Slack, Discord, etc.) connects.
Dashboard: a web UI on port 9119 for browsing conversations, managing skills and memory, and monitoring the agent.

Both processes share the same HERMES_HOME directory (~/.hermes locally, /opt/data in containers).

Step 1: getting it running locally with docker-compose

The upstream repository ships a docker-compose.yml that covers the two-service setup cleanly. A few things are immediately worth noting.

services:
  gateway:
    build: .
    image: hermes-agent
    container_name: hermes
    restart: unless-stopped
    network_mode: host
    volumes:
      - ~/.hermes:/opt/data
    environment:
      - HERMES_UID=${HERMES_UID:-10000}
      - HERMES_GID=${HERMES_GID:-10000}
      # Uncomment both to expose the API server:
      # - API_SERVER_HOST=0.0.0.0
      # - API_SERVER_KEY=${API_SERVER_KEY}
    command: ["gateway", "run"]

  dashboard:
    image: hermes-agent
    container_name: hermes-dashboard
    restart: unless-stopped
    network_mode: host
    depends_on:
      - gateway
    volumes:
      - ~/.hermes:/opt/data
    environment:
      - HERMES_UID=${HERMES_UID:-10000}
      - HERMES_GID=${HERMES_GID:-10000}
    command: ["dashboard", "--host", "127.0.0.1", "--no-open"]

The HERMES_UID / HERMES_GID pattern is handled by an s6-overlay init stage inside the image: the container starts as root, remaps the internal hermes user to match your host UID via gosu/usermod, then drops privileges before any supervised service starts. This keeps files created under /opt/data readable by your host user without needing chown gymnastics.

network_mode: host on a developer laptop is intentional. The gateway and dashboard talk to each other via localhost, and the host user gets direct access on familiar ports without any port-mapping boilerplate.

The dashboard intentionally binds to 127.0.0.1 by default. The upstream comments spell out why: the dashboard stores API keys and has no authentication layer. If you want remote access, the right move is an SSH tunnel (ssh -L 9119:localhost:9119), not opening it on 0.0.0.0 on a shared network.

The quickest start:

HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d

Then open http://localhost:9119 for the dashboard and, once you run hermes model to pick a provider, start a conversation via the CLI or the dashboard chat pane.

Step 2: building the image and pushing to ECR

For the EKS deployment we needed the image in our private ECR registry rather than built locally.

# Authenticate to ECR
aws ecr get-login-password --region eu-central-1 \
  | docker login --username AWS --password-stdin \
    <account-id>.dkr.ecr.eu-central-1.amazonaws.com

# Build from the upstream Dockerfile
docker build -t hermes-agent .

# Tag and push
docker tag hermes-agent \
  <account-id>.dkr.ecr.eu-central-1.amazonaws.com/hermes-agent:latest
docker push \
  <account-id>.dkr.ecr.eu-central-1.amazonaws.com/hermes-agent:latest

The upstream Dockerfile uses a multi-stage build: a build stage installs Python dependencies with uv, and a runtime stage packages the agent on top of a slim base with s6-overlay already baked in. The image is self-contained (no side-car, no external init system to wire up).

In the Kubernetes manifests below, imagePullPolicy: Always ensures each pod restart pulls the freshest :latest. In a real pipeline you would tag by commit SHA and control rollouts that way; for a PoC, latest is fine.

Step 3: the Kubernetes manifest

Here is the raw manifest we converged on. Walk through each section:

Namespace and Secret

apiVersion: v1
kind: Namespace
metadata:
  name: hermes-agent

---
apiVersion: v1
kind: Secret
metadata:
  name: hermes-secrets
  namespace: hermes-agent
type: Opaque
data:
  API_SERVER_KEY: "<base64-encoded-random-key>"

The API_SERVER_KEY authenticates requests to the gateway’s OpenAI-compatible API server. Generate one before applying the manifest:

# Generate a random key and base64-encode it
openssl rand -base64 32 | tr -d '\n' | base64 | kubectl create secret generic hermes-secrets \
  --from-literal=API_SERVER_KEY="$(openssl rand -base64 32)" \
  -n hermes-agent --dry-run=client -o yaml | kubectl apply -f -

Never commit a real key into the YAML file. The placeholder in the manifest above is a reminder to replace it. Better yet, manage the secret out-of-band with something like External Secrets Operator feeding from your secrets manager.

Persistent Volumes

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hermes-data-gateway
  namespace: hermes-agent
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 10Gi

Both the gateway and the dashboard get their own PVC. ReadWriteOnce maps to EBS GP3. A single node can mount it at a time, which is fine since each deployment runs one replica. If you scale the gateway horizontally you would need ReadWriteMany (EFS), but for an agent that maintains conversational state a single replica is the sensible default anyway.

ServiceAccount with IRSA

apiVersion: v1
kind: ServiceAccount
metadata:
  name: hermes-agent
  namespace: hermes-agent
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/hermes-agent-role

The eks.amazonaws.com/role-arn annotation is the entry point for IRSA (IAM Roles for Service Accounts). If your cluster has the Pod Identity Agent or OIDC provider configured, any pod that uses this ServiceAccount automatically gets short-lived credentials scoped to hermes-agent-role injected into the container (no static AWS keys, no EC2 instance profile sharing).

In our case the IAM role grants bedrock:InvokeModel (and bedrock:InvokeModelWithResponseStream) on the models we wanted available. This lets the gateway call AWS Bedrock directly for inference, so no third-party API key is needed for the LLM provider. The agent’s cloud identity handles authentication. If you are not on AWS or do not care about Bedrock you can remove the annotation entirely and pass provider keys as extra environment variables via the Secret.

Gateway Deployment

The gateway deployment surfaces two environment variables that matter beyond the obvious ones:

env:
  - name: API_SERVER_HOST
    value: "0.0.0.0"
  - name: API_SERVER_KEY
    valueFrom:
      secretKeyRef:
        name: hermes-secrets
        key: API_SERVER_KEY
        optional: true

API_SERVER_HOST: "0.0.0.0" opens the API server on all interfaces inside the pod so the Kubernetes Service (ClusterIP) can route traffic to it. On a developer laptop this would be a security concern; inside a pod with no external route, it just makes the service reachable within the cluster.

API_SERVER_KEY is pulled from the Secret so it never appears in the Deployment spec. optional: true means the pod starts even if the secret key is absent. Useful during initial bootstrapping, but in a real deployment you would remove optional once the secret is in place.

Dashboard Deployment

args: ["dashboard", "--host", "0.0.0.0", "--no-open", "--insecure"]

Three flags that each solve a different problem:

--host 0.0.0.0: same reasoning as the gateway: the ClusterIP Service cannot reach the dashboard if it only listens on loopback.
--no-open: suppresses the “open browser on start” behaviour that makes sense on a developer laptop but would fail silently (or noisily) inside a headless container.
--insecure: this is the important one. The dashboard’s built-in HTTP server enforces that requests arrive on a specific FQDN by default; without --insecure, any request coming from the ALB (with a different Host header or internal routing path) gets rejected. In a container behind a load balancer, --insecure is not a security regression. TLS termination and access control happen at the ALB layer.

Services and Ingress

Both deployments get ClusterIP Services. The interesting piece is the Ingress:

rules:
  - host: hermes.internal.your-domain.com
    http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: hermes-dashboard
              port:
                name: http
        - path: /api/
          pathType: Prefix
          backend:
            service:
              name: hermes-gateway
              port:
                name: api
        - path: /v1/
          pathType: Prefix
          backend:
            service:
              name: hermes-gateway
              port:
                name: api

/ routes to the dashboard; /api/ and /v1/ route to the gateway. The /v1/ prefix is the OpenAI-compatible endpoint, so tooling that speaks the OpenAI chat completions API can point at https://hermes.internal.your-domain.com/v1/ and use Hermes as a drop-in provider.

The Ingress is marked alb.ingress.kubernetes.io/scheme: internal and placed in an internal ALB group; it is never exposed on the public internet. SSL redirect is enforced at the ALB layer with a TLS 1.2+ policy.

A note on model choice and cost

For the workshop we used Claude Sonnet 4.6 (via AWS Bedrock) rather than Opus. The reasoning is straightforward: Opus is roughly 5× more expensive per token than Sonnet, and for the tasks we threw at Hermes (code review, writing scripts, answering questions about internal documentation) Sonnet handled everything well. The longer the agent runs and the more tool calls it makes in a single session, the more token cost adds up; picking the right model tier for the actual task complexity matters.

Hermes makes this easy: hermes model lets you switch provider and model mid-session without losing context. We started some sessions on Sonnet and bumped to Opus only when we hit genuinely complex multi-step reasoning tasks. In practice, that happened maybe once in two days.

Step 4: packaging it as a Helm chart

Once the raw manifest was stable, we extracted it into a Helm chart. The chart structure is minimal:

caruso-hermes/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── _helpers.tpl
    ├── namespace.yaml
    ├── secret.yaml
    ├── pvc.yaml
    ├── serviceaccount.yaml
    ├── deployment-gateway.yaml
    ├── deployment-dashboard.yaml
    ├── service-gateway.yaml
    ├── service-dashboard.yaml
    └── ingress.yaml

The values.yaml exposes the knobs you actually want to change across environments: image tag, resource requests/limits, storage size, IRSA role ARN, ingress host, and whether the namespace and secret are chart-managed or external. Everything in the manifest that was a hardcoded string becomes a template variable.

This matters for a few reasons. The ECR image URI, the IAM role ARN, the ACM certificate ARN, and the ingress hostname are all environment-specific. Without a chart you either maintain multiple copies of the manifest or do find-and-replace in CI. With a chart, a single helm upgrade --install with a per-environment values override is enough.

The secret.create: false option in values lets you tell the chart to skip managing the Secret, so you can use External Secrets Operator or Vault and inject the actual API key outside the chart’s control. This is the pattern we’d use in production.

Where we landed after two days

We got Hermes deployed, configured with a Bedrock-backed model, and ran through a handful of PoC scenarios: having the agent write and execute shell scripts, summarize internal documentation, and coordinate multi-step tasks via its subagent spawning capability. The learning loop worked as advertised. Skills we taught the agent on day one appeared in context on day two without explicit prompting.

What we did not get to explore: the full messaging gateway integration, the cron scheduler, the trajectory compression for training data generation, and deeper MCP server integrations. Hermes has surface area that a two-day workshop barely scratches.

I’ll come back with a more thorough writeup once we have meaningful production usage data: how the skill accumulation holds up over weeks, whether the memory system needs active curation, and what the actual Bedrock cost profile looks like at scale. There is a lot here worth digging into properly.

The code for the Kubernetes manifest and Helm chart is sitting in our internal monorepo. If you are evaluating Hermes for your own infrastructure, the upstream docker-compose is the right starting point. Everything above is just what it took to make it production-grade in our specific environment.