Overview

Ollama is an open-source platform for running large language models locally. This template deploys Ollama alongside Open WebUI as a sidecar interface, with a startup script that automatically pulls the configured default model on first launch. The Ollama API is accessible internally, while Open WebUI is exposed externally.

What Gets Created

  • Stateful Workload — A single-replica workload with two containers: the Ollama API server and the Open WebUI interface. The API runs on port 11434 (internal only) and the UI runs on port 8080 (externally accessible).
  • Volume Set — Persistent storage shared by both containers: the Ollama API stores model data at /root/.ollama and Open WebUI stores state at /app/backend/data. Supports optional autoscaling.
  • Secret — An opaque startup script that checks for the default model on launch and pulls it from the Ollama registry if not already present.
  • Identity & Policy — An identity bound to the workload with reveal access to the startup script secret.
This template does not create a GVC. You must deploy it into an existing GVC.

Installation

The default configuration requests 6 CPU and 8Gi of memory for the Ollama API container. Depending on your organization’s quotas, you may need to request a limit increase. GPU access requires explicit enablement — contact Control Plane support to enable GPU resources for your organization.
To install, follow the instructions for your preferred method:

Configuration

The default values.yaml for this template:
# Alternatives: llava, gemma, mistral, phi3, etc.
defaultModel: llama3

workload:
  containers:
    ui:
      name: ollama-ui
      image: ghcr.io/open-webui/open-webui:main
      port: 8080
      resources:
        cpu: 500m
        memory: 1Gi
    api:
      name: ollama
      image: ollama/ollama
      port: 11434
      resources:
        cpu: 6
        memory: 8Gi
      gpu:
        nvidia:
          model: t4
          quantity: 1

volumeset:
  initialCapacity: 10
  autoscaling:
    enabled: false
    maxCapacity: 100 # Maximum capacity in GiB
    minFreePercentage: 10 # Trigger scaling when free space drops below this percentage
    scalingFactor: 1.2 # Multiply current capacity by this factor when scaling up
  performanceClass: general-purpose-ssd
  snapshots:
    retentionDuration: 7d

firewall:
  external:
    inboundAllowCIDR:
      - 0.0.0.0/0
    outboundAllowCIDR:
      - 0.0.0.0/0

internal_access:
  type: same-gvc # options: same-gvc, same-org, workload-list
  workloads: # Note: only used when type is workload-list
    #- //gvc/GVC_NAME/workload/WORKLOAD_NAME

Default Model

  • defaultModel — The Ollama model to download on first startup if not already present on the volume. Accepts any model name from the Ollama model library (e.g. llama3, llava, gemma, mistral, phi3).
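The startup script itself is stored in the secret, but its pull-if-missing decision can be sketched in a few lines. This is an illustration, not the actual script: Ollama's `/api/tags` endpoint reports installed models as `name:tag` pairs (e.g. `llama3:latest`), and a bare `defaultModel` value matches any tag of that model.

```python
def needs_pull(default_model: str, installed: list[str]) -> bool:
    """Return True if the default model is absent from the installed tags.

    Ollama tags models as "name:tag" (e.g. "llama3:latest"), so a bare
    model name like "llama3" matches any tag of that model.
    """
    wanted = default_model.split(":")[0]
    return all(tag.split(":")[0] != wanted for tag in installed)
```

Because the model list lives on the persistent volume, the pull only happens on the very first launch; subsequent restarts find the model already present.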

Open WebUI Container

  • workload.containers.ui.image — Open WebUI container image.
  • workload.containers.ui.port — Port the UI listens on (default: 8080).
  • workload.containers.ui.resources.cpu / workload.containers.ui.resources.memory — CPU and memory for the UI container.

Ollama API Container

  • workload.containers.api.image — Ollama container image.
  • workload.containers.api.port — Port the Ollama API listens on (default: 11434).
  • workload.containers.api.resources.cpu / workload.containers.api.resources.memory — CPU and memory for the Ollama API container. Large models require significant resources.
  • workload.containers.api.gpu — Optional GPU configuration:
    • nvidia.model — GPU model type (e.g. t4).
    • nvidia.quantity — Number of GPUs to allocate.
GPU access must be explicitly enabled for your organization by Control Plane support before it can be used in a workload.

Storage

  • volumeset.initialCapacity — Initial volume size in GiB (minimum 10). Large models require more storage — plan accordingly.
  • volumeset.autoscaling.enabled — Automatically expand the volume as it fills. When enabled:
    • maxCapacity — Maximum volume size in GiB.
    • minFreePercentage — Trigger a scale-up when free space drops below this percentage.
    • scalingFactor — Multiply the current capacity by this factor when scaling up.
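The interaction between these three knobs is easiest to see as arithmetic. The sketch below assumes the defaults from values.yaml (minFreePercentage: 10, scalingFactor: 1.2, maxCapacity: 100); it illustrates the documented behavior, not the platform's exact internal algorithm.

```python
def next_capacity(capacity_gib: float, free_gib: float,
                  min_free_pct: float = 10.0, scaling_factor: float = 1.2,
                  max_capacity: float = 100.0) -> float:
    """Return the volume capacity after one autoscaling evaluation."""
    free_pct = 100.0 * free_gib / capacity_gib
    if free_pct >= min_free_pct:
        return capacity_gib  # enough free space; no scale-up
    # Grow by the scaling factor, but never past the configured maximum.
    return min(capacity_gib * scaling_factor, max_capacity)
```

For example, a 10 GiB volume with 0.5 GiB free (5% < 10%) grows to 12 GiB, while a 90 GiB volume that triggers a scale-up is capped at 100 GiB rather than reaching 108 GiB.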

Firewall

  • firewall.external.inboundAllowCIDR — CIDR ranges allowed to access the Open WebUI externally (default: 0.0.0.0/0).
  • firewall.external.outboundAllowCIDR — CIDR ranges the workload can reach externally. Must include 0.0.0.0/0 (or the Ollama registry) for model downloads to succeed.
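CIDR matching is enforced by the platform, not by your containers, but the semantics can be checked locally with Python's standard `ipaddress` module. This sketch shows why the default 0.0.0.0/0 admits every IPv4 address and how a tighter range behaves:

```python
import ipaddress

def is_allowed(client_ip: str, allow_cidrs: list[str]) -> bool:
    """True if the client address falls inside any allowed CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_cidrs)
```

Restricting inboundAllowCIDR to, say, your office range limits who can reach the UI; restricting outboundAllowCIDR too far will break model pulls from the Ollama registry.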

Internal Access

  • internal_access.type — Controls which workloads can reach the Ollama API on port 11434 internally:
  • same-gvc — Allow access from all workloads in the same GVC.
  • same-org — Allow access from all workloads in the same organization.
  • workload-list — Allow access only from the specific workloads listed in workloads.
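For example, to restrict API access to a single consumer workload, set the type to workload-list and list the workload by its GVC-qualified link (the GVC and workload names below are hypothetical):

```yaml
internal_access:
  type: workload-list
  workloads:
    - //gvc/my-gvc/workload/my-chat-app
```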

Accessing Ollama

Open WebUI (browser interface) is available externally at:
https://RELEASE_NAME-ollama.GVC_NAME.cpln.app
Ollama API is available internally to other workloads in the same GVC at:
http://RELEASE_NAME-ollama.GVC_NAME.cpln.local:11434
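Both endpoints follow a fixed naming pattern, so they can be derived from the Helm release name and GVC name. A small sketch (the release and GVC names in the test are hypothetical):

```python
def ollama_endpoints(release: str, gvc: str) -> dict[str, str]:
    """Build the external UI and internal API URLs for a deployment.

    The UI is served over HTTPS on the public cpln.app domain; the API
    is plain HTTP on the cluster-internal cpln.local domain, port 11434.
    """
    return {
        "ui": f"https://{release}-ollama.{gvc}.cpln.app",
        "api": f"http://{release}-ollama.{gvc}.cpln.local:11434",
    }
```

Other workloads in the same GVC would point their Ollama client (or raw HTTP calls such as POST /api/generate) at the internal API URL.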

External References