Overview

Ollama is an open-source platform for running large language models locally. This template deploys Ollama alongside Open WebUI as a sidecar interface, with a startup script that automatically pulls the configured default model on first launch. The Ollama API is accessible internally, while Open WebUI is exposed externally.

What Gets Created

  • Stateful Workload — A single-replica workload with two containers: the Ollama API server and the Open WebUI interface. The API runs on port 11434 (internal only) and the UI runs on port 8080 (externally accessible).
  • Volume Set — Persistent storage shared by both containers: the Ollama API stores model data at /root/.ollama and Open WebUI stores state at /app/backend/data. Supports optional autoscaling.
  • Secret — An opaque startup script that checks for the default model on launch and pulls it from the Ollama registry if not already present.
  • Identity & Policy — An identity bound to the workload with reveal access to the startup script secret.
This template does not create a GVC. You must deploy it into an existing GVC.

Installation

The default configuration requests 6 CPU and 8Gi of memory for the Ollama API container. Depending on your organization’s quotas, you may need to request a limit increase. GPU access requires explicit enablement — contact Control Plane support to enable GPU resources for your organization.
To install, follow the instructions for your preferred method:

Configuration

The default values.yaml for this template:
# Alternatives: llava, gemma, mistral, phi3, etc.
defaultModel: llama3

workload:
  containers:
    ui:
      name: ollama-ui
      image: ghcr.io/open-webui/open-webui:main
      port: 8080
      resources:
        cpu: 500m
        memory: 1Gi
    api:
      name: ollama
      image: ollama/ollama
      port: 11434
      resources:
        cpu: 6
        memory: 8Gi
      gpu:
        nvidia:
          model: t4
          quantity: 1

volumeset:
  initialCapacity: 10
  autoscaling:
    enabled: false
    maxCapacity: 100 # Maximum capacity in GiB
    minFreePercentage: 10 # Trigger scaling when free space drops below this percentage
    scalingFactor: 1.2 # Multiply current capacity by this factor when scaling up
  performanceClass: general-purpose-ssd
  snapshots:
    retentionDuration: 7d

firewall:
  external:
    inboundAllowCIDR:
      - 0.0.0.0/0
    outboundAllowCIDR:
      - 0.0.0.0/0

internal_access:
  type: same-gvc # options: same-gvc, same-org, workload-list
  workloads: # Note: only used when type is workload-list
    #- //gvc/GVC_NAME/workload/WORKLOAD_NAME

Default Model

  • defaultModel — The Ollama model to download on first startup if not already present on the volume. Accepts any model name from the Ollama model library (e.g. llama3, llava, gemma, mistral, phi3).
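The startup script itself is stored in the secret, but its pull-if-missing decision can be sketched in a few lines. This is an illustration, not the actual script: Ollama's `/api/tags` endpoint reports installed models as `name:tag` pairs (e.g. `llama3:latest`), and a bare `defaultModel` value matches any tag of that model.

```python
def needs_pull(default_model: str, installed: list[str]) -> bool:
    """Return True if the default model is absent from the installed tags.

    Ollama tags models as "name:tag" (e.g. "llama3:latest"), so a bare
    model name like "llama3" matches any tag of that model.
    """
    wanted = default_model.split(":")[0]
    return all(tag.split(":")[0] != wanted for tag in installed)
```

Because the model list lives on the persistent volume, the pull only happens on the very first launch; subsequent restarts find the model already present.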

Open WebUI Container

  • workload.containers.ui.image — Open WebUI container image.
  • workload.containers.ui.port — Port the UI listens on (default: 8080).
  • workload.containers.ui.resources.cpu / workload.containers.ui.resources.memory — CPU and memory for the UI container.

Ollama API Container

  • workload.containers.api.image — Ollama container image.
  • workload.containers.api.port — Port the Ollama API listens on (default: 11434).
  • workload.containers.api.resources.cpu / workload.containers.api.resources.memory — CPU and memory for the Ollama API container. Large models require significant resources.
  • workload.containers.api.gpu — Optional GPU configuration:
    • nvidia.model — GPU model type (e.g. t4).
    • nvidia.quantity — Number of GPUs to allocate.
GPU access must be explicitly enabled for your organization by Control Plane support before it can be used in a workload.

Storage

  • volumeset.initialCapacity — Initial volume size in GiB (minimum 10). Large models require more storage — plan accordingly.
  • volumeset.autoscaling.enabled — Automatically expand the volume as it fills. When enabled:
    • maxCapacity — Maximum volume size in GiB.
    • minFreePercentage — Trigger a scale-up when free space drops below this percentage.
    • scalingFactor — Multiply the current capacity by this factor when scaling up.
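The interaction between these three knobs is easiest to see as arithmetic. The sketch below assumes the defaults from values.yaml (minFreePercentage: 10, scalingFactor: 1.2, maxCapacity: 100); it illustrates the documented behavior, not the platform's exact internal algorithm.

```python
def next_capacity(capacity_gib: float, free_gib: float,
                  min_free_pct: float = 10.0, scaling_factor: float = 1.2,
                  max_capacity: float = 100.0) -> float:
    """Return the volume capacity after one autoscaling evaluation."""
    free_pct = 100.0 * free_gib / capacity_gib
    if free_pct >= min_free_pct:
        return capacity_gib  # enough free space; no scale-up
    # Grow by the scaling factor, but never past the configured maximum.
    return min(capacity_gib * scaling_factor, max_capacity)
```

For example, a 10 GiB volume with 0.5 GiB free (5% < 10%) grows to 12 GiB, while a 90 GiB volume that triggers a scale-up is capped at 100 GiB rather than reaching 108 GiB.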

Firewall

  • firewall.external.inboundAllowCIDR — CIDR ranges allowed to access the Open WebUI externally (default: 0.0.0.0/0).
  • firewall.external.outboundAllowCIDR — CIDR ranges the workload can reach externally. Must include 0.0.0.0/0 (or the Ollama registry) for model downloads to succeed.
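CIDR matching is enforced by the platform, not by your containers, but the semantics can be checked locally with Python's standard `ipaddress` module. This sketch shows why the default 0.0.0.0/0 admits every IPv4 address and how a tighter range behaves:

```python
import ipaddress

def is_allowed(client_ip: str, allow_cidrs: list[str]) -> bool:
    """True if the client address falls inside any allowed CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_cidrs)
```

Restricting inboundAllowCIDR to, say, your office range limits who can reach the UI; restricting outboundAllowCIDR too far will break model pulls from the Ollama registry.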

Internal Access

  • internal_access.type — Controls which workloads can reach the Ollama API on port 11434 internally:
  • same-gvc — Allow access from all workloads in the same GVC.
  • same-org — Allow access from all workloads in the same organization.
  • workload-list — Allow access only from the specific workloads listed in workloads.
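For example, to restrict API access to a single consumer workload, set the type to workload-list and list the workload by its GVC-qualified link (the GVC and workload names below are hypothetical):

```yaml
internal_access:
  type: workload-list
  workloads:
    - //gvc/my-gvc/workload/my-chat-app
```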

Accessing Ollama

Open WebUI (browser interface) is available externally at:
https://RELEASE_NAME-ollama.GVC_NAME.cpln.app
Ollama API is available internally to other workloads in the same GVC at:
http://RELEASE_NAME-ollama.GVC_NAME.cpln.local:11434
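Both endpoints follow a fixed naming pattern, so they can be derived from the Helm release name and GVC name. A small sketch (the release and GVC names in the test are hypothetical):

```python
def ollama_endpoints(release: str, gvc: str) -> dict[str, str]:
    """Build the external UI and internal API URLs for a deployment.

    The UI is served over HTTPS on the public cpln.app domain; the API
    is plain HTTP on the cluster-internal cpln.local domain, port 11434.
    """
    return {
        "ui": f"https://{release}-ollama.{gvc}.cpln.app",
        "api": f"http://{release}-ollama.{gvc}.cpln.local:11434",
    }
```

Other workloads in the same GVC would point their Ollama client (or raw HTTP calls such as POST /api/generate) at the internal API URL.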

External References