Overview
Ollama is an open-source platform for running large language models locally. This template deploys Ollama alongside Open WebUI as a sidecar interface, with a startup script that automatically pulls the configured default model on first launch. The Ollama API is accessible internally, while the Open WebUI is exposed externally.

What Gets Created
- Stateful Workload — A single-replica workload with two containers: the Ollama API server and the Open WebUI interface. The API runs on port `11434` (internal only) and the UI runs on port `8080` (externally accessible).
- Volume Set — Persistent storage shared by both containers: the Ollama API stores model data at `/root/.ollama` and Open WebUI stores state at `/app/backend/data`. Supports optional autoscaling.
- Secret — An opaque startup script that checks for the default model on launch and pulls it from the Ollama registry if not already present.
- Identity & Policy — An identity bound to the workload with `reveal` access to the startup script secret.
This template does not create a GVC. You must deploy it into an existing GVC.
Installation
The default configuration requests 6 CPU and 8Gi of memory for the Ollama API container. Depending on your organization’s quotas, you may need to request a limit increase. GPU access requires explicit enablement — contact Control Plane support to enable GPU resources for your organization.
You can install this template using any of the following methods:

- UI — Browse, install, and manage templates visually
- CLI — Manage templates from your terminal
- Terraform — Declare templates in your Terraform configurations
- Pulumi — Declare templates in your Pulumi programs
Configuration
The default `values.yaml` for this template:
Default Model
`defaultModel` — The Ollama model to download on first startup if not already present on the volume. Accepts any model name from the Ollama model library (e.g. `llama3`, `llava`, `gemma`, `mistral`, `phi3`).
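Switching the startup model is a one-line override in your values file (a sketch; `mistral` is one of the library names listed above):

```yaml
# Pull mistral on first launch instead of the template default
defaultModel: mistral
```

The startup script only downloads the model if it is not already present on the volume, so changing this value later adds a new model rather than replacing the old one.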
Open WebUI Container
- `workload.containers.ui.image` — Open WebUI container image.
- `workload.containers.ui.port` — Port the UI listens on (default: `8080`).
- `workload.containers.ui.resources.cpu` / `workload.containers.ui.resources.memory` — CPU and memory for the UI container.
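An override for the UI container might look like the following. This is a minimal sketch assuming the key paths above; the image tag and resource figures are illustrative, not the template defaults:

```yaml
workload:
  containers:
    ui:
      # Assumed image reference; pin a specific tag in production
      image: ghcr.io/open-webui/open-webui:main
      port: 8080
      resources:
        cpu: "1"
        memory: 2Gi
```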
Ollama API Container
- `workload.containers.api.image` — Ollama container image.
- `workload.containers.api.port` — Port the Ollama API listens on (default: `11434`).
- `workload.containers.api.resources.cpu` / `workload.containers.api.resources.memory` — CPU and memory for the Ollama API container. Large models require significant resources.
- `workload.containers.api.gpu` — Optional GPU configuration:
  - `nvidia.model` — GPU model type (e.g. `t4`).
  - `nvidia.quantity` — Number of GPUs to allocate.
GPU access must be explicitly enabled for your organization by Control Plane support before it can be used in a workload.
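Assuming GPU access has been enabled for your organization, the API container could be configured roughly like this. A sketch of the key paths above; the CPU and memory values match the template's stated defaults, while the image tag is an assumption:

```yaml
workload:
  containers:
    api:
      image: ollama/ollama:latest  # assumed tag; pin a version in production
      port: 11434
      resources:
        cpu: "6"
        memory: 8Gi
      gpu:
        nvidia:
          model: t4      # GPU model type
          quantity: 1    # number of GPUs to allocate
```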
Storage
- `volumeset.initialCapacity` — Initial volume size in GiB (minimum 10). Large models require more storage — plan accordingly.
- `volumeset.autoscaling.enabled` — Automatically expand the volume as it fills. When enabled:
  - `maxCapacity` — Maximum volume size in GiB.
  - `minFreePercentage` — Trigger a scale-up when free space drops below this percentage.
  - `scalingFactor` — Multiply the current capacity by this factor when scaling up.
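Putting the storage keys together (a sketch; the capacities are illustrative). With a `scalingFactor` of `1.5`, a 50 GiB volume grows to 75 GiB on its first scale-up, then 112 GiB, and so on up to `maxCapacity`:

```yaml
volumeset:
  initialCapacity: 50        # GiB; large models need headroom
  autoscaling:
    enabled: true
    maxCapacity: 200         # GiB ceiling for automatic growth
    minFreePercentage: 20    # scale up when free space drops below 20%
    scalingFactor: 1.5       # 50 GiB -> 75 GiB on the first scale-up
```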
Firewall
- `firewall.external.inboundAllowCIDR` — CIDR ranges allowed to access the Open WebUI externally (default: `0.0.0.0/0`).
- `firewall.external.outboundAllowCIDR` — CIDR ranges the workload can reach externally. Must include `0.0.0.0/0` (or the Ollama registry) for model downloads to succeed.
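A fragment restricting who can reach the UI while keeping model pulls working might look like this. This sketch assumes both CIDR fields accept a list of ranges; the inbound range is a documentation-reserved example, not a real network:

```yaml
firewall:
  external:
    inboundAllowCIDR:
      - 203.0.113.0/24   # example: restrict the UI to one office range
    outboundAllowCIDR:
      - 0.0.0.0/0        # required so pulls from the Ollama registry succeed
```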
Internal Access
`internal_access.type` — Controls which workloads can reach the Ollama API on port `11434` internally:
| Type | Description |
|---|---|
| `same-gvc` | Allow access from all workloads in the same GVC |
| `same-org` | Allow access from all workloads in the same organization |
| `workload-list` | Allow access only from specific workloads listed in `workloads` |
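For the `workload-list` type, the allowed callers are enumerated under `workloads` (a sketch; the workload name is hypothetical):

```yaml
internal_access:
  type: workload-list
  workloads:
    - my-chat-backend   # hypothetical workload allowed to call the Ollama API
```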