Overview

ClickHouse is a high-performance, column-oriented analytical database designed for real-time querying and data warehousing at scale. This template deploys a ClickHouse cluster with ClickHouse Keeper for distributed coordination, backed by object storage (AWS S3 or GCS) for long-term scalable storage and a local volume for fast read caching.

What Gets Created

  • GVC — A dedicated GVC across the specified locations (minimum 3 required).
  • ClickHouse Server — The main analytical database workload with configurable replicas per location.
  • ClickHouse Keeper — The coordination service workload (1 replica per location, always 3 total).
  • Volume Sets — Persistent storage for both the server (metadata, state, and system files) and Keeper.
  • Secrets — A database config secret with credentials and cluster name, startup script secrets for ClickHouse Server and Keeper, and a storage configuration secret for the selected cloud provider (AWS S3 or GCS).
  • Identity & Policy — An identity bound to the workloads with reveal access to the template secrets, and cloud access for reading and writing to object storage.

Architecture

Each location maps to one ClickHouse Keeper replica, forming a 3-node quorum for distributed coordination. ClickHouse Server replicas communicate with Keeper using Control Plane’s internal DNS. Primary data is stored in the configured object storage bucket; a local scratch volume serves as a fast read cache.
To minimize network egress costs, deploy all locations in the same cloud provider and keep your object storage bucket in the same region(s). Using 1 replica per location for the ClickHouse server workload is sufficient.

Prerequisites

Before installing this template, configure object storage access in either AWS or GCS.

AWS S3

  1. Create an S3 bucket. Note the bucket name and region — you will set these as aws.bucket and aws.region in your values file.
  2. If you do not have a Control Plane Cloud Account set up, follow the Create a Cloud Account guide. Set aws.cloudAccountName to the name of your Cloud Account.
  3. Create an IAM policy with the following JSON, replacing YOUR_BUCKET_NAME:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetObjectVersion",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_NAME",
                "arn:aws:s3:::YOUR_BUCKET_NAME/*"
            ]
        }
    ]
}
  4. Set aws.policyName in your values file to the name of the policy created in step 3.
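
The bucket and policy from the steps above can also be created with the AWS CLI — a minimal sketch, assuming the CLI is authenticated, the policy JSON is saved as policy.json, and YOUR_BUCKET_NAME / clickhouse-s3-policy are placeholders you replace with your own names:

# Create the bucket (us-east-1 requires no LocationConstraint)
aws s3api create-bucket --bucket YOUR_BUCKET_NAME --region us-east-1

# Create the IAM policy; its name goes into aws.policyName in your values file
aws iam create-policy \
  --policy-name clickhouse-s3-policy \
  --policy-document file://policy.json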

GCS

ClickHouse requires S3-compatible HMAC authentication for GCS. A Control Plane Cloud Account is not required.
  1. Create a GCS bucket. Set gcp.bucket in your values file to the bucket name.
  2. In the GCP console, navigate to Settings > Interoperability and click Create a key for a service account.
  3. Click Create new account, name your service account, and assign the Storage Object Admin role under Permissions.
  4. Copy the generated HMAC key and set gcp.accessKeyId and gcp.secretAccessKey in your values file.
Alternatively, use the gcloud CLI:
gcloud config set project YOUR_PROJECT_ID

gcloud storage buckets create gs://YOUR_BUCKET_NAME --location=NAM4

gcloud iam service-accounts create clickhouse-storage

gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
  --member="serviceAccount:clickhouse-storage@$(gcloud config get-value project).iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

gsutil hmac create clickhouse-storage@$(gcloud config get-value project).iam.gserviceaccount.com
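
Before deploying, you can confirm the HMAC key works by listing the bucket through GCS's S3-compatible endpooint with the AWS CLI — a sketch, assuming the AWS CLI is installed and the placeholders are replaced with your HMAC credentials and bucket name:

AWS_ACCESS_KEY_ID=YOUR_HMAC_ACCESS_KEY_ID \
AWS_SECRET_ACCESS_KEY=YOUR_HMAC_SECRET \
aws s3 ls s3://YOUR_BUCKET_NAME --endpoint-url https://storage.googleapis.com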

Installation

Install the template using your preferred method, supplying the values described in the Configuration section below.

Configuration

The default values.yaml for this template:
gvc:
  name: clickhouse-gvc
  # Replica count will only affect the server workload
  # Keeper will always have 3 total replicas (1 per location)
  locations: # A minimum of 3 locations is required
    - name: aws-us-west-2
      replicas: 1
    - name: aws-us-east-2
      replicas: 1
    - name: aws-us-east-1
      replicas: 1

provider: aws # Options: aws or gcp

aws: # If enabled, all fields below are required - See prerequisites for guidance
  bucket: clickhouse-s3-bucket # Name of your S3 bucket
  region: us-east-1 # Region of your S3 bucket
  cloudAccountName: clickhouse-s3-cloudaccount # Name of your Cloud Account
  policyName: clickhouse-s3-policy # Name of your pre-created policy

gcp: # If enabled, all fields below are required - See prerequisites for guidance
  bucket: clickhouse-gcs-bucket # Name of your GCS bucket
  accessKeyId: gcs-access-key-id # HMAC access key ID
  secretAccessKey: gcs-secret-access-key # HMAC secret access key

clusterName: my_cluster # Name of ClickHouse cluster

database: # Automatically create a database on initialization using the default user
  name: mydatabase
  password: mypassword

volumeset:
  server:
    capacity: 10 # initial capacity in GiB (minimum is 10)
  keeper:
    capacity: 10 # initial capacity in GiB (minimum is 10)

server:
  resources:
    cpu: 2
    memory: 2Gi
  internal_access:
    type: same-gvc # options: same-gvc, same-org, workload-list
    workloads: # Note: can only be used if type is same-gvc or workload-list
      #- //gvc/GVC_NAME/workload/WORKLOAD_NAME

keeper:
  resources:
    cpu: 2
    memory: 2Gi
  internal_access:
    type: same-gvc # options: same-gvc, same-org, workload-list
    workloads: # Note: can only be used if type is same-gvc or workload-list
      #- //gvc/GVC_NAME/workload/WORKLOAD_NAME

Locations and Replicas

Configure gvc.locations with at least 3 locations. The replicas value controls how many ClickHouse Server replicas run in each location — 1 per location is sufficient. ClickHouse Keeper always runs exactly 1 replica per location (3 total) regardless of this setting.
This template creates a GVC with a default name defined in the values file. If you plan to deploy multiple instances, you must assign a unique GVC name for each deployment.

Provider and Object Storage

Set provider to either aws or gcp, then fill in the corresponding section.

AWS S3

  • aws.bucket — Name of the S3 bucket
  • aws.region — AWS region where the bucket resides
  • aws.cloudAccountName — Name of the Control Plane Cloud Account with S3 access
  • aws.policyName — Name of the IAM policy granting access to the bucket

GCS

  • gcp.bucket — Name of the GCS bucket
  • gcp.accessKeyId — HMAC access key ID for the GCS service account
  • gcp.secretAccessKey — HMAC secret access key for the GCS service account

Cluster and Database

  • clusterName — The name used for distributed DDL queries across the ClickHouse cluster.
  • database.name — Database created automatically on first initialization.
  • database.password — Password for the default ClickHouse user. Change before deploying to production.
These values are only applied on first initialization when the data directory is empty. Updating them after the initial deployment will have no effect on the running cluster. To change credentials or the database name on an existing instance, use ClickHouse’s native commands (e.g. ALTER USER, RENAME DATABASE).
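
For example, to rotate the default user's password on a running cluster, you could run the following from a workload with access — a sketch, assuming SQL-driven access management is enabled for the user, and with WORKLOAD_NAME and the passwords as placeholders:

clickhouse-client --host $WORKLOAD_NAME --password $OLD_PASSWORD \
  --query "ALTER USER default IDENTIFIED BY 'new-password'"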

Resources and Storage

  • server.resources / keeper.resources — CPU and memory allocated to each workload.
  • volumeset.server.capacity / volumeset.keeper.capacity — Persistent volume size in GiB for server and Keeper state (minimum 10 each).

Internal Access

Both server.internal_access and keeper.internal_access control which workloads can reach each component:
  • same-gvc — Allow access from all workloads in the same GVC
  • same-org — Allow access from all workloads in the same organization
  • workload-list — Allow access only from the specific workloads listed in workloads
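
For example, to restrict the server to a single allowed workload, set type to workload-list and list the workload by its full path (the GVC and workload names below are hypothetical):

server:
  internal_access:
    type: workload-list
    workloads:
      - //gvc/my-app-gvc/workload/my-app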

Connecting to ClickHouse

Once deployed, connect using the ClickHouse client from a workload in the same GVC:
clickhouse-client --host $WORKLOAD_NAME --password $PASSWORD
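
To verify that all replicas joined the cluster, you can query the system.clusters table — a sketch, using the default clusterName value from this template's values file:

clickhouse-client --host $WORKLOAD_NAME --password $PASSWORD \
  --query "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'my_cluster'"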

External References