Overview
Apache Cassandra is a distributed NoSQL database designed for high availability and linear scalability. This template deploys a Cassandra 5.0 cluster in a single location where each node owns a slice of the token ring and replicates data to peers according to the configured replication factor. Optional scheduled backups and periodic anti-entropy repair are included.
This template does not create a GVC. You must deploy it into an existing GVC.
What Gets Created
- Stateful Cassandra Workload: A multi-node Cassandra cluster. Each replica gets its own persistent volume so SSTable data survives restarts.
- Volume Set: One persistent volume per replica for Cassandra data.
- Identity & Policy: An identity bound to the workload with `reveal` access to the credential secrets, and cloud storage access when backup is enabled.
- Secrets: An opaque secret for the superuser password and a dictionary secret for the application user credentials.
- Cron Backup Workload (optional): When `backup.type` is `logical`, a standalone cron workload exports keyspace data as CSVs and uploads them to cloud storage.
- Sidecar Backup Container (optional): When `backup.type` is `physical`, a sidecar runs on each Cassandra replica, takes SSTable snapshots with `nodetool snapshot`, and syncs them to cloud storage.
- Repair Cron Workload (optional, enabled by default): Runs `nodetool repair` on a schedule to keep data consistent across replicas.
Prerequisites
This template has no external prerequisites unless backup is enabled. To install, follow the instructions for your preferred method:
- UI: Browse, install, and manage templates visually
- CLI: Manage templates from your terminal
- Terraform: Declare templates in your Terraform configurations
- Pulumi: Declare templates in your Pulumi programs
Configuration
The default `values.yaml` for this template:
Replicas and Replication Factor
These are two separate settings that work together:
- `replicas`: how many Cassandra nodes are deployed. More nodes means more capacity and better throughput, as the token ring is split across more nodes.
- `replicationFactor`: how many copies of each partition are stored across the cluster. A replication factor of 3 means every row exists on 3 different nodes, so the data survives the loss of 2 of them. `QUORUM` reads and writes need 2 of the 3 replicas, so they keep succeeding with 1 node down.
replicationFactor must not exceed replicas — you cannot store 3 copies of data across only 2 nodes.
For production, use at least 3 replicas with a replication factor of 3. This allows the cluster to survive a node failure while still achieving quorum.
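
The relationship between the two settings can be checked with a little arithmetic. The following is a minimal sketch, not part of the template: the function names are hypothetical, and it assumes the standard Cassandra quorum size of `floor(RF / 2) + 1`.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def validate(replicas: int, replication_factor: int) -> None:
    """Reject settings with fewer nodes than copies to store."""
    if replication_factor < 1:
        raise ValueError("replicationFactor must be at least 1")
    if replication_factor > replicas:
        raise ValueError("replicationFactor must not exceed replicas")


def tolerated_failures(replication_factor: int) -> dict:
    """Replica failures a partition can absorb under each criterion."""
    return {
        "with_quorum": replication_factor - quorum(replication_factor),
        "without_data_loss": replication_factor - 1,
    }


validate(replicas=3, replication_factor=3)
print(quorum(3), tolerated_failures(3))
# 2 {'with_quorum': 1, 'without_data_loss': 2}
```

With 3 replicas and a replication factor of 3, one node can be down while `QUORUM` operations still succeed, which is why 3 is the suggested production minimum.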
Resources and Storage
- `cpu` / `memory`: CPU and memory allocated to each Cassandra node.
- `jvmHeapSize`: Set to approximately 50% of `memory`. Cassandra relies heavily on off-heap memory for bloom filters, row cache, and OS page cache.
- `volumes.data.initialCapacity`: Initial volume size in GiB per node (minimum 10).
- `volumes.data.autoscaling.maxCapacity`: Maximum volume size in GiB.
- `volumes.data.autoscaling.minFreePercentage`: Triggers a scale-up when free space falls below this percentage.
- `volumes.data.autoscaling.scalingFactor`: Multiplier applied to current capacity on each scale-up.
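
To make the sizing rules above concrete, here is a rough sketch of how the heap guideline and the three autoscaling settings interact. It is an illustration under assumptions: the function names are hypothetical, and the rounding and capping behavior is a guess at how the platform applies these values, not a description of its actual implementation.

```python
import math


def jvm_heap_mib(memory_mib: int, heap_fraction: float = 0.5) -> int:
    """Heap at roughly half of node memory, leaving the rest for off-heap use."""
    return int(memory_mib * heap_fraction)


def next_capacity_gib(current_gib, used_gib, min_free_percentage,
                      scaling_factor, max_capacity_gib):
    """Capacity after one autoscaling check (assumed behavior)."""
    free_percentage = 100 * (current_gib - used_gib) / current_gib
    if free_percentage >= min_free_percentage:
        return current_gib  # enough free space, no scale-up
    # Multiply current capacity, rounding up, but never exceed maxCapacity.
    return min(max_capacity_gib, math.ceil(current_gib * scaling_factor))


print(jvm_heap_mib(8192))                        # 4096
print(next_capacity_gib(10, 9.5, 10, 1.5, 100))  # 15
print(next_capacity_gib(80, 78, 10, 1.5, 100))   # 100
```

In the last call the multiplied size would be 120 GiB, so the result is held at the 100 GiB maximum.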
Multi-Zone
When `multiZone.enabled: true`, Control Plane spreads replicas across availability zones within the location. With a replication factor of 3 across 3 zones, each zone holds one copy of every partition, so the cluster survives a complete zone outage with no data loss when using `LOCAL_QUORUM` consistency.
Verify your selected location supports multi-zone before enabling this option.
Internal Access
Controls which workloads can reach the Cassandra cluster:

| Type | Description |
|---|---|
| `same-gvc` | Allow access from all workloads in the same GVC (recommended) |
| `same-org` | Allow access from all workloads in the org |
| `workload-list` | Allow access only from specific workloads listed in `workloads` |
Connecting
Each Cassandra replica is individually addressable. Provide multiple node hostnames as contact points in your application so it can discover the full cluster topology.
Repair
Cassandra uses eventual consistency: when nodes miss writes during downtime, data can drift out of sync. `nodetool repair` runs an anti-entropy process that compares and reconciles data across all replicas.
Repair must complete across all nodes at least once within `gc_grace_seconds` (default: 10 days) to prevent deleted data from reappearing after a node recovers.
- `repair.enabled`: Enable the scheduled repair job (recommended: `true`).
- `repair.schedule`: Cron expression for repair frequency. The default weekly schedule satisfies the 10-day `gc_grace_seconds` requirement with margin.
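
When changing the schedule, it can help to sanity-check it against the `gc_grace_seconds` window. The sketch below is a conservative approximation, not something the template runs: the function name and the 12-hour repair duration are assumptions for illustration. It treats a schedule as safe when the time from one repair's start to the next repair's completion stays under the window.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds is 864000 seconds, i.e. 10 days.
GC_GRACE = timedelta(seconds=864000)


def repair_fits(interval: timedelta, expected_duration: timedelta,
                gc_grace: timedelta = GC_GRACE) -> bool:
    """True if a repair started every `interval` completes within gc_grace."""
    return interval + expected_duration < gc_grace


# Weekly repair with an assumed 12-hour run time leaves about 2.5 days of margin.
print(repair_fits(timedelta(days=7), timedelta(hours=12)))   # True
# A two-week interval exceeds the 10-day window.
print(repair_fits(timedelta(days=14), timedelta(hours=12)))  # False
```

If tables in your keyspaces override `gc_grace_seconds`, pass the smallest value in use as `gc_grace`.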
Backup
Two backup modes are available:

| Mode | How it works | Best for |
|---|---|---|
| `logical` | Exports tables as CSVs using `cqlsh COPY TO`, uploads to cloud storage. Runs as a standalone cron workload. | Smaller datasets, portability |
| `physical` | Creates SSTable snapshots with `nodetool snapshot`, syncs to cloud storage. Runs as a sidecar on each Cassandra replica. | Large datasets, faster backup/restore |
To enable backup, set `backup.enabled: true`, set `backup.type`, and fill in the cloud storage block for your provider.
AWS S3 Prerequisites
1. Create an S3 bucket. Set `backup.aws.bucket` to its name and `backup.aws.region` to its region.
2. If you do not have a Cloud Account set up, refer to the docs to Create a Cloud Account. Set `backup.aws.cloudAccountName` to its name.
3. Create an IAM policy with the following JSON, replacing `YOUR_BUCKET_NAME`:
4. Set `backup.aws.policyName` to the name of the policy created in step 3.
GCS Prerequisites
1. Create a GCS bucket. Set `backup.gcp.bucket` to its name.
2. If you do not have a Cloud Account set up, refer to the docs to Create a Cloud Account. Set `backup.gcp.cloudAccountName` to its name.
3. Add the Storage Admin role to the GCP service account associated with the Cloud Account.
Restoring a Backup
Logical Restore
Exec into the backup cron workload and run `restore.sh` with the timestamp of the backup to restore:
The timestamp matches the backup's folder name in the bucket (for example, `cassandra/backups/2026-05-15T02-00-00Z/`). The script downloads the CSVs and replays them into Cassandra using `cqlsh COPY FROM`. Existing rows with matching primary keys are overwritten; rows not in the backup are left in place.
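
The overwrite behavior of a logical restore can be pictured as an upsert keyed by primary key. The sketch below models a table as a plain dictionary purely for illustration; the `restore` function and the sample rows are hypothetical and do not reflect how `restore.sh` is implemented.

```python
def restore(table: dict, backup_rows: dict) -> dict:
    """Model a logical restore as upserts keyed by primary key."""
    restored = dict(table)        # rows absent from the backup stay in place
    restored.update(backup_rows)  # matching keys take the backed-up values
    return restored


live = {1: "alice-v2", 3: "carol"}   # current table contents
backup = {1: "alice-v1", 2: "bob"}   # rows captured in the backup
print(restore(live, backup))
# {1: 'alice-v1', 3: 'carol', 2: 'bob'}
```

Row 3 was written after the backup and survives the restore, so a logical restore does not roll the table back to an exact point in time.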
Physical Restore
Physical backups are per-node: each replica backs up its own SSTable slice. Exec into the backup sidecar container on each replica and run the restore there. The restore uses `nodetool import` to load the SSTables without a restart.
Important Notes
- Scaling up: Adding replicas after initial deployment does not automatically rebalance data. Run `nodetool rebuild` on new nodes and `nodetool cleanup` on existing nodes after scaling.
- JVM heap: Set `jvmHeapSize` to approximately 50% of `memory`. Cassandra relies on off-heap memory for bloom filters, row cache, and OS page cache.
- gc_grace_seconds: The default is 10 days. Ensure repair runs at least once within this window on all nodes, or deleted data may reappear after a node recovers from downtime.
- GVC naming: This template does not create a GVC. Deploy it into an existing GVC. If you run multiple Cassandra clusters in the same org, give each a distinct `clusterName`.
External References
- Cassandra Documentation: Official Apache Cassandra documentation
- Cassandra Driver Matrix: Client drivers for connecting to Cassandra
- Cassandra Template: View the source files, default values, and chart definition