# Cassandra

## Overview

Apache Cassandra is a distributed NoSQL database designed for high availability and linear scalability. This template deploys a Cassandra 5.0 cluster in a single location where each node owns a slice of the token ring and replicates data to peers according to the configured replication factor. Optional scheduled backups and periodic anti-entropy repair are included.

This template does not create a GVC. You must deploy it into an existing GVC.

### What Gets Created

* **Stateful Cassandra Workload**: A multi-node Cassandra cluster. Each replica gets its own persistent volume so SSTable data survives restarts.
* **Volume Set**: One persistent volume per replica for Cassandra data.
* **Identity & Policy**: An identity bound to the workload with `reveal` access to the credential secrets, and cloud storage access when backup is enabled.
* **Secrets**: An opaque secret for the superuser password and a dictionary secret for the application user credentials.
* **Cron Backup Workload** *(optional)*: When `backup.type` is `logical`, a standalone cron workload exports keyspace data as CSVs and uploads them to cloud storage.
* **Sidecar Backup Container** *(optional)*: When `backup.type` is `physical`, a sidecar runs on each Cassandra replica, takes SSTable snapshots with `nodetool snapshot`, and syncs them to cloud storage.
* **Repair Cron Workload** *(optional, enabled by default)*: Runs `nodetool repair` on a schedule to keep data consistent across replicas.

## Prerequisites

This template has no external prerequisites unless backup is enabled.
To install, follow the instructions for your preferred method:

* Browse, install, and manage templates visually
* Manage templates from your terminal
* Declare templates in your Terraform configurations
* Declare templates in your Pulumi programs

## Configuration

The default `values.yaml` for this template:

```yaml
replicas: 3
# replicationFactor must not exceed replicas
replicationFactor: 1

# IMPORTANT: Change all credentials before deploying to production
superuserPassword: supersecretpassword
username: username
password: password
keyspaceName: mydatabase

image: cassandra:5.0

cpu: 1
memory: 4Gi
# JVM heap: leave ~50% of container memory for off-heap (bloom filters, page cache, etc.)
# Cassandra 5.x uses G1GC: only MAX_HEAP_SIZE is valid; HEAP_NEWSIZE is ignored.
jvmHeapSize: 2G

clusterName: my-cassandra

volumes:
  data:
    initialCapacity: 10
    autoscaling:
      maxCapacity: 100
      minFreePercentage: 20
      scalingFactor: 1.5

multiZone:
  enabled: false

internal_access:
  type: same-gvc # Options: same-gvc, same-org, workload-list
  workloads:
    #- //gvc/GVC_NAME/workload/WORKLOAD_NAME

backup:
  enabled: false
  type: logical # options: logical, physical
  image: ghcr.io/controlplane-com/backup-images/cassandra-backup:5.0
  schedule: "0 2 * * *" # daily at 2am UTC
  resources:
    cpu: 250m
    memory: 256Mi
  provider: aws # options: aws, gcp
  aws:
    bucket: my-backup-bucket
    region: us-east-1
    cloudAccountName: my-backup-cloudaccount
    policyName: my-s3-policy
    prefix: cassandra/backups
  gcp:
    bucket: my-backup-bucket
    cloudAccountName: my-cloud-account
    prefix: cassandra/backups

repair:
  enabled: true
  # Cron schedule for full cluster repair (must run within gc_grace_seconds = 10 days)
  schedule: "0 2 * * 0"
```

### Replicas and Replication Factor

These are two separate settings that work together:

* **`replicas`**: how many Cassandra nodes are deployed. More nodes means more capacity and better throughput, as the token ring is split across more nodes.
* **`replicationFactor`**: how many copies of each partition are stored across the cluster. A replication factor of 3 means every row exists on 3 different nodes, so the data itself survives the loss of 2 nodes. `QUORUM` reads and writes need 2 of those 3 replicas, so they stay available with 1 node down.

`replicationFactor` must not exceed `replicas`, because you cannot store 3 copies of data across only 2 nodes.

For production, use at least 3 replicas with a replication factor of 3. This allows the cluster to survive a node failure while still achieving quorum.

### Resources and Storage

* `cpu` / `memory`: CPU and memory allocated to each Cassandra node.
* `jvmHeapSize`: Set to approximately 50% of `memory`. Cassandra relies heavily on off-heap memory for bloom filters, row cache, and OS page cache.
* `volumes.data.initialCapacity`: Initial volume size in GiB per node (minimum 10).
* `volumes.data.autoscaling.maxCapacity`: Maximum volume size in GiB.
* `volumes.data.autoscaling.minFreePercentage`: Triggers a scale-up when free space falls below this percentage.
* `volumes.data.autoscaling.scalingFactor`: Multiplier applied to current capacity on each scale-up.

### Multi-Zone

When `multiZone.enabled: true`, Control Plane spreads replicas across availability zones within the location. With a replication factor of 3 across 3 zones, each zone holds one copy of every partition, so the cluster survives a complete zone outage with no data loss when using `LOCAL_QUORUM` consistency. Verify your selected location supports multi-zone before enabling this option.
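
The quorum arithmetic behind the replication factor and multi-zone guidance can be sketched in a few lines of shell. This is an illustrative calculation only, not part of the template; the function names `quorum` and `tolerated_failures` are hypothetical, and the formula is the standard majority rule (half the replication factor, rounded down, plus one).

```bash
# Sketch: quorum arithmetic for a given replication factor (RF).
# QUORUM needs a majority of replicas: floor(RF / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Replicas that can be unavailable while QUORUM reads and writes still succeed.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for rf in 1 3 5; do
  echo "RF=${rf} quorum=$(quorum "$rf") tolerated=$(tolerated_failures "$rf")"
done
```

For a replication factor of 3 this reports a quorum of 2 and 1 tolerated failure, while the default replication factor of 1 tolerates none. That is why 3 replicas with a replication factor of 3 is the smallest layout that keeps quorum operations available through a single node or zone failure.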
### Internal Access

Controls which workloads can reach the Cassandra cluster:

| Type            | Description                                                     |
| --------------- | --------------------------------------------------------------- |
| `same-gvc`      | Allow access from all workloads in the same GVC (recommended)   |
| `same-org`      | Allow access from all workloads in the org                      |
| `workload-list` | Allow access only from specific workloads listed in `workloads` |

### Connecting

Each Cassandra replica is individually addressable. Provide multiple node hostnames as contact points in your application so it can discover the full cluster topology:

```
Host:     {release-name}-cassandra-0.{gvc-name}.cpln.local
          {release-name}-cassandra-1.{gvc-name}.cpln.local
          {release-name}-cassandra-2.{gvc-name}.cpln.local
Port:     9042
Username: {username}
Password: {password}
Keyspace: {keyspaceName}
```

### Repair

Cassandra is eventually consistent: when nodes miss writes during downtime, data can drift out of sync. `nodetool repair` runs an anti-entropy process that compares and reconciles data across all replicas. Repair must complete across all nodes at least once within `gc_grace_seconds` (default: 10 days) to prevent deleted data from reappearing after a node recovers.

* `repair.enabled`: Enable the scheduled repair job (recommended: `true`).
* `repair.schedule`: Cron expression for repair frequency. The default weekly schedule satisfies the 10-day `gc_grace_seconds` requirement with margin.

Do not disable repair in production or increase the interval beyond 10 days. Repair can be resource-intensive on large datasets, so consider running it during low-traffic windows.

## Backup

Two backup modes are available:

| Mode       | How it works                                                                                                             | Best for                              |
| ---------- | ------------------------------------------------------------------------------------------------------------------------ | ------------------------------------- |
| `logical`  | Exports tables as CSVs using `cqlsh COPY TO`, uploads to cloud storage. Runs as a standalone cron workload.              | Smaller datasets, portability         |
| `physical` | Creates SSTable snapshots with `nodetool snapshot`, syncs to cloud storage. Runs as a sidecar on each Cassandra replica. | Large datasets, faster backup/restore |

Set `backup.enabled: true`, set `backup.type`, and fill in the cloud storage block for your provider.

### AWS S3 Prerequisites

1. Create an S3 bucket. Set `backup.aws.bucket` to its name and `backup.aws.region` to its region.
2. If you do not have a Cloud Account set up, refer to the docs to [Create a Cloud Account](/guides/create-cloud-account). Set `backup.aws.cloudAccountName` to its name.
3. Create an IAM policy with the following JSON, replacing `YOUR_BUCKET_NAME`:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:PutObject",
           "s3:DeleteObject",
           "s3:ListBucket",
           "s3:GetObjectVersion",
           "s3:DeleteObjectVersion"
         ],
         "Resource": [
           "arn:aws:s3:::YOUR_BUCKET_NAME",
           "arn:aws:s3:::YOUR_BUCKET_NAME/*"
         ]
       }
     ]
   }
   ```
4. Set `backup.aws.policyName` to the name of the policy created in step 3.

### GCS Prerequisites

1. Create a GCS bucket. Set `backup.gcp.bucket` to its name.
2. If you do not have a Cloud Account set up, refer to the docs to [Create a Cloud Account](/guides/create-cloud-account). Set `backup.gcp.cloudAccountName` to its name.
3. Add the **Storage Admin** role to the GCP service account associated with the Cloud Account.

## Restoring a Backup

### Logical Restore

Exec into the backup cron workload and run `restore.sh` with the timestamp of the backup to restore:

```bash
RESTORE_TIMESTAMP=2026-05-15T02-00-00Z /usr/local/bin/restore.sh
```

The timestamp matches the backup folder name in your bucket (e.g. `cassandra/backups/2026-05-15T02-00-00Z/`). The script downloads the CSVs and replays them into Cassandra using `cqlsh COPY FROM`. Existing rows with matching primary keys are overwritten; rows not in the backup are left in place.
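
Because the restore depends on the timestamp matching a folder name exactly, a small pre-flight check can catch typos before the restore runs. The sketch below is not part of `restore.sh`. It assumes the timestamp shape inferred from the single example in this guide and the default `cassandra/backups` prefix, and the helper names `is_valid_timestamp` and `backup_folder` are hypothetical.

```bash
# Sketch: check a restore timestamp and derive the backup folder it refers to.
# The pattern is inferred from the example timestamp 2026-05-15T02-00-00Z.
is_valid_timestamp() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}-[0-9]{2}-[0-9]{2}Z$'
}

# Join the configured prefix and the timestamp into a folder path,
# dropping one trailing slash from the prefix if present.
backup_folder() {
  echo "${1%/}/${2}/"
}

RESTORE_TIMESTAMP="2026-05-15T02-00-00Z"
if is_valid_timestamp "$RESTORE_TIMESTAMP"; then
  echo "Restoring from $(backup_folder "cassandra/backups" "$RESTORE_TIMESTAMP")"
else
  echo "Unexpected timestamp format: $RESTORE_TIMESTAMP" >&2
fi
```

A timestamp written with colons, such as `2026-05-15T02:00:00Z`, fails the check, which is the most likely mistake when copying a time from another tool.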
### Physical Restore

Physical backups are per-node: each replica backs up its own SSTable slice. Exec into the **backup sidecar container** on each replica and run:

```bash
RESTORE_TIMESTAMP=2026-05-15T02-00-00Z /usr/local/bin/restore.sh
```

The script downloads snapshot files for that replica, writes them to the shared volume, and calls `nodetool import` to load the SSTables without a restart.

Repeat this on every replica. Because each node owns a different token range, restoring only one replica leaves the cluster with incomplete data.

## Important Notes

* **Scaling up**: Adding replicas after initial deployment does not automatically rebalance data. Run `nodetool rebuild` on new nodes and `nodetool cleanup` on existing nodes after scaling.
* **JVM heap**: Set `jvmHeapSize` to approximately 50% of `memory`. Cassandra relies on off-heap memory for bloom filters, row cache, and OS page cache.
* **gc\_grace\_seconds**: The default is 10 days. Ensure repair runs at least once within this window on all nodes, or deleted data may reappear after a node recovers from downtime.
* **GVC naming**: This template does not create a GVC. Deploy it into an existing GVC. If you run multiple Cassandra clusters in the same org, give each a distinct `clusterName`.

## External References

* Official Apache Cassandra documentation
* Client drivers for connecting to Cassandra
* View the source files, default values, and chart definition