Overview
Manticore Search is a high-performance, open-source search engine built for fast full-text search at scale. This template deploys a distributed Manticore Search cluster using Galera replication for high availability, with an orchestrator API for cluster management, a web UI for monitoring and operations, and support for zero-downtime data imports from AWS S3.
What Gets Created
- Manticore Search Workload — A stateful Galera cluster with configurable replicas. Each replica runs a sidecar agent that handles cluster coordination, data imports, and recovery operations.
- Orchestrator API Workload — A REST API service for triggering imports, monitoring cluster health, initiating repairs, and managing backup and restore operations.
- Orchestrator Cron Job — A scheduled workload that runs import, health check, or repair operations on a configured schedule.
- UI Workload — A web dashboard for managing the cluster, monitoring replication, triggering operations, and visualizing queries.
- Volume Set — Persistent storage allocated per replica for Manticore data directories.
- Shared Volume Set — A shared volume accessible by all replicas and the orchestrator, used for slot-based import coordination.
- Secrets — Four opaque secrets: the Manticore `searchd` base configuration, the startup/shutdown handler script, the table schema registry used by the agent, and the agent bearer token. A k6 load test script secret is also created when load testing is enabled.
- Identities & Policies — Identities for the main cluster, orchestrator, and backup workloads. Policies grant `reveal` access to the configuration secrets, `exec` permissions on the orchestrator cron job, and `view` access to the Manticore workload for orchestration.
- Backup Cron Job (optional) — Scheduled logical backups of delta and full tables to an S3 bucket. Enabled when `orchestrator.backup.enabled: true`.
- Domain (optional) — Routes `/api/*` to the orchestrator API and all other traffic to the UI. Enabled when `domain.enabled: true`.
- Load Test Workload (optional) — A k6-based load test runner with a controller for automated scheduling. Enabled when `loadTest.enabled: true`.
This template does not create a GVC. You must deploy it into an existing GVC.
Prerequisites
This template requires an AWS S3 bucket for CSV source data and a Control Plane Cloud Account before installation.
AWS S3 (Source Data)
- Create an S3 bucket and upload your CSV source files. Set `buckets.sourceBucket` to the bucket name and `buckets.awsRegion` to its region.
- If you do not have a Control Plane Cloud Account set up, follow the Create a Cloud Account guide. Set `buckets.cloudAccountName` to the name of your Cloud Account.
- Set `buckets.awsPolicyRefs` to the IAM policies granting S3 access. For read-only source access, use the AWS managed policy `aws::AmazonS3ReadOnlyAccess`. For custom policies, omit the `aws::` prefix.
Agent Token
Generate a secure bearer token for internal cluster communication (for example, with `openssl rand -base64 32`) and set it in `orchestrator.agent.token`.
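As a rough sketch, the token could be placed in your values file as shown here. The nesting is inferred from the dotted key `orchestrator.agent.token`, and the value is a placeholder, not a real token.

```yaml
orchestrator:
  agent:
    # Placeholder only: replace with the output of `openssl rand -base64 32`.
    token: "REPLACE_WITH_GENERATED_TOKEN"
```

Avoid committing the real token to version control.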
Anyone with network access to the UI can perform admin operations. Restrict external access using `orchestrator.ui.allowExternalAccess: false` or configure the domain firewall to limit access.
AWS S3 (Backup, Optional)
Only required if `orchestrator.backup.enabled: true`:
- Create a separate S3 bucket for backups. Set `orchestrator.backup.s3Bucket` and `orchestrator.backup.s3Region`.
- Create an IAM policy with the following permissions, replacing `YOUR_BUCKET_NAME`:
- Set `orchestrator.backup.cloudAccountName` and `orchestrator.backup.s3Policy` to the name of your Cloud Account and the custom policy created above.
Installation
To install, follow the instructions for your preferred method:
- UI: Browse, install, and manage templates visually
- CLI: Manage templates from your terminal
- Terraform: Declare templates in your Terraform configurations
- Pulumi: Declare templates in your Pulumi programs
Configuration
The default `values.yaml` for this template is organized into the sections described below.
S3 and Cloud Account
- `buckets.cloudAccountName` — Name of the Control Plane Cloud Account with AWS trust configured.
- `buckets.awsPolicyRefs` — IAM policies granting read access to the source S3 bucket. Use `aws::AmazonS3ReadOnlyAccess` for the AWS managed policy, or omit the `aws::` prefix for custom policies.
- `buckets.awsRegion` — AWS region of the source S3 bucket.
- `buckets.sourceBucket` — Name of the S3 bucket containing CSV files to import.
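A minimal sketch of how these settings might look together. The nesting is inferred from the dotted key names, `awsPolicyRefs` is assumed to be a list, and the account name, region, and bucket name are hypothetical placeholders.

```yaml
buckets:
  cloudAccountName: my-cloud-account    # hypothetical Cloud Account name
  awsPolicyRefs:
    - aws::AmazonS3ReadOnlyAccess       # AWS managed policy keeps the aws:: prefix
  awsRegion: us-east-1                  # example region
  sourceBucket: my-manticore-source     # hypothetical bucket name
```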
Tables
Each entry in `tables` defines a searchable index imported from a CSV file.
- `name` — Table name used by Manticore and referenced by the orchestrator.
- `csvPath` — List of S3 paths (relative to `sourceBucket`) for the CSV source files. Multiple paths create a distributed multi-segment table.
- `config.haStrategy` — High-availability behavior when agents are unreachable (`noerrors` ignores unreachable agents).
- `config.agentRetryCount` — Number of retry attempts when an agent doesn’t respond.
- `config.segmentCount` — Number of distributed segments for this table. Use more than 1 for very large datasets split across multiple CSVs.
- `config.memLimit` — Memory limit for the Manticore index (e.g., `2G`).
- `config.hasHeader` — Set to `true` if the CSV file includes a header row.
- `schema.columns` — Column definitions. Each column has a `name` and a `type`.
| Type | Description |
|---|---|
| `field` | Full-text searchable string field |
| `attr_uint` | Unsigned integer attribute |
| `attr_float` | Float attribute |
| `attr_bigint` | 64-bit integer attribute |
| `attr_bool` | Boolean attribute |
| `attr_string` | Non-indexed string attribute |
| `attr_json` | JSON attribute |
| `attr_multi` | Multi-value unsigned integer attribute |
| `attr_multi_64` | Multi-value 64-bit integer attribute |
| `attr_timestamp` | Unix timestamp attribute |
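To make the table settings concrete, here is an illustrative single-table entry. The structure is inferred from the key names above; the table name, CSV path, column names, and the retry count are hypothetical values chosen for the example.

```yaml
tables:
  - name: products                  # hypothetical table name
    csvPath:
      - data/products.csv           # hypothetical path, relative to sourceBucket
    config:
      haStrategy: noerrors          # ignore unreachable agents
      agentRetryCount: 3            # example value
      segmentCount: 1               # single segment for a single CSV
      memLimit: 2G
      hasHeader: true               # the CSV starts with a header row
    schema:
      columns:
        - name: title               # hypothetical columns using types from the table above
          type: field
        - name: price
          type: attr_float
        - name: created_at
          type: attr_timestamp
```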
Manticore Cluster
- `manticore.clusterName` — Galera cluster name used for replication coordination.
- `manticore.resources` — CPU and memory for each Manticore replica.
- `manticore.volumeset.capacity` — Persistent storage per replica in GB.
- `manticore.sharedVolumeset.capacity` — Shared storage in GB, accessible by all replicas and the orchestrator for import slot coordination.
- `manticore.autoscaling.minScale` — Minimum replica count. This value is also used by the orchestrator to determine cluster quorum. It should match your intended fixed replica count.
- `manticore.autoscaling.maxScale` — Maximum replica count for autoscaling.
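The following fragment sketches a fixed three-replica cluster. The nesting is inferred from the dotted key names, the cluster name and capacities are example values, and `manticore.resources` is left out because its sub-keys are not described here. The `firewall` entry reflects the `same-gvc` setting this section requires.

```yaml
manticore:
  clusterName: manticore-cluster    # hypothetical cluster name
  volumeset:
    capacity: 10                    # GB per replica, example value
  sharedVolumeset:
    capacity: 10                    # GB shared across replicas, example value
  autoscaling:
    minScale: 3                     # also used by the orchestrator for quorum
    maxScale: 3                     # equal to minScale for a fixed replica count
  firewall:
    internalAccess:
      type: same-gvc                # required for Galera peer-to-peer traffic
```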
`manticore.firewall.internalAccess.type` must be set to `same-gvc`. Galera replication requires direct peer-to-peer communication between all replicas within the GVC.
Orchestrator
The orchestrator manages all cluster lifecycle operations, including data imports, health checks, and repairs.
Cron Job
- `orchestrator.schedule` — Cron schedule for automated operations (default: every hour).
- `orchestrator.action` — Operation to run: `init` (initial cluster setup), `import` (load CSV data), `health` (check cluster state), or `repair` (fix split-brain issues).
- `orchestrator.tableName` — The table to target for the cron job. Must match a `name` in `tables`.
- `orchestrator.suspend` — When `true`, the cron job is created in a suspended state and must be triggered manually via the UI or CLI. Recommended for production.
- `orchestrator.timeoutSeconds` — Timeout for each cron container execution.
- `orchestrator.activeDeadlineSeconds` — Maximum total runtime for an import job before it is terminated.
API
- `orchestrator.api.importPollInterval` — How often the API checks import progress.
- `orchestrator.api.importPollTimeout` — Maximum time the API waits for an import to complete.
- `orchestrator.api.autoscaling` — Min/max replicas and CPU target for the orchestrator API.
Agent
- `orchestrator.agent.token` — Bearer token securing all internal communication between the orchestrator and agent sidecars. Generate with `openssl rand -base64 32` and change before deploying.
- `orchestrator.agent.import.batchSize` — Number of rows per INSERT statement during imports (default: 20000).
- `orchestrator.agent.recovery` — Retry settings for Galera cluster recovery after a split-brain event.
UI
- `orchestrator.ui.allowExternalAccess` — When `true`, the UI is accessible from the internet. Set to `false` to restrict access to within the GVC.
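One possible production-leaning combination of the orchestrator settings is sketched below. The nesting is inferred from the dotted key names, `0 * * * *` is simply one cron expression that runs hourly (the template's actual default expression may differ), and the table name is hypothetical. Timeout, polling, and recovery values are omitted because their formats are not described here.

```yaml
orchestrator:
  schedule: "0 * * * *"        # hourly; assumed expression for "every hour"
  action: import               # one of: init, import, health, repair
  tableName: products          # hypothetical; must match a name in tables
  suspend: true                # trigger manually via the UI or CLI
  agent:
    import:
      batchSize: 20000         # documented default rows per INSERT
  ui:
    allowExternalAccess: false # keep the UI reachable only within the GVC
```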
Backup (Optional)
Set `orchestrator.backup.enabled: true` to enable scheduled backups to S3.
- `orchestrator.backup.cloudAccountName` — Cloud Account with write access to the backup S3 bucket.
- `orchestrator.backup.s3Bucket` / `s3Region` — Backup bucket name and region.
- `orchestrator.backup.s3Policy` — Custom IAM policy name(s) granting write access to the backup bucket (see Prerequisites).
- `orchestrator.backup.dataSet` — The table dataset to back up.
- `orchestrator.backup.prefix` — S3 folder prefix for backup archives.
- `orchestrator.backup.schedules` — List of backup schedules. Each entry specifies a `table`, a backup `type` (`delta` for incremental, `main` for full), and a `schedule` in cron format.
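As an illustration, a backup configuration with an hourly incremental backup and a weekly full backup might look like the fragment below. The nesting is inferred from the key names, `s3Policy` is assumed to accept a list, and every name, region, prefix, and cron expression is a hypothetical example. `dataSet` is omitted because its expected value is not described here.

```yaml
orchestrator:
  backup:
    enabled: true
    cloudAccountName: my-cloud-account   # hypothetical Cloud Account name
    s3Bucket: my-manticore-backups       # hypothetical backup bucket
    s3Region: us-east-1                  # example region
    s3Policy:
      - my-backup-policy                 # custom policy, so no aws:: prefix
    prefix: manticore-backups            # example S3 folder prefix
    schedules:
      - table: products                  # hypothetical table name
        type: delta                      # incremental backup
        schedule: "0 * * * *"            # every hour
      - table: products
        type: main                       # full backup
        schedule: "0 3 * * 0"            # Sundays at 03:00
```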
Domain (Optional)
- `domain.enabled` — Enable or disable the domain resource.
- `domain.name` — Fully qualified domain name (e.g., `manticore.example.com`). Requires DNS configuration pointing to the GVC’s load balancer.
- `domain.dnsMode` — DNS routing mode: `cname` for subdomain-based routing or `ns` for zone delegation.
The domain routes `/api/*` to the orchestrator API and all other traffic to the UI.
Load Testing (Optional)
- `loadTest.enabled` — Enable or disable the k6 load test workload.
- `loadTest.vus` — Number of virtual users.
- `loadTest.duration` — Total test duration (e.g., `30s`, `5m`, `1h`).
- `loadTest.rps` — Target requests per second (`null` = unlimited).
- `loadTest.target.endpoint` — Search endpoint to target: `search` (JSON body) or `sql`.
- `loadTest.thresholds.p95ResponseTime` — P95 response time threshold in milliseconds.
- `loadTest.thresholds.errorRate` — Maximum acceptable error rate (e.g., `0.01` = 1%).
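A short load test configuration, sketched from the settings above. The nesting is inferred from the dotted key names, and the user count and P95 threshold are example values rather than recommendations.

```yaml
loadTest:
  enabled: true
  vus: 10                    # example number of virtual users
  duration: 5m
  rps: null                  # null = unlimited requests per second
  target:
    endpoint: search         # JSON body search; use sql for the SQL endpoint
  thresholds:
    p95ResponseTime: 500     # milliseconds, example value
    errorRate: 0.01          # at most 1% of requests may fail
```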