FerroDruid
Rust-native Apache-Druid-compatible OLAP
A Rust-native, Apache-Druid-spec-compatible real-time OLAP database. It speaks the Druid REST API, native query JSON, and Druid SQL, and reads/writes Druid segment v9/v10 binaries — without a JVM, without ZooKeeper, and without a six-process control plane. The single binary boots in under a second on under 200 MB of RAM.
A classic Apache Druid cluster needs six or more JVM processes, ZooKeeper, an external metadata database, and 16 GB or more of RAM before it serves a single query. FerroDruid's single-binary mode replaces all of that with one process that ships as a self-contained AMI. Version 0.2.0 serves all eight native query types (timeseries, topN, groupBy, scan, search, segmentMetadata, dataSourceMetadata, timeBoundary); runs Druid SQL (SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, more than 30 functions, EXPLAIN PLAN FOR, an MSQ task endpoint, approximately 95% core SQL parity); exposes more than 40 Druid-compatible REST endpoints; reads and writes Druid segment v9/v10; and ingests from Kafka and Kinesis supervisors and via native batch. Basic auth (Argon2id) and RBAC are on by default, TLS is provided by rustls, and a unique random admin password is generated on first boot.
The problem
Apache Druid is a powerful real-time OLAP engine, but a classic cluster needs six or more JVM processes plus ZooKeeper plus an external metadata database, and 16 GB or more of RAM, before it serves a single query. Standing up, operating, and monitoring that six-process control plane is heavy, and it is overkill for evaluation environments and smaller deployments. You want Druid's API and segment format, but not the burden of running a JVM and ZooKeeper fleet.
How it works
1. Boots as a single binary
Single-binary mode runs one process — no JVM, no ZooKeeper, no external metadata database — that starts in under a second and uses under 200 MB of RAM. It uses SQLite for metadata and the local filesystem for deep storage, and ships as a self-contained AMI.
2. Speaks the Druid wire protocol
It speaks the Druid REST API, native query JSON, and Druid SQL, and reads and writes Druid segment v9/v10 binary files. It serves all eight native query types and exposes more than 40 Druid-compatible REST endpoints, so you can point existing Druid clients or an Apache Superset connector straight at it.
3. Starts locked down, password change on first login
Basic auth (Argon2id) and RBAC are on by default, with TLS via rustls. On first boot it generates a new random admin password unique to that instance (never a default or shared one) and writes it once to the instance system log. The admin account is flagged must-change, so every other API endpoint returns HTTP 403 until the operator POSTs a new password, which enforces a change on first login.
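As a rough illustration of the wire protocol described above, the sketch below builds one native timeseries query and one Druid SQL request using only the Python standard library. The paths /druid/v2 and /druid/v2/sql are the standard Apache Druid query endpoints; the host name, the datasource name "events", and the interval are placeholders chosen for this example, not values from the product.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute your own load balancer endpoint.
BASE_URL = "https://ferrodruid.example.com"


def timeseries_query(datasource: str, interval: str) -> dict:
    """Build a Druid native timeseries query that counts rows per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def sql_query(statement: str) -> dict:
    """Wrap a Druid SQL statement in the JSON body used by the SQL endpoint."""
    return {"query": statement}


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST for the given API path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "events" is a hypothetical datasource used only for illustration.
native_req = build_request(
    "/druid/v2", timeseries_query("events", "2024-01-01/2024-01-02")
)
sql_req = build_request(
    "/druid/v2/sql", sql_query("SELECT COUNT(*) AS n FROM events")
)
```

Nothing is sent here. To run a query, add an Authorization header for a user whose password has already been rotated, then pass the request to urllib.request.urlopen.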
Highlights
Druid-spec wire-compatible (REST + native JSON + Druid SQL, segment v9/v10) — existing Druid clients and queries work.
One binary, no JVM / ZooKeeper / six-process control plane; sub-second boot on under 200 MB RAM.
8 native query types + Druid SQL (~95% core parity) + Kafka / Kinesis ingest; auth + RBAC on by default.
What's included
- Self-contained Amazon Linux 2023 AMI (Graviton / arm64, supporting t4g, c7g, m7g, and r7g class instances)
- Single-binary mode — one process with no JVM, ZooKeeper, or external metadata database, booting in under a second on under 200 MB of RAM (SQLite metadata plus local-filesystem deep storage)
- All eight Druid native query types (timeseries, topN, groupBy, scan, search, segmentMetadata, dataSourceMetadata, timeBoundary) and more than 40 Druid-compatible REST endpoints
- Druid SQL (SELECT / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT, more than 30 functions, EXPLAIN PLAN FOR, an MSQ task endpoint, and approximately 95% core SQL parity)
- Reads and writes Druid segment v9/v10 binary files, with ingestion from Kafka and Kinesis supervisors and via native batch
- Security on by default — Basic auth (Argon2id) plus RBAC, TLS via rustls, and a random per-instance admin password generated on first boot that must be changed on first login
- CloudFormation template for deployment behind an ALB (marketplace/cloudformation/ami.yaml), with single-binary single-node as the supported topology (multi-node fails closed by default)
Use cases
Teams that want Druid-compatible real-time OLAP without operating a six-process JVM and ZooKeeper cluster
A backend for existing clients that use the Druid REST API, native query JSON, Druid SQL, or an Apache Superset connector
Evaluation and development environments for Druid features using a lightweight binary that boots in under a second on under 200 MB
Single-node streaming and time-series analytics that ingest from Kafka and Kinesis supervisors or via native batch
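For the streaming use case, a Kafka supervisor is normally created by POSTing a supervisor spec to /druid/indexer/v1/supervisor in Apache Druid. The sketch below builds such a spec in the standard Apache Druid layout. It assumes FerroDruid accepts the same fields; the datasource, topic, broker address, and column names are hypothetical.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Build a Kafka supervisor spec in the Apache Druid layout."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# All names below are placeholders for illustration only.
spec = kafka_supervisor_spec(
    "clickstream",
    "clicks",
    "kafka.internal.example:9092",
    ["user_id", "page", "country"],
)
print(json.dumps(spec, indent=2))
```

The printed JSON is the request body for the supervisor endpoint. A Kinesis supervisor follows the same outer shape with Kinesis-specific ioConfig fields.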
FAQ
How compatible is it with real Apache Druid?
FerroDruid speaks the Druid REST API, native query JSON, and Druid SQL, and reads and writes segment v9/v10. It covers all eight native query types and more than 40 Druid-compatible REST endpoints, with approximately 95% core Druid SQL parity (not 100%). The live wire-level deep-match scored 5 of 5 against Apache Druid 30.0.1 and 5 of 5 with an Apache Superset connector. Scope of validation: live testing covers Druid 30.0.1 and single-binary mode only; Druid 31 through 36 are a spec-driven design target and have not yet been cross-validated against a running cluster.
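The listing does not describe how the deep-match was carried out. If you want to run a similar check against your own queries, one possible approach is to decode the JSON response from each server and compare the two structures recursively. The helper below is a hypothetical sketch of that idea, not the project's test harness, and the sample responses are made-up data in the shape of a timeseries result.

```python
import math


def deep_match(expected, actual, rel_tol=1e-9):
    """Recursively compare two decoded JSON values.

    Numbers are compared with a relative tolerance so that 42 and 42.0
    count as equal; everything else must match exactly.
    """
    # Booleans are checked first because bool is a subclass of int in Python.
    if isinstance(expected, bool) or isinstance(actual, bool):
        return type(expected) is type(actual) and expected == actual
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual, rel_tol=rel_tol)
    if isinstance(expected, dict) and isinstance(actual, dict):
        return expected.keys() == actual.keys() and all(
            deep_match(expected[key], actual[key], rel_tol) for key in expected
        )
    if isinstance(expected, list) and isinstance(actual, list):
        return len(expected) == len(actual) and all(
            deep_match(e, a, rel_tol) for e, a in zip(expected, actual)
        )
    return expected == actual


# Made-up responses for illustration only.
reference = [{"timestamp": "2024-01-01T00:00:00.000Z", "result": {"rows": 42}}]
candidate = [{"timestamp": "2024-01-01T00:00:00.000Z", "result": {"rows": 42.0}}]
mismatch = [{"timestamp": "2024-01-01T00:00:00.000Z", "result": {"rows": 41}}]

same = deep_match(reference, candidate)
different = deep_match(reference, mismatch)
```

List order is treated as significant here, which suits ordered results such as timeseries; unordered results would need sorting before comparison.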
Do I need a JVM, ZooKeeper, or an external metadata database?
Not in single-binary mode. One process boots in under a second and uses under 200 MB of RAM — in contrast to a classic Druid cluster, which needs six or more JVM processes plus ZooKeeper plus an external metadata database and 16 GB or more of RAM. The supported single-binary path uses SQLite for metadata and the local filesystem for deep storage.
Can I run a multi-node configuration?
The supported topology is single-binary single-node; multi-node configurations fail closed by default. Live validation has been performed only against single-binary mode, and FerroDruid has not yet been validated as a running multi-node cluster. See docs/KNOWN_LIMITATIONS.md for details.
How is security handled, and what is the first-login flow?
Basic auth (Argon2id) and RBAC are on by default, with TLS via rustls. On first boot the AMI generates a new random admin password unique to that instance (never a default or shared one) and writes it once to the instance system log. The admin account is flagged must-change, so every other API endpoint returns HTTP 403 until the operator POSTs a new password to /druid-ext/basic-security/authentication/db/basic/users/admin/credential. The rotated credential is persisted and survives restarts.
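The first-login rotation can be sketched with the Python standard library as follows. The credential path is the one quoted above. The JSON body shape {"password": ...} is an assumption modeled on Apache Druid's basic-security extension, and the host name and both passwords are placeholders.

```python
import base64
import json
import urllib.request

# Path as given in the answer above.
CREDENTIAL_PATH = (
    "/druid-ext/basic-security/authentication/db/basic/users/admin/credential"
)


def build_password_change(base_url, current_password, new_password):
    """Prepare (but do not send) the POST that rotates the admin password."""
    token = base64.b64encode(
        f"admin:{current_password}".encode("utf-8")
    ).decode("ascii")
    # Assumption: the body carries the new password under a "password" key.
    body = json.dumps({"password": new_password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + CREDENTIAL_PATH,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: read the generated password from the instance system log.
req = build_password_change(
    "https://ferrodruid.example.com/",
    "generated-on-first-boot",
    "a-long-new-passphrase",
)
```

Passing req to urllib.request.urlopen would send it. After a successful rotation, subsequent requests should authenticate with the new password.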
How do I deploy it, and how do licensing and billing work?
Deploy it with the provided CloudFormation template (marketplace/cloudformation/ami.yaml) behind an Application Load Balancer; terminate TLS at the ALB and do not expose the service port directly to the internet. Point your clients (REST API, native query JSON, Druid SQL, or an Apache Superset connector) at the load balancer endpoint. This listing sells a hardened, scanned, supported distribution built from the Apache-2.0 source at a pinned release version; the code itself remains Apache-2.0. The AMI is metered automatically by AWS per running instance-hour, with no metering code in the product.
Pricing model
Hourly software fee + EC2 (t4g / c7g / m7g / r7g class, Arm). Metered per instance type.