Kafka Without Disks? Let’s Talk About KIP-1150 (Diskless Topics)
TL;DR:
KIP-1150 (Diskless Topics) proposes a new way for Kafka to store data directly into cloud storage (like AWS S3), skipping local broker disks. This can massively cut costs and simplify scaling. But it also brings higher latencies and tighter cloud dependencies. Great for log-heavy, batchy workloads; not so great if you need ultra-low latency.
Quick Note
This post is my personal attempt to understand and analyze KIP-1150. It’s not a “universal truth” or an “ultimate guide.” I’m just sharing my thoughts and inviting you to discuss it with me. :)
Introduction
Imagine if your Apache Kafka cluster could work without storing data on local disks at all. No disk replication between brokers. No expensive SSDs burning your cloud bill. No endless cluster rebalancing. Sounds weird, right?
Well, that’s exactly what KIP-1150 proposes: Diskless Topics — topics that don’t store data on brokers but instead write it directly to a shared cloud storage like Amazon S3.
It’s a bold move. Let’s dig into what it means in practice — the good, the bad, and whether it’s something worth using.
Pros of Diskless Topics (KIP-1150)
Huge Cost Savings (~80%)
By replacing broker-side disk replication with a single copy in cloud storage, the KIP estimates that clusters can cut storage and network costs by up to 80%.
No more paying for 3x replicated EBS volumes. No more expensive cross-AZ traffic.
In cloud deployments, storage and (above all) cross-AZ networking often account for 80–90% of Kafka's total bill. So yeah, this is a big deal.
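To get a feel for where that money goes, here's a back-of-the-envelope sketch in Java. Every price and workload figure in it is an assumption I picked for illustration (ballpark AWS list prices and a made-up 100 MB/s workload), not a number from the KIP:

```java
// Back-of-the-envelope cost sketch. All prices and workload numbers below are
// assumptions for illustration (roughly AWS list prices), not measurements.
public class KafkaCostSketch {
    public static void main(String[] args) {
        double mbPerSec = 100.0;                                // assumed produce rate
        double gbPerMonth = mbPerSec / 1000 * 86_400 * 30;      // ~259,200 GB/month

        // Classic Kafka: RF=3 spread across 3 AZs => roughly 2 cross-AZ copies per byte.
        double crossAzPerGb = 0.02;                             // assumed $/GB (in + out)
        double crossAzCost = gbPerMonth * 2 * crossAzPerGb;

        // 7 days of retention on EBS, stored 3 times (one copy per replica).
        double retainedGb = mbPerSec / 1000 * 86_400 * 7 * 3;
        double ebsCost = retainedGb * 0.08;                     // assumed $/GB-month for gp3

        // Diskless: a single copy in S3, no cross-AZ replication traffic.
        double s3Cost = (retainedGb / 3) * 0.023;               // assumed $/GB-month for S3

        System.out.printf("classic : ~$%,.0f/month cross-AZ + ~$%,.0f/month EBS%n",
                crossAzCost, ebsCost);
        System.out.printf("diskless: ~$%,.0f/month S3 storage (before API calls)%n",
                s3Cost);
    }
}
```

Even with these rough numbers, the replicated EBS volumes and the cross-AZ replication traffic dominate, while a single retained copy in S3 is comparatively cheap. That gap is exactly what KIP-1150 goes after.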
Easier and Faster Scaling
Need more capacity? Just spin up new brokers, with no massive data rebalancing. New nodes can start serving traffic almost immediately because the data already lives in S3.
Scaling down? Just kill brokers. Again, no rebalancing pain.
It also improves load balancing: any broker in an AZ can handle writes for a diskless topic, no partition leader bottlenecks anymore.
Disaster Recovery and Geo-Replication for Free
Since data lives in object storage that is already replicated across multiple availability zones, you get a big part of disaster recovery "for free."
If an AZ goes down, your data is still safe. And if you enable cross-region replication on the bucket, even a regional outage becomes survivable: new brokers in another region can pull data from S3 without hassle.
Also, your Kafka logs are now naturally ready for analytical queries, lakehouse pipelines, and batch processing, because they already live in cloud storage.
No Forced Migration
Diskless Topics are optional. You can have both classic (disk-based) and diskless topics in the same Kafka cluster.
No need to change your Kafka clients. No need to rewrite your applications. You just create a topic with topic.type=diskless and you're good to go.
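If the KIP lands in this form, creating such a topic should look like any other topic creation, just with one extra config. Here's a sketch using the standard Java AdminClient; note that the topic.type=diskless key is the one proposed in the KIP and may well change before it ships:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDisklessTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("telemetry-events", 6, (short) 1)
                    // Config key proposed in KIP-1150; name and semantics may change.
                    .configs(Map.of("topic.type", "diskless"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

Everything else (producers, consumers, consumer groups) stays exactly as it is today.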
This “extend, don’t rewrite” philosophy is awesome.
Cons of Diskless Topics (KIP-1150)
Higher Latencies
Nothing comes for free. Writing and reading from external cloud storage is slower than local disk I/O.
Typical end-to-end latencies could land in the 200–400 ms range, depending on batch sizes and storage settings.
That’s fine for many logging, telemetry, or analytics use cases, but it’s a dealbreaker if you need sub-100ms real-time pipelines (like fintech, trading, high-frequency monitoring).
Basically: Diskless Topics trade latency for cost.
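The familiar producer-side batching knobs are where this trade-off shows up first, and with diskless topics they matter even more, since each object uploaded to storage should be reasonably large. A minimal sketch with illustrative values (these are starting points I picked, not recommendations from the KIP):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Trade latency for fewer, larger requests: wait up to 100 ms and
        // accumulate up to 1 MB per partition batch before sending.
        props.put("linger.ms", "100");
        props.put("batch.size", String.valueOf(1024 * 1024));
        props.put("compression.type", "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("telemetry-events", "host-1", "cpu=42"));
        }
    }
}
```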
Cloud Dependency
Diskless Topics shine in public cloud environments.
If you’re running Kafka on-premises, the story gets tricky. You’ll need to manage your own object storage (like MinIO or Ceph), which adds complexity.
Also, you now fully trust your cloud provider’s storage for durability and availability. (Yes, S3 is crazy reliable, but still — it’s a shift in architecture mindset.)
And don’t forget, cloud storage APIs (PUT/GET/LIST) cost money too. If your workloads produce tons of small messages, API costs might partially eat into your savings.
Finding the right object and segment sizes (smaller for latency, bigger for API savings) becomes a tuning exercise.
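A quick way to see that trade-off is to work out how the PUT bill scales with object size. Again, the throughput and prices here are my own assumptions (roughly S3-like list pricing), not numbers from the KIP:

```java
// Rough monthly PUT-cost estimate for different upload (object) sizes.
// Throughput and prices are illustrative assumptions, not measurements.
public class PutCostSketch {
    public static void main(String[] args) {
        double mbPerSec = 100.0;            // assumed produce rate
        double putPricePer1000 = 0.005;     // assumed $ per 1,000 PUT requests

        for (int objectMb : new int[] {1, 4, 16, 64}) {
            double putsPerMonth = mbPerSec / objectMb * 86_400 * 30;
            double cost = putsPerMonth / 1000 * putPricePer1000;
            System.out.printf("%3d MB objects -> ~%,.0f PUTs/month -> ~$%,.0f%n",
                    objectMb, putsPerMonth, cost);
        }
    }
}
```

Bigger objects slash the API bill, but every extra megabyte you accumulate before uploading adds end-to-end latency. That is the tuning exercise in a nutshell.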
Added System Complexity
Under the hood, Diskless Topics introduce a new component, the Batch Coordinator, which assigns global offsets to incoming batches of events.
This is a new potential point of failure. If your Batch Coordinator crashes, it could stall diskless topics.
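Conceptually, the coordinator's job is small: hand out contiguous offset ranges for batches that brokers have already uploaded to object storage. Here's a deliberately naive, single-process toy to illustrate that idea; the real coordinator in KIP-1150 is replicated and persistent, and this sketch is neither:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of what "assigning global offsets" means. This is NOT the
// KIP-1150 design, just a sketch of the core bookkeeping.
public class ToyBatchCoordinator {

    record OffsetRange(long baseOffset, int recordCount) {}

    private final Map<Integer, Long> nextOffsetByPartition = new HashMap<>();

    // Called after a broker has uploaded a batch object to cloud storage:
    // reserve the next contiguous offset range for that partition.
    public synchronized OffsetRange commitBatch(int partition, int recordCount) {
        long base = nextOffsetByPartition.getOrDefault(partition, 0L);
        nextOffsetByPartition.put(partition, base + recordCount);
        return new OffsetRange(base, recordCount);
    }

    public static void main(String[] args) {
        ToyBatchCoordinator coordinator = new ToyBatchCoordinator();
        System.out.println(coordinator.commitBatch(0, 500)); // baseOffset=0
        System.out.println(coordinator.commitBatch(0, 250)); // baseOffset=500
    }
}
```

Because every write for a diskless topic has to pass through this bookkeeping, the coordinator's availability and throughput become critical, which is why it is listed here as a new moving part.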
Also, some features like log compaction, exactly-once semantics, and transactional guarantees need to be rethought for diskless storage.
KIP-1150 addresses these in its design, but let's be real: production-grade stability will take time and iterations.
It’s an Evolution, Not a Revolution
Diskless Topics keep the Kafka model: partitions, offsets, consumer groups, etc.
If you hoped this would magically fix partitioning pain or enable global message ordering — sorry, not happening.
This is a pragmatic enhancement to Kafka’s storage layer, not a full rewrite.
Conclusion
KIP-1150 and Diskless Topics bring a fresh, exciting shift to how we can operate Kafka in the cloud.
Massive cost savings, simpler scaling, and better integration with lakehouse architectures make them super attractive.
But — and it’s a big but — they are not a silver bullet. Higher latencies, cloud vendor dependency, and new moving parts mean you need to think carefully about where to apply them.
Still, I personally believe Diskless Topics can become a game-changer for cost-efficient, cloud-native Kafka deployments.
What do you think? Would you consider using Diskless Topics in your setup? Let’s discuss! 🚀