<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DevOps IN SPACE: Kafka]]></title><description><![CDATA[This section will help you understand Kafka concepts.
]]></description><link>https://devopsguyankit.substack.com/s/kafka</link><image><url>https://substackcdn.com/image/fetch/$s_!K2kC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa9f505-ddb8-42b5-aaf3-f25fb8290953_271x271.png</url><title>DevOps IN SPACE: Kafka</title><link>https://devopsguyankit.substack.com/s/kafka</link></image><generator>Substack</generator><lastBuildDate>Fri, 19 Jun 2026 02:13:10 GMT</lastBuildDate><atom:link href="https://devopsguyankit.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ankit Ranjan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[devopsguyankit@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[devopsguyankit@substack.com]]></itunes:email><itunes:name><![CDATA[Ankit Ranjan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ankit Ranjan]]></itunes:author><googleplay:owner><![CDATA[devopsguyankit@substack.com]]></googleplay:owner><googleplay:email><![CDATA[devopsguyankit@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ankit Ranjan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Apache Kafka is the Backbone of Modern Real-Time Data Pipelines]]></title><description><![CDATA[From LinkedIn&#8217;s internal tool to the de facto standard for event streaming, how Kafka changed the way we think about data in motion.]]></description><link>https://devopsguyankit.substack.com/p/why-apache-kafka-is-the-backbone</link><guid isPermaLink="false">https://devopsguyankit.substack.com/p/why-apache-kafka-is-the-backbone</guid><dc:creator><![CDATA[Ankit Ranjan]]></dc:creator><pubDate>Wed, 29 Apr 2026 05:05:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TdHy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em>From LinkedIn&#8217;s internal tool to the de facto standard for event streaming, how Kafka changed the way we think about data in motion.</em></p></blockquote><p>Every time you get a fraud alert from your bank, see a &#8220;<strong>your order shipped</strong>&#8221; notification, or watch <strong>Netflix adapt its recommendation</strong> engine within seconds of your last watch, there is a good chance Apache Kafka is involved. It has become the invisible plumbing of the modern internet, and for good reason.</p><p>In this guide, we&#8217;ll peel back the layers of Kafka&#8217;s architecture, understand why it outperforms traditional message queues at scale, and see exactly where and how it fits into production data stacks today.</p>
<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vJ56!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vJ56!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 424w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 848w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 1272w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vJ56!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png" width="817" height="147" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:147,&quot;width&quot;:817,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vJ56!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 424w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 848w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 1272w, https://substackcdn.com/image/fetch/$s_!vJ56!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe16e809e-e32b-4a8b-9a21-378db7ae5998_817x147.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><strong>What is Apache Kafka &#8212; and why does it matter?</strong></p><p>Apache Kafka is a distributed event streaming platform built for high-throughput, fault-tolerant, and low-latency data pipelines. 
Originally created at LinkedIn to handle activity stream data and operational metrics, it was open-sourced in 2011 and graduated from the Apache Incubator to a top-level Apache Software Foundation project in 2012.</p><blockquote><p><em>At its core, Kafka is not just a message queue. It&#8217;s a distributed commit log: an append-only, ordered, persistent stream of events that any number of consumers can read, at any time, at their own pace.</em></p></blockquote><p>This seemingly simple distinction, a durable, replayable log versus a consumed-and-discarded queue, is what makes Kafka so powerful in data-intensive architectures.</p><blockquote><p><em><strong>Kafka&#8217;s core architecture</strong></em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KygB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KygB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 424w, https://substackcdn.com/image/fetch/$s_!KygB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 848w, https://substackcdn.com/image/fetch/$s_!KygB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 1272w, https://substackcdn.com/image/fetch/$s_!KygB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KygB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png" width="818" height="179" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:179,&quot;width&quot;:818,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!KygB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 424w, https://substackcdn.com/image/fetch/$s_!KygB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 848w, https://substackcdn.com/image/fetch/$s_!KygB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 1272w, https://substackcdn.com/image/fetch/$s_!KygB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0e69a5b-3d37-4434-ab60-aacf868d4d34_818x179.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Each <strong>topic</strong> in Kafka is split into one or more 
<strong>partitions</strong>. Messages within a partition are strictly ordered, and each is assigned an offset, a monotonically increasing integer that consumers track to know where they left off. Because partitions can be spread across brokers and read in parallel by the consumers in a group, this design is what enables Kafka&#8217;s horizontal scalability.</p><p><strong>A producer in 15 lines of Python:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9GRA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9GRA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 424w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 848w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 1272w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9GRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png" width="817" height="648" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:817,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!9GRA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 424w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 848w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 1272w, https://substackcdn.com/image/fetch/$s_!9GRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a17efef-2658-4539-afb3-d8d4919097c8_817x648.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>Kafka vs. traditional message queues</strong></p><p>If you&#8217;ve worked with RabbitMQ, ActiveMQ, or AWS SQS, you might wonder: &#8220;<strong>Aren&#8217;t they all just message queues</strong>?&#8221; Not quite. Here&#8217;s the critical <strong>distinction</strong>: traditional queues delete a message once it&#8217;s consumed. 
Kafka retains it for a configurable retention period (days, weeks, or indefinitely), and any number of consumer groups can read the same message independently, each tracking its own offset.</p><p>This enables patterns such as event sourcing, audit logs, and replaying historical events into new microservices, all of which are difficult or impossible in a classical queue model.</p><blockquote><p><em><strong>Real-world use cases across industries</strong></em></p></blockquote><ol><li><p><strong>Real-time fraud detection</strong>: Banks stream transaction events through Kafka to ML scoring services that flag anomalies within milliseconds of a swipe.</p></li><li><p><strong>Microservices decoupling</strong>: Services publish events instead of making direct API calls, removing tight coupling and enabling independent deployment.</p></li><li><p><strong>Log aggregation</strong>: Application logs from thousands of servers are funnelled into Kafka and forwarded to Elasticsearch or S3 for centralized analysis.</p></li><li><p><strong>Change Data Capture</strong>: Kafka Connect with Debezium captures row-level database changes and propagates them to downstream consumers in real time.</p></li><li><p><strong>IoT data ingestion</strong>: Billions of device telemetry points per day, from smart meters, factory sensors, and connected cars, land in Kafka before being passed on to analytics systems.</p></li><li><p><strong>Stream processing</strong>: Kafka Streams and ksqlDB allow stateful computations such as aggregations, joins, and windowing directly on the data stream.</p></li></ol><p><strong>Kafka Streams vs. ksqlDB &#8212; which should you use?</strong></p><p>Both tools let you process events in motion, but they serve different audiences. <strong>Kafka Streams</strong> is a Java/Scala library embedded in your application, a good fit for developers who want full programmatic control. 
<strong>ksqlDB</strong> exposes a SQL-like interface for defining stream transformations declaratively, making it accessible to data analysts and teams that prefer SQL over Java.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dRYu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dRYu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 424w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 848w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 1272w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dRYu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png" width="817" height="251" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:251,&quot;width&quot;:817,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dRYu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 424w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 848w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 1272w, https://substackcdn.com/image/fetch/$s_!dRYu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66841c-7a40-4965-84ee-514d46a65ad0_817x251.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>Kafka&#8217;s strengths and rough edges</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cwr4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cwr4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 424w, https://substackcdn.com/image/fetch/$s_!cwr4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 848w, 
https://substackcdn.com/image/fetch/$s_!cwr4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 1272w, https://substackcdn.com/image/fetch/$s_!cwr4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cwr4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png" width="825" height="290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:290,&quot;width&quot;:825,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!cwr4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 424w, https://substackcdn.com/image/fetch/$s_!cwr4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 848w, 
https://substackcdn.com/image/fetch/$s_!cwr4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 1272w, https://substackcdn.com/image/fetch/$s_!cwr4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad75e7-a452-4f68-888e-40cc0d1e7719_825x290.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>KRaft mode &#8212; Kafka without ZooKeeper</strong></p><p>One of the most significant architectural shifts in Kafka&#8217;s 
recent history is the removal of Apache ZooKeeper as an external dependency. The <strong>KRaft (Kafka Raft Metadata)</strong> mode, introduced as early access in Kafka 2.8 and marked production-ready in 3.3, replaces ZooKeeper with a self-managed quorum of controller nodes using the Raft consensus algorithm. As of Kafka 4.0, ZooKeeper mode has been removed altogether.</p><p>This simplifies cluster operations considerably: fewer moving parts, simpler deployment, faster startup and controller failover, and support for far more partitions per cluster than ZooKeeper mode could handle (KRaft is designed to scale to millions). If you&#8217;re starting a new Kafka deployment today, KRaft is the way to go.</p><p><strong>Getting started: should you self-host or go managed?</strong></p><p>Self-hosting Kafka gives you complete control but demands real operational expertise, such as JVM tuning, replication configuration, disk management, and monitoring. For most teams, a managed offering is the pragmatic starting point:</p><p><strong>Confluent Cloud</strong> is arguably the most feature-complete managed Kafka service, from the company founded by Kafka&#8217;s original creators. <strong>Amazon MSK</strong> is tightly integrated with the AWS ecosystem. <strong>Redpanda</strong> is a Kafka-compatible alternative written in C++ that eliminates the JVM entirely and aims for strong performance on smaller hardware.</p><p><em><strong>Rule of thumb</strong>: if your team doesn&#8217;t yet have Kafka expertise, use a managed service. When throughput costs exceed $1,000/month or you need custom broker tuning, then evaluate self-hosting.</em></p><blockquote><p><em><strong>Kafka Streams</strong></em></p></blockquote><p>Kafka Streams is a client library that ships with Apache Kafka and lets you write real-time stream processing applications in plain Java/Scala, with no separate processing cluster needed.</p><p><strong>The core idea:</strong> Your app reads from one or more Kafka topics, applies transformations (filter, map, aggregate, join), and writes results back to another Kafka topic. 
The processing happens continuously as records arrive.</p><p><strong>Pipeline for a real-time order analytics engine:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uUYu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uUYu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 424w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 848w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 1272w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uUYu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png" width="770" height="70" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:70,&quot;width&quot;:770,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!uUYu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 424w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 848w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 1272w, https://substackcdn.com/image/fetch/$s_!uUYu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc27407f0-28e5-4771-a9d4-4ebd2118ad4b_770x70.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now here&#8217;s the complete working Java implementation of that exact topology, with annotations explaining each step:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!wVbj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wVbj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 424w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 848w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 1272w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wVbj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png" width="860" height="204" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:204,&quot;width&quot;:860,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!wVbj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 424w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 848w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 1272w, https://substackcdn.com/image/fetch/$s_!wVbj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7640cfbf-c5ea-45b2-9e2a-9c6e3fc9349f_860x204.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-GWa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-GWa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 424w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 848w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 1272w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-GWa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png" width="811" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/686c00c6-6890-4fe8-b225-47400a299aae_811x655.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:811,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!-GWa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 424w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 848w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 1272w, https://substackcdn.com/image/fetch/$s_!-GWa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F686c00c6-6890-4fe8-b225-47400a299aae_811x655.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>KStream: Unbounded event log &#8212; </strong>Every record is an independent event. The same key can appear many times. Good for: transactions, clicks, logs.</p><p><strong>KTable: Changelog / latest value &#8212; </strong>Each key has exactly one current value. New records overwrite old ones. Good for: user profiles, inventory counts.</p><p><strong>Tumbling window: Fixed, non-overlapping buckets &#8212; </strong>Each event falls into exactly one bucket (1 minute wide in this example). No overlap. Simple and predictable, so start here.</p><p><strong>Sliding window: Rolling time range &#8212; </strong>A new window is evaluated for every record, answering questions like &#8220;how many orders in the 5 minutes up to now?&#8221; Higher overhead, richer insight.</p><p><strong>KStream vs KTable: </strong>Think of <code>KStream</code> as a river (events keep flowing, and the same key can appear again and again) and <code>KTable</code> as a database table (each key has one current value, and new records overwrite old ones). When you aggregate a stream, you get a table.</p><p><strong>Windowing: </strong>Without a window, aggregations run forever (total orders since the beginning of time). Windows slice time into buckets so you get meaningful metrics like &#8220;<strong>orders per minute</strong>&#8221; or &#8220;<strong>revenue in the last hour</strong>.&#8221; Tumbling windows are non-overlapping and the easiest to start with.</p><p><strong>State stores: </strong>When Kafka Streams aggregates, it keeps intermediate state in a local RocksDB store on disk and backs it up to an internal changelog topic in Kafka. 
This is what lets your app survive a restart without losing its counts mid-window.</p><p><strong>Repartitioning: </strong>When you call <code>selectKey()</code> to change the message key and then group or join on it, Kafka Streams automatically writes the records to an internal repartition topic and re-reads them, so that all records with the same key land on the same partition (and therefore the same stream task). This is required for correct aggregation.</p><p>A <strong>source topic</strong> in Kafka Streams is simply the Kafka topic your application reads from: it&#8217;s the entry point of your processing pipeline.</p><p>Kafka topics are durable logs of events, ordered within each partition. A source topic is any one of those logs that you designate as <em>input</em> to your stream processor. Your Kafka Streams app subscribes to it, and every record that lands there gets pulled into your topology for processing.</p><p>In the code from the previous example, this single line declares the source topic:</p><pre><code>KStream&lt;String, String&gt; orders = builder.stream("order-events");</code></pre><p>That&#8217;s it. <code>builder.stream()</code> tells Kafka Streams: &#8220;<strong>watch the topic called</strong> <code>order-events</code>, and hand me every record as a <code>KStream</code> I can transform.&#8221; From that point, records flow through your <strong>filter &#8594; map &#8594; aggregate</strong> chain.</p><p><em><strong>A few things worth knowing about source topics:</strong></em></p><p>You can have more than one. <code>builder.stream(List.of("orders-us", "orders-eu"))</code> merges multiple topics into a single stream automatically.</p><p>The source topic must already exist in Kafka before your app starts; Kafka Streams won&#8217;t create it for you (though it does create internal repartition and changelog topics automatically).</p><p>Your app tracks its position in the source topic using <strong>consumer offsets</strong>, the same mechanism as a regular Kafka consumer. 
If your app restarts, it picks up from where it left off rather than reprocessing everything from the beginning.</p><p>The source topic is also where <strong>replay</strong> becomes powerful: if you deploy a new version of your processing logic, you can reset the consumer offsets back to the beginning of the topic and reprocess the entire retained history through your new topology, something impossible with traditional message queues that delete records after consumption.</p><p><code>filter</code> is one of the most fundamental operations in Kafka Streams. It lets you selectively pass records downstream: any record for which the predicate returns <code>false</code> is silently dropped from the stream and never written anywhere.</p><p>The mental model is simple: imagine a bouncer at the door of your pipeline. Every record walks up, the predicate is evaluated, and it either gets let through or turned away.</p><pre><code>KStream&lt;String, String&gt; highValue = orders
    .filter((key, value) -&gt; {
        double amount = parseAmount(value);
        return amount &gt; 50.0;   // only orders over $50 pass
    });</code></pre><p>The lambda receives two arguments, the record&#8217;s <strong>key</strong> and its <strong>value</strong>, and <strong>must return a boolean</strong>. Records where it returns <code>true</code> flow to the next stage. Records where it returns <code>false</code> are gone from <em>this</em> stream permanently (though the original source topic is untouched, because Kafka never mutates stored data).</p><p><strong>A closely related method: </strong><code>filterNot</code></p><p>It&#8217;s the logical inverse: it keeps records where the predicate returns <code>false</code>. These two are equivalent:</p><pre><code>stream.filter((k, v) -&gt; !isInvalid(v));
stream.filterNot((k, v) -&gt; isInvalid(v));</code></pre><p>Use whichever reads more naturally for your condition.</p><p><strong>Branching &#8212; when one filter isn&#8217;t enough</strong></p><p>If you need to route records to <em>different</em> downstream paths based on conditions, <code>filter</code> chained multiple times works but reads awkwardly. The cleaner approach is <code>split().branch()</code>:</p><pre><code>Map&lt;String, KStream&lt;String, String&gt;&gt; branches = orders
    .split(Named.as("tier-"))
    .branch((k, v) -&gt; amount(v) &gt; 500,  Branched.as("premium"))
    .branch((k, v) -&gt; amount(v) &gt; 50,   Branched.as("standard"))
    .defaultBranch(                      Branched.as("low-value"));

KStream&lt;String, String&gt; premium  = branches.get("tier-premium");
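// Sketch, assuming the names used above: the default branch sits in the same map,
// keyed by the split() prefix plus the branch name.
KStream&lt;String, String&gt; lowValue = branches.get("tier-low-value");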
KStream&lt;String, String&gt; standard = branches.get("tier-standard");</code></pre><p>Each record is tested against the branches top to bottom and sent to the first one it matches, like a switch statement for your stream.</p><p><strong>One important caution:</strong> <code>filter</code> is stateless. It evaluates each record in complete isolation with no memory of what came before. If your condition needs to know about previous records (e.g. &#8220;flag this user&#8217;s third order in a row&#8221;), you need a stateful operation like <code>aggregate</code>, or a custom processor (such as a <code>Transformer</code>) with a state store instead.</p><p><code>mapValues</code></p><p><code>mapValues</code> transforms each record&#8217;s value and leaves its key unchanged. The key staying the same matters more than it might seem, because Kafka Streams uses the key to determine which partition a record belongs to. Since <code>mapValues</code> never touches the key, it never triggers a repartition, which makes it one of the cheapest transformations in the API.</p><pre><code>KStream&lt;String, Order&gt; enriched = rawOrders
    .mapValues(value -&gt; {
        Order order = parse(value);
        order.setRegion(lookupRegion(order.getCustomerId()));
        order.setTax(order.getAmount() * 0.18);
        order.setProcessedAt(Instant.now());
        return order;
    });</code></pre><p>The lambda receives the record&#8217;s value and returns a new value, which can be a completely different type. In this one-argument form, the key is not even passed in.</p><p><strong>When you need to change the key, use </strong><code>map</code><strong> instead:</strong></p><pre><code>// mapValues - key invisible, no repartition
stream.mapValues((value) -&gt; transform(value));

// map - key and value both available; changing the key can force a repartition downstream
stream.map((key, value) -&gt; new KeyValue&lt;&gt;(newKey, transform(value)));</code></pre><p>Prefer <code>mapValues</code> whenever you can. After <code>map</code>, Kafka Streams has to assume the key changed, so a later grouping or join forces a repartition: every record is written to an internal topic and read back, which costs real network and disk I/O. If your transformation doesn&#8217;t need to change the key, there&#8217;s no reason to pay that cost.</p><p><strong>You can also access the key in </strong><code>mapValues</code><strong> without triggering repartition</strong> using the two-argument form:</p><pre><code>stream.mapValues((key, value) -&gt; {
    // key is readable but cannot be changed here
    return enrich(key, value);
});</code></pre><p>You can <em>read</em> the key to inform your transformation (say, to look something up by customer ID) without the cost of actually changing it.</p><p><strong>The full family of map operations, compared:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dQhj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dQhj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 424w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 848w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 1272w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dQhj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png" width="966" height="298" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:298,&quot;width&quot;:966,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dQhj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 424w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 848w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 1272w, https://substackcdn.com/image/fetch/$s_!dQhj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd295b42c-ef8b-432e-814b-7f65051f7a39_966x298.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><code>peek</code> is worth a mention: it runs a side effect for each record and passes the record through unchanged. It&#8217;s useful for logging or debugging mid-pipeline without affecting the stream:</p><pre><code>stream
    .peek((key, value) -&gt; log.info("Processing: {}", key))
    .filter(...)
    .mapValues(...);</code></pre><p><em><strong>Windowed aggregation</strong></em></p><p>Without windowing, an aggregation would accumulate forever: your count would just grow from the beginning of time with no useful segmentation. Windows give results meaning by bounding them in time. <strong>The key intuition</strong> is this: a stream is infinite, but your questions about it aren&#8217;t. &#8220;How much revenue in the last minute?&#8221; has a finite answer: you just need to define <em>which minute</em>. That&#8217;s what windows do. They draw a boundary around time so aggregations have somewhere to begin and end.</p><p>Here&#8217;s how the three main window types carve up the same stream of events differently:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TdHy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TdHy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 424w, https://substackcdn.com/image/fetch/$s_!TdHy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 848w, https://substackcdn.com/image/fetch/$s_!TdHy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TdHy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TdHy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png" width="747" height="392" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:747,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TdHy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 424w, https://substackcdn.com/image/fetch/$s_!TdHy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 848w, https://substackcdn.com/image/fetch/$s_!TdHy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TdHy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22ea6195-1784-4b3a-bbd2-d5a072795b00_747x392.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here&#8217;s how each window type looks in code, along with how events fall into its buckets:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!9OhV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9OhV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 424w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 848w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 1272w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9OhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png" width="865" height="410" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:865,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!9OhV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 424w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 848w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 1272w, https://substackcdn.com/image/fetch/$s_!9OhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70627e0a-37ea-4566-b3f4-14d38694fb03_865x410.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qOwl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qOwl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 424w, https://substackcdn.com/image/fetch/$s_!qOwl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 848w, 
https://substackcdn.com/image/fetch/$s_!qOwl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 1272w, https://substackcdn.com/image/fetch/$s_!qOwl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qOwl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png" width="828" height="227" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41575888-7398-4eea-993a-48c46d44c256_828x227.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:227,&quot;width&quot;:828,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qOwl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 424w, https://substackcdn.com/image/fetch/$s_!qOwl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 848w, 
https://substackcdn.com/image/fetch/$s_!qOwl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 1272w, https://substackcdn.com/image/fetch/$s_!qOwl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41575888-7398-4eea-993a-48c46d44c256_828x227.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Each event lands in exactly one bucket. No event is ever double-counted. When a window closes, its result is final.</p><p><strong>Tumbling windows</strong> are the simplest and most memory-efficient. Every event belongs to exactly one bucket, results are final when the window closes, and state stores stay small. Start here unless you have a specific reason not to.</p><p><strong>Hopping windows</strong> give smoother rolling metrics: because windows overlap, your counts change gradually rather than resetting sharply every minute. The trade-off is memory: each event is stored in several windows at once, so state stores grow in proportion to <code>window_size / hop_interval</code>. A 5-minute window that advances every minute, for example, places each event in five windows.</p><p><strong>Session windows</strong> are fundamentally different. They&#8217;re <strong>driven by</strong> <em>behaviour</em>, not the clock. The window size is unknown in advance and varies per key. They&#8217;re ideal for modelling real user sessions, but require more careful tuning of the inactivity gap.</p><p><strong>One important subtlety: grace periods.</strong> All the examples above use the no-grace factory methods (such as <code>TimeWindows.ofSizeWithNoGrace</code>), which means an out-of-order event that arrives after its window has already closed is simply dropped. In production you almost always want to add a grace period:</p><pre><code>TimeWindows.ofSizeAndGrace(
    Duration.ofMinutes(1),   // window size
    Duration.ofSeconds(10)   // accept events up to 10s late
)</code></pre><p>This tells Kafka Streams to keep each window open a little longer so that it can still count events that arrive slightly out of order, which is common in distributed systems where network delays are real.</p><p><strong>Under the hood</strong>, all windowed aggregation state lives in a local RocksDB store on disk, backed by a Kafka changelog topic. If your app restarts and its local store is missing or stale, it replays the changelog to restore its state exactly, so your counts survive crashes without reprocessing the entire source topic from the beginning.</p><p><strong>A sink topic is the mirror image of a source topic.</strong> It&#8217;s the Kafka topic your Kafka Streams application <em>writes results to</em> at the end of your processing pipeline.</p><p>If the source topic is where raw events come in, the sink topic is where processed, enriched, or aggregated results go out. Other applications (dashboards, databases, microservices, or even another Kafka Streams app) can then consume from it.</p><p>In the pipeline we&#8217;ve been building, this single statement declares the sink:</p><pre><code>stats.toStream()
     .map((windowedKey, value) -&gt; new KeyValue&lt;&gt;(windowedKey.key(), value))
     .to("order-stats");   // &#8592; this is the sink topic</code></pre><p><code>.to()</code> is a terminal operation. It ends this branch of the topology and writes every record to <code>order-stats</code>. Nothing flows further downstream within this app.</p><p><em><strong>Sink topics</strong></em></p><p>You can write to multiple sink topics from one topology. <code>.branch()</code> (or its replacement <code>.split()</code>, available since Kafka 2.8) lets you route different records to different destinations: premium orders to one topic, standard orders to another, all within the same app.</p><p>This is how you chain Kafka Streams applications into larger pipelines: App A writes enriched orders to <code>enriched-orders</code>, App B reads <code>enriched-orders</code> and writes fraud scores to <code>fraud-scores</code>, and so on. Each app is independently deployable and scalable.</p><p>You can also use <code>.through()</code> instead of <code>.to()</code> when you want to write to an intermediate topic <em>and</em> keep processing. <code>.to()</code> is a dead end; <code>.through()</code> writes and hands you back a new <code>KStream</code> to continue working with. Note that <code>.through()</code> has been deprecated since Kafka 2.6; on newer versions, use <code>.repartition()</code>, or write with <code>.to()</code> and read the topic back with <code>StreamsBuilder#stream()</code>.</p><p><strong>The sink topic is also how Kafka Streams integrates with the rest of your stack.</strong> From there, Kafka Connect can pick up the results and push them into Elasticsearch, PostgreSQL, S3, or any other downstream system, without your Streams app needing to know anything about those targets directly.</p><p><em><strong>Kafka Connect</strong></em></p><p>Kafka Connect is the data integration layer of the Kafka ecosystem. It moves data <em>into</em> and <em>out of</em> Kafka without you writing any producer or consumer code. It sits between your sink topics and the outside world.</p><p><strong>The core idea is simple:</strong> instead of writing bespoke code to push your <code>order-stats</code> topic into PostgreSQL, you configure a <strong>connector</strong> (a plugin that knows how to talk to a specific system) and Connect handles the rest. 
Polling, batching, retries, offset tracking, scaling, all managed for you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tXUj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tXUj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 424w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 848w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 1272w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tXUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png" width="962" height="346" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:962,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tXUj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 424w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 848w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 1272w, https://substackcdn.com/image/fetch/$s_!tXUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b075d50-0526-4d5f-896b-2a860a26d023_962x346.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Connect has two connector directions. <strong>Source connectors</strong> pull data <em>into</em> Kafka from external systems such as a database, a REST API, or a file system. <strong>Sink connectors</strong> push data <em>out of</em> Kafka to external targets such as Elasticsearch, S3, or a data warehouse. Your Kafka Streams sink topic plugs directly into a sink connector.</p><p><strong>Configuration, not code</strong></p><p>The most important thing about Connect is that you configure connectors with JSON; you rarely write code. Here&#8217;s a complete sink connector that takes your <code>order-stats</code> topic and writes every record into Elasticsearch:</p><pre><code>{
  "name": "order-stats-elastic-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics": "order-stats",
    "connection.url": "http://localhost:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true"
  }
}</code></pre><p>You POST that JSON to the <code>/connectors</code> endpoint of Connect&#8217;s REST API and data starts flowing. No Kafka consumer code and no Elasticsearch client code: Connect handles it all.</p><p><strong>Tasks and workers</strong></p><p>Under the hood, Connect runs on a cluster of <strong>workers</strong> (JVM processes). <strong>Each connector</strong> is broken into one or more <strong>tasks</strong>, the unit of parallelism. Setting <code>tasks.max</code> to <code>2</code> above means Connect will run up to two parallel tasks pulling from <code>order-stats</code>, each handling a subset of the topic&#8217;s partitions. Scale out by adding more workers; Connect rebalances tasks automatically.</p><p><em><strong>Debezium: the source connector worth knowing by name</strong></em></p><p>One of the most widely used source connectors is Debezium, which implements <strong>Change Data Capture (CDC)</strong>. Instead of polling a database with <code>SELECT</code> queries, Debezium tails the database&#8217;s transaction log (the write-ahead log in PostgreSQL, the binlog in MySQL), the same log that replication relies on, and emits every insert, update, and delete as a Kafka event in real time.</p><pre><code>{
  "name": "postgres-cdc-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "orders",
    "table.include.list": "public.orders",
    "topic.prefix": "cdc"
  }
}</code></pre><p>This emits every change to the <code>orders</code> table as an event on the topic <code>cdc.public.orders</code>, carrying the before and after state of the row (for PostgreSQL, the full before image is only available when the table&#8217;s <code>REPLICA IDENTITY</code> is set to <code>FULL</code>). Your Kafka Streams app can then consume that topic, joining CDC events with other streams in real time.</p><p><em><strong>How it ties the whole pipeline together</strong></em></p><p>Putting it all together, the full architecture we&#8217;ve built across this post looks like this:</p><pre><code>PostgreSQL (orders table)
  &#8594; Debezium source connector
    &#8594; order-events topic          &#8592; source topic
      &#8594; Kafka Streams app
        (filter &#8594; mapValues &#8594; aggregate)
          &#8594; order-stats topic     &#8592; sink topic
            &#8594; Elasticsearch sink connector
              &#8594; Elasticsearch (for dashboards)
            &#8594; S3 sink connector
              &#8594; S3 (for long-term storage)</code></pre><p>Every stage is independently scalable, fault-tolerant, and loosely coupled. The Kafka topics act as durable buffers between the layers: if Elasticsearch goes down, records queue up in the sink topic and drain when it recovers, with no data loss as long as the outage is shorter than the topic&#8217;s retention period.</p><p><em>Apache Kafka Streams is more than just a processing library. It&#8217;s a complete rethinking of how applications relate to data in motion.</em></p><p>What started as a deep dive into a single blog post has taken us through the full anatomy of a production-grade streaming pipeline: source topics feeding raw events in, <code>filter</code> and <code>mapValues</code> shaping them as they flow, windowed aggregation turning an infinite stream into meaningful time-bounded metrics, a sink topic carrying results out, and Kafka Connect bridging the gap to every system downstream.</p><p>The design philosophy running through all of it is the same: <strong>loose coupling through durable, replayable logs</strong>. Each component (your Streams app, your Connect connectors, your downstream consumers) can fail, restart, scale, and evolve independently, because Kafka topics act as the resilient buffer between them. That&#8217;s not an accident of implementation; it&#8217;s the architectural bet Kafka makes, and it pays off at scale in ways that tightly coupled systems simply can&#8217;t match.</p><p><strong>If you&#8217;re building on this for the first time</strong>, a practical path forward looks like this: start with a managed Kafka cluster (Confluent Cloud or Amazon MSK) to skip the operational overhead, wire up a simple <code>filter</code><strong> &#8594; </strong><code>to()</code> topology to get comfortable with the Streams DSL, then layer in windowed aggregation once your use case demands time-bounded metrics. 
Add Debezium when you need to react to database changes, and Connect sinks when results need to land in Elasticsearch, S3, or a data warehouse.</p><p>The ecosystem around Kafka (Schema Registry for data contracts, ksqlDB for SQL-native stream processing, and Kafka Connect&#8217;s library of 200+ connectors) means you rarely have to solve the integration problem from scratch. The primitives are there. The patterns are proven. The only question left is what you build with them.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://devopsguyankit.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading DevOps IN SPACE! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Kafka Consumer Lag Explained]]></title><description><![CDATA[An engineer&#8217;s guide: from how offsets really work, through a live e-commerce incident, to a Prometheus alerting playbook and a troubleshooting decision tree.]]></description><link>https://devopsguyankit.substack.com/p/kafka-consumer-lag-explained</link><guid isPermaLink="false">https://devopsguyankit.substack.com/p/kafka-consumer-lag-explained</guid><dc:creator><![CDATA[Ankit Ranjan]]></dc:creator><pubDate>Tue, 14 Apr 2026 06:25:06 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!ZzVD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Kafka Internals Primer</strong></p><p>Before you can understand consumer lag, you need a firm mental model of how Kafka stores and delivers data. Many lag problems trace back to a shaky grasp of these basics.</p><p>A Kafka topic is a named stream. Internally it&#8217;s divided into one or more partitions, each an immutable, ordered, append-only log living on disk. Producers write to partitions; consumers read from them. The key insight: within a partition, order is guaranteed. Across partitions, it is not.</p><p>Every message written to a partition is assigned a sequential 64-bit integer called an offset, starting at 0 and incrementing by 1 per message. This offset is the fundamental unit of position tracking in Kafka. It never changes after assignment.</p><p><strong>MENTAL MODEL</strong></p><p>Think of each partition as a numbered ticket roll. 
Producers tear off new tickets at the front; consumers work through the roll from their last seen number. The gap between &#8216;latest ticket issued&#8217; and &#8216;last ticket processed by group X&#8217; is the lag.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JazU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JazU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 424w, https://substackcdn.com/image/fetch/$s_!JazU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 848w, https://substackcdn.com/image/fetch/$s_!JazU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 1272w, https://substackcdn.com/image/fetch/$s_!JazU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JazU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png" width="642" height="289" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:289,&quot;width&quot;:642,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!JazU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 424w, https://substackcdn.com/image/fetch/$s_!JazU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 848w, https://substackcdn.com/image/fetch/$s_!JazU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 1272w, https://substackcdn.com/image/fetch/$s_!JazU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a46bc35-859d-47a3-bb09-d698875a8509_642x289.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Fig 1 &#8212; A single Kafka topic with 3 partitions showing per-partition lag accumulation. 
Total group lag is the sum.</em></figcaption></figure></div><p><strong>What Consumer Lag Actually Is</strong></p><p>Consumer lag is the delta between where the producer has written to and where the consumer group has committed to, per partition:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ls29!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ls29!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 424w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 848w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 1272w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ls29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png" width="939" height="103" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:103,&quot;width&quot;:939,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Ls29!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 424w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 848w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 1272w, https://substackcdn.com/image/fetch/$s_!Ls29!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F968dce78-1387-45ae-bfec-6dbde5ddb0d9_939x103.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Lag of <code>0</code> means the consumer is caught up. Lag of <code>50,000</code> means 50,000 messages are queued but unprocessed. 
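</p><p>As a rough sketch of that delta in Java: the class name <code>ConsumerLag</code>, its method names, and every offset value below are hypothetical, chosen only to illustrate the arithmetic. Real values would come from the broker (each partition&#8217;s log end offset) and from the group&#8217;s committed offsets. Clamping at zero is also an assumption of this sketch, made because the two numbers are usually sampled at slightly different moments.</p><pre><code>public class ConsumerLag {

    // Lag for one partition: log end offset minus the group's committed offset.
    public static long partitionLag(long logEndOffset, long committedOffset) {
        // Assumption: clamp at zero in case the two offsets were sampled at different moments.
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Total lag for one consumer group on one topic: the sum across its partitions.
    public static long totalLag(long[] logEndOffsets, long[] committedOffsets) {
        long total = 0L;
        for (int p = 0; p != logEndOffsets.length; p++) {
            total += partitionLag(logEndOffsets[p], committedOffsets[p]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] logEnd = {1200L, 980L, 1500L};    // hypothetical log end offsets, partitions 0 to 2
        long[] committed = {1150L, 980L, 900L};  // hypothetical committed offsets for one group
        System.out.println(totalLag(logEnd, committed)); // prints 650
    }
}</code></pre><p>With these made-up offsets the three partitions lag by 50, 0, and 600 messages, so the group&#8217;s total lag on the topic is 650.</p><p>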
Lag is always per-partition, per-consumer-group: the same topic can have zero lag for group A and 100,000 lag for group B simultaneously.</p><blockquote><p><em><strong>CRITICAL MISCONCEPTION</strong></em></p></blockquote><p>Lag is NOT the same as latency. A lag of 10,000 messages might represent 2 seconds if messages are tiny and consumers are fast, or 10 minutes if messages are heavy. The time to drain a backlog is roughly the lag divided by the consumer&#8217;s throughput: 10,000 messages clear in about 2 seconds at 5,000 messages per second, but take 10 minutes at roughly 17 messages per second. Always correlate lag with your consumer&#8217;s throughput rate when assessing severity.</p><p><strong>The Offset Lifecycle</strong></p><p>Understanding where offsets live and how they move is the key to diagnosing lag. This is the architecture most engineers don&#8217;t fully internalize:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZzVD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZzVD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 424w, https://substackcdn.com/image/fetch/$s_!ZzVD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 848w, https://substackcdn.com/image/fetch/$s_!ZzVD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZzVD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZzVD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png" width="769" height="437" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:437,&quot;width&quot;:769,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZzVD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 424w, https://substackcdn.com/image/fetch/$s_!ZzVD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 848w, https://substackcdn.com/image/fetch/$s_!ZzVD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZzVD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f802b88-5952-4db7-a13c-cdcbcafb35dc_769x437.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Fig 2 &#8212; The full Kafka offset lifecycle. Lag accumulates in the gap between the partition&#8217;s log end offset and the consumer group&#8217;s committed offset stored in </em><code>__consumer_offsets</code></p><p><strong>Auto-commit vs Manual Commit</strong></p><p>This distinction matters enormously for lag accuracy. 
With enable.auto.commit=true (the default), the consumer commits offsets on a timer, every auto.commit.interval.ms (5000 ms, i.e. 5 seconds, by default).</p><blockquote><p><em><strong>This means:</strong></em></p></blockquote><ul><li><p>Lag computed from committed offsets, which is what Prometheus exporters report, can look larger than reality between commits, because the committed offset trails the position the consumer has actually reached</p></li><li><p>If the consumer crashes after processing but before the next auto-commit fires, those messages are reprocessed after restart (at-least-once semantics)</p></li><li><p>Disabling auto-commit and calling commitSync() or commitAsync() after processing gives you precise control over when offsets advance (this is still at-least-once, not exactly-once) but adds commit latency per batch</p></li></ul><blockquote><p><em><strong>Real-Time Scenario &#8212; Black Friday Meltdown at ShopFlow</strong></em></p></blockquote><p>Let&#8217;s make this concrete with a real incident. ShopFlow is a mid-size e-commerce platform processing order events through Kafka. Their architecture has three consumer groups reading the same topic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nenc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nenc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 424w, https://substackcdn.com/image/fetch/$s_!Nenc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 848w, 
https://substackcdn.com/image/fetch/$s_!Nenc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Nenc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nenc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png" width="775" height="348" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cf35199-5074-461f-ba04-eda2d823443f_775x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:775,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Nenc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 424w, https://substackcdn.com/image/fetch/$s_!Nenc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 848w, 
https://substackcdn.com/image/fetch/$s_!Nenc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Nenc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cf35199-5074-461f-ba04-eda2d823443f_775x348.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Fig 3 &#8212; ShopFlow&#8217;s e-commerce Kafka architecture. Three consumer groups read the same topic. 
The fraud-group is lagging critically while inventory-group runs fine &#8212; proving that lag is per-group, not per-topic.</em></p><blockquote><p><em><strong>Incident &#8212; Black Friday 2024, 09:14 UTC</strong></em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sle2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sle2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 424w, https://substackcdn.com/image/fetch/$s_!sle2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 848w, https://substackcdn.com/image/fetch/$s_!sle2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 1272w, https://substackcdn.com/image/fetch/$s_!sle2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sle2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png" width="937" height="373" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:373,&quot;width&quot;:937,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!sle2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 424w, https://substackcdn.com/image/fetch/$s_!sle2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 848w, https://substackcdn.com/image/fetch/$s_!sle2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 1272w, https://substackcdn.com/image/fetch/$s_!sle2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba367a-34ff-4472-8e0f-cb551c8a573a_937x373.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>What went wrong (root cause analysis)</strong></p><p>The fraud consumer was making a synchronous HTTP call to a third-party Fraud API inside the poll loop. Under normal load (200 orders/min) the API responded in ~80ms. Under Black Friday load, it degraded to 30,000ms timeouts. A single slow downstream API cascaded into 48,000 messages of lag because each consumer thread was blocked waiting for the HTTP response before committing the offset.</p><blockquote><p><em><strong>LESSON LEARNED</strong></em></p></blockquote><p>Never make synchronous external HTTP calls inside a Kafka consumer&#8217;s poll loop without a circuit breaker and a tight timeout. 
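</p><p>A minimal Python sketch of that rule, using only the standard library and no Kafka client. The names <code>CircuitBreaker</code>, <code>check_order</code>, <code>call_api</code> and the <code>"deferred"</code> verdict are hypothetical and not taken from ShopFlow&#8217;s code or any Kafka API; <code>call_api</code> stands in for the third-party Fraud API call, and the thresholds are placeholder values.</p><pre><code>import time
from concurrent.futures import ThreadPoolExecutor


class CircuitBreaker:
    """Opens after max_failures consecutive failures; while open, calls are
    rejected until reset_after seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # monotonic timestamp, or None when closed

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: let one trial call through; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


def check_order(order, call_api, breaker, pool, timeout_s=0.5):
    """Return the API verdict, or "deferred" when the breaker is open or the
    call fails or exceeds timeout_s. Waits at most timeout_s for the API."""
    if not breaker.allow():
        return "deferred"
    future = pool.submit(call_api, order)
    try:
        verdict = future.result(timeout=timeout_s)
    except Exception:  # timeout or API error (assumed equally bad here)
        future.cancel()  # only has an effect if the call has not started
        breaker.record(False)
        return "deferred"
    breaker.record(True)
    return verdict
</code></pre><p>In a real consumer, a <code>"deferred"</code> order would typically be routed to a retry topic so the offset can still advance; that routing is an assumption and is left out here. Note that a timed-out call keeps occupying its worker thread until it returns, so the HTTP client itself should also be configured with a timeout.</p><p>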
Each pass through the poll loop must finish within max.poll.interval.ms; otherwise the consumer is treated as failed, it leaves the group, and the resulting rebalance makes lag worse.</p><p><strong>Root Causes</strong></p><p>Let&#8217;s walk systematically through each category of lag cause and the architectural pattern behind it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oiAB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oiAB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 424w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 848w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 1272w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oiAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png" width="761" height="430" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfc21869-013e-4347-a090-2b202d22bf10_761x430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:430,&quot;width&quot;:761,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!oiAB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 424w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 848w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 1272w, https://substackcdn.com/image/fetch/$s_!oiAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc21869-013e-4347-a090-2b202d22bf10_761x430.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Fig 4 &#8212; Complete root cause taxonomy for Kafka consumer lag, organized by where in the pipeline the problem originates.</em></p><p>Every consumer lag problem falls into one of three buckets: Producer-side, Broker-side, or Consumer-side. Understanding the category determines the fix.</p><p><strong>Producer-Side Causes</strong></p><p>&#8226; Traffic spike &#8212; producer rate suddenly exceeds consumer capacity (e.g. 
Black Friday flash sales)</p><p>&#8226; Too few partitions &#8212; max parallelism capped below consumer count, creating idle threads</p><p><strong>Broker-Side Causes</strong></p><p>&#8226; Partition hot-spotting &#8212; a bad partition key causes one partition to receive 10x more traffic than others</p><p>&#8226; ISR shrinkage &#8212; under-replicated partitions cause fetch delays and increased latency</p><p><strong>Consumer-Side Causes</strong></p><p>&#8226; Slow processing &#8212; synchronous DB calls, HTTP requests, or heavy CPU work per message blocks the poll loop</p><p>&#8226; Rebalance storm &#8212; GC pauses, pod OOMs, or session.timeout.ms set too low repeatedly kicks consumers out of the group</p><p>&#8226; Poison pill messages &#8212; schema mismatches or malformed data block offset commits for the entire partition</p><p>&#8226; Under-scaled consumers &#8212; fixed pool with no HPA/KEDA can&#8217;t handle load bursts</p><p>&#8226; Long JVM GC pauses &#8212; stop-the-world pauses exceeding session.timeout.ms cause the broker to evict the consumer</p><p><strong>Monitoring Stack &#8212; Prometheus + Grafana + Alertmanager</strong></p><blockquote><p><em><strong>Architecture Overview</strong></em></p></blockquote><p>The standard observability stack for Kafka lag uses three components: <code>kafka-exporter</code> (or <code>Burrow</code>) to expose metrics, Prometheus to scrape and store them, and Grafana to visualize and alert.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dAD8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!dAD8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 424w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 848w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 1272w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dAD8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png" width="720" height="218" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:218,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!dAD8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 424w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 848w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 1272w, https://substackcdn.com/image/fetch/$s_!dAD8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84dfa2bc-2e93-4166-b41a-cf0361a5efb6_720x218.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Fig 5 &#8212; Kafka monitoring architecture. 
kafka-exporter reads consumer-group offsets from the brokers over the Kafka protocol (not JMX) and exposes them in Prometheus exposition format, enabling full lag visibility in Grafana.</em></p><p><strong>Full Deployment Config</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DrLU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DrLU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 424w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 848w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 1272w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DrLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png" width="681" height="622" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:681,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!DrLU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 424w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 848w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 1272w, https://substackcdn.com/image/fetch/$s_!DrLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa841a2-90e2-455b-acb7-ce677ffdf7d5_681x622.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oKT7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oKT7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 424w, https://substackcdn.com/image/fetch/$s_!oKT7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 848w, 
https://substackcdn.com/image/fetch/$s_!oKT7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 1272w, https://substackcdn.com/image/fetch/$s_!oKT7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oKT7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png" width="681" height="478" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:681,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!oKT7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 424w, https://substackcdn.com/image/fetch/$s_!oKT7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 848w, 
https://substackcdn.com/image/fetch/$s_!oKT7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 1272w, https://substackcdn.com/image/fetch/$s_!oKT7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1cb6f05-1486-4aed-991e-8865a0841f60_681x478.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p><em><strong>The five Prometheus queries you must have</strong></em></p></blockquote><pre><code># 1. 
Total lag per consumer group (your primary dashboard panel)
sum(kafka_consumergroup_lag) by (consumergroup, topic)
# 2. Lag growth rate &#8212; positive = falling behind, negative = catching up
deriv((sum by (consumergroup) (kafka_consumergroup_lag))[10m:1m])
# 3. Consumer throughput (messages processed per second)
rate(kafka_consumergroup_current_offset[2m])
# 4. Hotspot detection - find the five partitions with the most lag
topk(5, kafka_consumergroup_lag)
# 5. Time-to-drain estimate (minutes to clear backlog at current rate)
sum(kafka_consumergroup_lag) by (consumergroup) /
  (sum(rate(kafka_consumergroup_current_offset[5m])) by (consumergroup) * 60)</code></pre><blockquote><p><em><strong>Tiered alerting rules</strong></em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z23E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z23E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 424w, https://substackcdn.com/image/fetch/$s_!z23E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 848w, https://substackcdn.com/image/fetch/$s_!z23E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 1272w, https://substackcdn.com/image/fetch/$s_!z23E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z23E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png" width="908" height="555" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:555,&quot;width&quot;:908,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!z23E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 424w, https://substackcdn.com/image/fetch/$s_!z23E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 848w, https://substackcdn.com/image/fetch/$s_!z23E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 1272w, https://substackcdn.com/image/fetch/$s_!z23E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552a91ae-8367-4c39-9f96-8f6f7b8d2019_908x555.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mLLo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mLLo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 424w, https://substackcdn.com/image/fetch/$s_!mLLo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 848w, 
https://substackcdn.com/image/fetch/$s_!mLLo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 1272w, https://substackcdn.com/image/fetch/$s_!mLLo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mLLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png" width="818" height="621" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:818,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!mLLo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 424w, https://substackcdn.com/image/fetch/$s_!mLLo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 848w, 
https://substackcdn.com/image/fetch/$s_!mLLo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 1272w, https://substackcdn.com/image/fetch/$s_!mLLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714caaf9-3758-4ec4-868e-132fbad5bad8_818x621.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Fixing lag in production</strong></p><p>These fixes are ordered by speed of impact. 
Always diagnose before scaling &#8212; adding consumers to fix a poison-pill problem does nothing.</p><p>1. Identify the bottleneck &#8212; profile before acting</p><p>Run <code>kafka-consumer-groups.sh --describe</code> to see per-partition lag, and check <code>deriv(lag)</code> in Prometheus. Is the lag on one hot partition or on all partitions? Check consumer CPU, memory, and downstream dependency latency (DB and API response times) before any remediation.</p><p>2. Scale out consumer instances (up to partition count)</p><p>In Kubernetes: <code>kubectl scale deployment fraud-consumer --replicas=6</code>. Scaling past the partition count does not help: consumers beyond that number sit idle and receive no messages. For event-driven autoscaling, use KEDA, either with its Kafka scaler (which reads consumer group lag from the brokers) or with its Prometheus scaler on <code>kafka_consumergroup_lag</code>.</p><p>3. Increase partition count to enable more parallelism</p><p>Use <code>kafka-topics.sh --alter --partitions 12</code> (with your usual <code>--bootstrap-server</code> and <code>--topic</code> arguments). Note: the partition count can only be increased, never decreased. The change triggers a rebalance and alters the key-to-partition mapping, so it may break key-based ordering guarantees. Plan for it &#8212; don&#8217;t do it during active incidents unless you&#8217;ve done it before.</p><p>4. Tune consumer fetch configuration</p><p>Increase batch sizes: <code>max.poll.records=1000</code> (default 500), <code>fetch.min.bytes=50000</code> (default 1), <code>fetch.max.wait.ms=500</code> (the default). This reduces round-trips and amortizes processing overhead across larger batches. Each batch must still be processed within <code>max.poll.interval.ms</code>, so test under load before deploying.</p><p>5. Fix the rebalance storm &#8212; switch to CooperativeStickyAssignor</p><p>The default <code>RangeAssignor</code> uses eager rebalancing &#8212; all partitions are revoked and reassigned during any membership change. Switch to <code>CooperativeStickyAssignor</code> for incremental rebalancing, where only the partitions that are migrating are paused. Also set <code>session.timeout.ms=45000</code> (already the default since Kafka 3.0) and <code>heartbeat.interval.ms=15000</code>, keeping the heartbeat interval at no more than one third of the session timeout.</p><p>6. Async-ify your processing logic</p><p>Move blocking I/O (DB writes, HTTP calls) out of the poll loop. 
Use async HTTP clients with circuit breakers (Resilience4j, Polly). Batch DB operations &#8212; instead of one <code>INSERT</code> per message, buffer 100 messages and bulk-insert. This can yield a 10-50x throughput improvement, depending on the workload.</p><p>7. Implement a Dead Letter Queue (DLQ) for poison pills</p><p>Deserialization failures and schema mismatches block offset commits for the entire partition. Route failed messages to an <code>orders-DLQ</code> topic after N retries. This unblocks the consumer immediately. Process the DLQ separately with human review or a retry mechanism.</p><p>8. Emergency &#8212; reset offsets to skip ahead (data loss risk)</p><p>Only if an SLA breach is imminent and you accept missing messages: <code>kafka-consumer-groups.sh --reset-offsets --to-latest --execute --group fraud-group --topic orders</code>. This skips all backed-up messages. Stop the group&#8217;s consumers first, because offsets can only be reset while the group is inactive. Use <code>--to-datetime</code> for surgical resets to a specific timestamp instead of hard-latest.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xzxl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xzxl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 424w, https://substackcdn.com/image/fetch/$s_!Xzxl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xzxl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 1272w, https://substackcdn.com/image/fetch/$s_!Xzxl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xzxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png" width="817" height="498" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:817,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Xzxl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 424w, https://substackcdn.com/image/fetch/$s_!Xzxl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xzxl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 1272w, https://substackcdn.com/image/fetch/$s_!Xzxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16e1e4bd-4d4f-40eb-8408-4ce79ee7f937_817x498.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><blockquote><p><em><strong>Troubleshooting decision tree</strong></em></p></blockquote><p>When an alert fires, use this decision tree to 
move from symptom to resolution in under 10 minutes.</p><p><strong>Symptom: Lag spikes on ALL partitions simultaneously</strong></p><p><strong>Diagnosis + fix: </strong>Producer traffic spike or consumer group rebalance. Compare <code>deriv(lag)</code> with <code>rate(produced)</code>.</p><p>If the producer rate spiked &#8594; scale consumers. If a rebalance occurred &#8594; check <code>kafka-consumer-groups.sh --describe</code> for a drop in consumer count.</p><p><strong>Symptom: Lag on ONE partition only, others fine</strong></p><p><strong>Diagnosis + fix: </strong>Partition hot-spot. Check producer key distribution &#8212; for example, a single user_id or category producing 80% of traffic. Fix: use random partitioning (if you don&#8217;t need per-key ordering) or a composite key. Short-term: if the broker hosting the hot partition is overloaded, manually reassign the partition to a less-loaded broker.</p><p><strong>Symptom: Lag grows then suddenly resets to 0, repeats</strong></p><p><strong>Diagnosis + fix: </strong>Consumer is crashing in a loop. Check pod logs for OOM kills or exception stack traces. After each restart the consumer&#8217;s offset is reset or it briefly catches up, and then it crashes again. Fix: increase memory limits, add a DLQ for error handling, and check for uncaught exceptions in the poll loop.</p><p><strong>Symptom: Lag steady but non-zero (lag plateau)</strong></p><p><strong>Diagnosis + fix: </strong>Consumer throughput exactly matches producer rate but can&#8217;t drain backlog. Consumer is at max capacity. Add more consumers (up to partition count) or increase <code>max.poll.records</code> to process bigger batches per poll.</p><p><strong>Symptom: Lag grows during business hours, drains overnight</strong></p><p><strong>Diagnosis + fix: </strong>Classic under-provisioning pattern. Scale consumers during peak hours using KEDA or a Kubernetes HPA with a Prometheus adapter. 
Consider pre-warming consumers 15 minutes before the expected peak.</p><p><strong>Symptom: Consumer count = 0 in consumer group</strong></p><p><strong>Diagnosis + fix: </strong>All consumers have left the group (crash, deploy, or network partition). The group still has committed offsets but no readers. Restart deployments, check pod status, verify Kafka connectivity. Lag will drain once consumers rejoin.</p><p><strong>Symptom: Lag exists but consumers report 0 messages processed</strong></p><p><strong>Diagnosis + fix: </strong>A poison-pill message just past the last committed offset fails deserialization on every poll. Look for <code>SerializationException</code> in consumer logs. Add try/catch around deserialize(), route failures to the DLQ, then skip the bad record: <code>seek()</code> to offset+1 and commit that position manually using <code>commitSync(Map&lt;TopicPartition, OffsetAndMetadata&gt;)</code>.</p><p><strong>Symptom: Committed offset falls behind the earliest available offset</strong></p><p><strong>Diagnosis + fix: </strong>Data loss scenario. Kafka has deleted messages the consumer hasn&#8217;t read yet (retention elapsed). Set <code>auto.offset.reset=latest</code> to continue from now, or <code>earliest</code> to resume from the oldest message still retained. Backfill from a secondary source (S3, database) if you need the missed data. Prevention: extend retention, or ensure lag never exceeds the retention window.</p><blockquote><p><em><strong>Quick-reference consumer configs</strong></em></p></blockquote><p><code>max.poll.records=1000</code> &#183; <code>max.poll.interval.ms=300000</code> &#183; <code>session.timeout.ms=45000</code> &#183; <code>heartbeat.interval.ms=15000</code> &#183; <code>fetch.min.bytes=50000</code> &#183; <code>fetch.max.wait.ms=500</code> &#183; <code>partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor</code> &#183; <code>enable.auto.commit=false</code></p><blockquote><p><em><strong>Key takeaways</strong></em></p></blockquote><ol><li><p>Lag is per-partition, per-group &#8212; not per-topic. The same topic can be healthy for one group and critical for another. 
Never average across groups &#8212; always monitor per-group, per-partition.</p></li><li><p>Alert on lag growth rate, not just absolute lag. <code>deriv(lag[10m]) &gt; 500</code> (lag growing by more than 500 messages per second) tells you the system is falling behind before the absolute number becomes catastrophic. This can buy you 10-20 minutes of response time.</p></li><li><p>Partition count must be at least the maximum number of consumers you&#8217;ll ever need. Plan partition count at topic creation time. You can add partitions but not remove them. A topic with 3 partitions can never be consumed by more than 3 consumers in a group simultaneously.</p></li><li><p>Always implement a DLQ from day one. Poison-pill messages are a production reality. A single malformed message at offset N blocks every message after N in that partition forever, unless you skip it. Design for failure from the start.</p></li><li><p>Use CooperativeStickyAssignor for production groups. The default eager rebalancing stops all consumption for every membership change. Cooperative incremental rebalancing only pauses partitions that are actually moving. The switch is a config change, with no code changes required; for a running group, roll it out in two rolling restarts (first list both assignors, then remove the eager one).</p></li><li><p>Keep lag well inside your retention window. If your topic has 24h retention and your consumer falls 25h behind, messages are deleted before consumption. Monitor the time-to-drain estimate and ensure the age of the oldest unconsumed message never approaches your retention period.</p></li></ol><p>Consumer lag is not a Kafka problem; it&#8217;s a systems design signal. It tells you that your architecture has a throughput mismatch somewhere. Fix the mismatch, and the lag disappears. 
Build the observability to see the mismatch before your users do.</p>]]></content:encoded></item></channel></rss>