Topologies

In Kafka Streams, the flow of stream processing is represented by topologies, which are directed acyclic graphs (DAGs).

Kafka Streams Topology

  • Stream: The most important abstraction in Kafka Streams is the stream, which represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where each data record is a key-value pair.
  • Stream Processor: A stream processor is a node in the processor topology. It represents a processing step that transforms data in streams: it receives one input record at a time from its upstream processors, applies its operation, and may produce one or more output records for its downstream processors.
  • Source Processor: A source processor is a special type of stream processor that has no upstream processors. It consumes records from one or more Kafka topics and forwards them to its downstream processors, producing the input stream for its topology.
  • Sink Processor: A sink processor is a special type of stream processor that has no downstream processors. It sends every record it receives from its upstream processors to a specified Kafka topic.
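
The relationship between the three processor types can be sketched in plain Java. This is a toy model only, not the Kafka Streams API: the class ToyTopology, its Node type, and the in-memory lists standing in for Kafka topics are all hypothetical names introduced for illustration. It shows a source node with no upstream, one intermediate processor, and a sink node with no downstream, each handling one record at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of a processor topology. Hypothetical; NOT the Kafka Streams API. */
final class ToyTopology {

    /** A node in the DAG: handles one record at a time and forwards its outputs downstream. */
    static final class Node {
        final String name;
        // Each input record may yield zero or more output records.
        final Function<Map.Entry<String, String>, List<Map.Entry<String, String>>> step;
        final List<Node> downstream = new ArrayList<>();

        Node(String name, Function<Map.Entry<String, String>, List<Map.Entry<String, String>>> step) {
            this.name = name;
            this.step = step;
        }

        /** Adds an edge from this node to child and returns child so calls can be chained. */
        Node to(Node child) {
            downstream.add(child);
            return child;
        }

        void process(Map.Entry<String, String> input) {
            for (Map.Entry<String, String> output : step.apply(input)) {
                for (Node child : downstream) {
                    child.process(output);
                }
            }
        }
    }

    /** Wires source -> upper-case -> sink and pushes every record of the in-memory "topic" through. */
    static List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> inputTopic) {
        List<Map.Entry<String, String>> outputTopic = new ArrayList<>();

        // Source processor: no upstream; forwards records read from the input "topic".
        Node source = new Node("source", r -> List.of(r));
        // Stream processor: transforms each record's value.
        Node upper = new Node("upper-case",
                r -> List.of(Map.entry(r.getKey(), r.getValue().toUpperCase())));
        // Sink processor: no downstream; writes each record to the output "topic".
        Node sink = new Node("sink", r -> {
            outputTopic.add(r);
            return List.of();
        });

        source.to(upper).to(sink);
        for (Map.Entry<String, String> record : inputTopic) {
            source.process(record);
        }
        return outputTopic;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(Map.entry("k1", "hello"), Map.entry("k2", "streams"))));
    }
}
```

In a real application the graph would be built with the library's own builders and the topics would live on Kafka brokers; the sketch only mirrors the shape of the graph described in the list above.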

What is Apache Kafka Streams?

Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds on important stream processing concepts such as clearly distinguishing event time from processing time, supporting windowing, and providing simple yet efficient management and real-time querying of application state. Kafka Streams has a low barrier to entry: you can quickly write and run a small-scale proof of concept on a single machine. To scale up to high-volume production workloads, you only need to run additional instances of your application on multiple machines. Kafka Streams transparently handles load balancing across multiple instances of the same application by leveraging Kafka's parallelism model.

Features of Kafka Streams

  • Designed as a simple, lightweight client library that is easy to embed in any Java application and works with whatever packaging, deployment, and operational tools users already have for their streaming applications.
  • Has no external dependencies on systems other than Apache Kafka itself, which serves as the internal messaging layer. In particular, it uses Kafka's partitioning model to scale processing horizontally while preserving strong ordering guarantees.
  • Supports fault-tolerant local state, which makes stateful operations such as windowed joins and aggregations fast and efficient.
  • Supports exactly-once processing semantics, which guarantee that each record is processed once and only once, even if Streams clients or Kafka brokers fail in the middle of processing.
  • Achieves millisecond processing latency by processing one record at a time, and supports event-time-based windowing operations on records that arrive out of order.
  • Provides the essential stream processing primitives, along with a high-level Streams DSL and a low-level Processor API....
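
As a rough illustration of event-time-based windowing with out-of-order records, the following plain-Java sketch counts events per fixed-size window. It is a toy model, not the Kafka Streams API: ToyWindowedCount, Event, and countByWindow are hypothetical names, timestamps are assumed to be non-negative milliseconds, and concerns that a real system must handle (such as how long to wait for late records) are left out. Each record is assigned to a window by the timestamp it carries, not by when it arrives.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy event-time window counter. Hypothetical; NOT the Kafka Streams API. */
final class ToyWindowedCount {

    /** Hypothetical event: a key plus the event time (in ms) carried by the record. */
    static final class Event {
        final String key;
        final long eventTimeMs;

        Event(String key, long eventTimeMs) {
            this.key = key;
            this.eventTimeMs = eventTimeMs;
        }
    }

    /**
     * Counts events per window start and key. Windows are fixed-size and non-overlapping,
     * and each event is placed by its own timestamp, so arrival order does not matter.
     * Assumes non-negative timestamps and a positive window size.
     */
    static Map<Long, Map<String, Integer>> countByWindow(List<Event> events, long windowSizeMs) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Event e : events) {
            long windowStart = (e.eventTimeMs / windowSizeMs) * windowSizeMs;
            counts.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .merge(e.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> arrivals = List.of(
                new Event("a", 1_000L),
                new Event("a", 61_000L),
                new Event("a", 2_000L),   // arrives after a later event, belongs to the first window
                new Event("b", 59_000L));
        System.out.println(countByWindow(arrivals, 60_000L));
    }
}
```

Because the window is derived from the record's own timestamp, the late "a" record at 2,000 ms is still counted in the first window even though a record from the second window arrived before it.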

Duality of Streams and Tables

A table is a collection of key-value pairs....
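
One common way to picture the duality is sketched below in plain Java, under the assumption that a stream is read as a changelog of updates and a table as the latest value per key. This is a toy model, not the Kafka Streams API, and ToyDuality, toTable, and toStream are hypothetical names. Replaying the stream produces the table, and emitting the table's rows produces a stream that rebuilds the same table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of stream-table duality. Hypothetical; NOT the Kafka Streams API. */
final class ToyDuality {

    /** Stream to table: replay the changelog, keeping only the latest value per key. */
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    /** Table to stream: emit each current row as one key-value record. */
    static List<Map.Entry<String, Integer>> toStream(Map<String, Integer> table) {
        List<Map.Entry<String, Integer>> stream = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            stream.add(Map.entry(row.getKey(), row.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog = List.of(
                Map.entry("alice", 1),
                Map.entry("bob", 5),
                Map.entry("alice", 3));   // later update overrides alice's earlier value
        Map<String, Integer> table = toTable(changelog);
        System.out.println(table);
        System.out.println(toTable(toStream(table)).equals(table));
    }
}
```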

Conclusion

Kafka Streams provides millisecond-level processing latency and is elastic, highly scalable, and fault-tolerant. It behaves the same whether it runs on a VM, in a container, in the cloud, or on-premises, and it supports Linux, macOS, and Windows. With so many advantages, its reach has grown in recent years....