Topologies
In Kafka Streams, the flow of stream processing is represented by a topology: a directed acyclic graph (DAG) of processors.
- Stream: The most important abstraction offered by Kafka Streams is the stream, which represents an unbounded, continuously updating data set. A stream is an ordered, replayable, and fault-tolerant sequence of immutable data records, where a data record is defined as a key-value pair.
- Stream Processor: A stream processor is a node in the processor topology that represents a single processing step for transforming data in streams: it receives one input record at a time from its upstream processors in the topology, applies its operation, and may produce one or more output records for its downstream processors.
- Source Processor: A source processor is a special type of stream processor that has no upstream processors. It produces an input stream for its topology by consuming records from one or more Kafka topics and forwarding them to its downstream processors.
- Sink Processor: A sink processor is a special type of stream processor that has no downstream processors. It sends any records it receives from its upstream processors to a specified Kafka topic.
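The way these processor types fit together can be illustrated with a small toy model. This is plain Python, not the Kafka Streams API: the classes and names below are hypothetical and exist only to show how records flow from a source, through a transforming processor, to a sink.

```python
# Toy model of a Kafka Streams-style topology. Illustrative only:
# these classes are NOT the Kafka Streams API.

class SinkProcessor:
    """Terminal node: no downstream processors; collects records
    destined for an output topic."""
    def __init__(self):
        self.output_topic = []

    def process(self, record):
        self.output_topic.append(record)


class StreamProcessor:
    """Intermediate node: transforms one input record at a time."""
    def __init__(self, transform, downstream):
        self.transform = transform
        self.downstream = downstream

    def process(self, record):
        # One input record may yield zero, one, or many output records.
        for out in self.transform(record):
            self.downstream.process(out)


class SourceProcessor:
    """Entry node: no upstream processors; reads records from an
    input topic and forwards them downstream."""
    def __init__(self, downstream):
        self.downstream = downstream

    def run(self, topic):
        for record in topic:
            self.downstream.process(record)


# Wire the DAG: source -> uppercase the value -> sink.
sink = SinkProcessor()
upper = StreamProcessor(lambda kv: [(kv[0], kv[1].upper())], sink)
source = SourceProcessor(upper)

source.run([("user1", "hello"), ("user2", "world")])
print(sink.output_topic)  # [('user1', 'HELLO'), ('user2', 'WORLD')]
```

Each record passes through the graph one at a time, mirroring the per-record processing model described above; in real Kafka Streams, the source and sink would be bound to actual Kafka topics.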
What is Apache Kafka Streams?
Kafka Streams is a library for processing and analyzing data stored in Kafka. It builds on important stream processing concepts such as properly distinguishing event time from processing time, windowing support, and simple yet efficient management and real-time querying of application state. Kafka Streams has a low barrier to entry: it is easy to write and run a small-scale proof-of-concept on a single machine, and to scale up to high-volume production workloads you only need to run additional instances of your application on multiple machines. By building on Kafka's parallelism model, Kafka Streams transparently handles load balancing across the instances of the same application.
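The parallelism model referred to here rests on topic partitions: records with the same key always land in the same partition, and partitions are divided among the running instances of an application. A rough sketch of that idea (a simplified illustration, not how Kafka Streams actually computes assignments, which involves tasks and a rebalancing protocol):

```python
# Toy sketch of partition-based parallelism. Illustrative only.

def partition_for(key, num_partitions):
    # Kafka hashes the record key to choose a partition; the same key
    # always maps to the same partition, preserving per-key ordering.
    return hash(key) % num_partitions

def assign_partitions(num_partitions, num_instances):
    # Spread partitions across running application instances
    # (round-robin here; real assignment is handled by rebalancing).
    assignment = {i: [] for i in range(num_instances)}
    for p in range(num_partitions):
        assignment[p % num_instances].append(p)
    return assignment

# With 6 partitions, scaling from 1 instance to 3 splits the work:
print(assign_partitions(6, 1))  # {0: [0, 1, 2, 3, 4, 5]}
print(assign_partitions(6, 3))  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

This is why "run extra instances" is the whole scaling story: new instances simply take over a share of the existing partitions, while same-key records keep flowing to a single instance.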