Layers in Lambda Architecture

Lambda Architecture processes big data through three main layers:

  • Batch Layer (Cold process)
  • Stream Layer (Hot process or Speed Layer)
  • Serving Layer

1. Batch Layer

The Batch Layer operates on the complete data set, which allows the system to produce its most accurate results. That accuracy comes at the cost of high latency, because computing over the full data set takes a long time.

The batch layer stores the raw data as it arrives and computes the batch views that are later consumed by queries. Batch jobs therefore run at intervals and are long-running. The data they cover can span anywhere from minutes to years.
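The store-then-recompute behaviour described above can be sketched in a few lines. This is only a minimal illustration, not a production design: the in-memory list `master_dataset`, the helper names `append_event` and `compute_batch_view`, and the page-view events are all assumptions made for the example. A real batch layer would keep the raw data in distributed storage and run the recomputation as a scheduled job.

```python
from collections import Counter

# Hypothetical raw event store: (timestamp, page) pairs kept exactly as received.
master_dataset = []


def append_event(timestamp, page):
    """Store the raw event unchanged; existing records are never edited."""
    master_dataset.append((timestamp, page))


def compute_batch_view(events):
    """Recompute page-view counts from scratch over every stored event."""
    return Counter(page for _, page in events)


# Raw data arrives over time...
append_event(1, "/home")
append_event(2, "/pricing")
append_event(3, "/home")

# ...and a periodic batch job rebuilds the whole view from the full data set.
batch_view = compute_batch_view(master_dataset)
print(batch_view["/home"])     # 2
print(batch_view["/pricing"])  # 1
```

Because every run reads the full data set, the view is as accurate as the stored data allows, but each run takes longer as the data grows. That is the latency cost noted above.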

2. Stream Layer

The Stream Layer operates on real-time data to complement the batch views. It receives incoming data from various clients, applies incremental updates on top of the batch layer's results, and stores the updated results in the Processed Data DB.

This layer generates results with low latency, in near real time. Using incremental algorithms at the stream layer significantly reduces computation cost, because each new record updates the existing result instead of triggering a full recomputation (insertion sort, which places one new element at a time into an already-sorted result, is a simple example of this style). Batch views can be processed with more complex or expensive rules; they take longer to produce but offer better data quality and less skew. Real-time views, processed directly from incoming traffic, give access to the latest data available.
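As a rough sketch of what an incremental update looks like, the class below keeps a running count and total per key and adjusts them one event at a time. The class name `RealtimeView`, its methods, and the sensor readings are hypothetical and chosen only for illustration; a real speed layer would run this kind of logic inside a stream processor.

```python
from collections import defaultdict


class RealtimeView:
    """Keeps a running count and total per key, updated one event at a time."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def update(self, key, value):
        # Constant work per event: no pass over earlier data is needed.
        self._count[key] += 1
        self._total[key] += value

    def mean(self, key):
        """Return the current mean for key, or None if no event was seen."""
        count = self._count.get(key, 0)
        return self._total[key] / count if count else None


view = RealtimeView()
view.update("sensor-a", 20.0)
view.update("sensor-a", 22.0)
print(view.mean("sensor-a"))  # 21.0
print(view.mean("sensor-b"))  # None
```

Each call to `update` costs the same no matter how much data has already been seen, which is where the reduced computation cost comes from.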

3. Serving Layer

The Serving Layer is a server, or a set of servers, that answers queries from different modules (such as an analytics module or a notification module) using the results produced by the batch and speed layers.

The batch layer's output (batch views) and the speed layer's output (near-real-time views) are both stored in the Processed Data DB and sent to the serving layer. The serving layer uses the output it receives directly to answer ad-hoc queries, and uses the database to answer predefined queries.
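One way to picture how the serving layer combines the two kinds of view is a query function that adds the batch result to the more recent real-time result. The sketch below is illustrative only: the function `query_page_views` and the sample dictionaries are hypothetical, and it assumes the real-time view holds only events that arrived after the last batch run, so nothing is counted twice.

```python
def query_page_views(page, batch_view, realtime_view):
    """Combine the precomputed batch count with the recent real-time count."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical views, as the batch and speed layers might publish them.
batch_view = {"/home": 1200, "/pricing": 310}  # accurate, but hours old
realtime_view = {"/home": 7, "/signup": 2}     # events since the last batch run

print(query_page_views("/home", batch_view, realtime_view))     # 1207
print(query_page_views("/pricing", batch_view, realtime_view))  # 310
print(query_page_views("/signup", batch_view, realtime_view))   # 2
```

When the next batch run finishes, the real-time entries it now covers can be discarded, so the more accurate batch result replaces the approximate one.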
