Site Reliability Engineering: How Google Runs Production Systems
In this book, Google shares its extensive experience in building and operating distributed systems and presents the core principles of site reliability engineering (SRE). It covers incident assessment and response, capacity management, and automation, offering guidelines on how to monitor and maintain the reliability and scalability of your distributed systems.
Authors: Niall Richard Murphy, Betsy Beyer, Chris Jones, and Jennifer Petoff
Top Books for Distributed Systems
Understanding the principles of distributed systems is becoming increasingly important for engineers, developers, and architects. Fortunately, the topic is well covered in the literature. That is why we have compiled a list of the top 10 books on distributed systems to guide you on this journey, each full of interesting things to learn.
- Designing Data-Intensive Applications
- Distributed Systems: Principles and Paradigms
- Distributed Algorithms
- Scalability Rules: 50 Principles for Scaling Web Sites
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
- Building Microservices
- Site Reliability Engineering: How Google Runs Production Systems
- Release It!: Design and Deploy Production-Ready Software
- Distributed Systems for Practitioners