Once your architecture grows beyond a certain scale, things will start to fail. If you don’t design for failure inside your system, one small error can bring the whole system down. Organizations like Netflix and Uber, as well as cloud vendors like Amazon and Microsoft, have shown us a new style of engineering that embraces failure: Chaos Engineering.
With tools like the Netflix Chaos Monkey, and more recently the Netflix Simian Army, different types of failures are deliberately induced to evaluate how the architecture handles them. This has completely changed the way I think about and design my systems. Luckily, new patterns are appearing and some old patterns (like Circuit Breaker) are getting a revival, helping us implement these principles.
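To make the Circuit Breaker idea a bit more concrete, here is a minimal sketch in Python. It is not taken from any particular library: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, `reset_timeout` and `clock` are my own assumptions for illustration, and a production version would also need thread safety and metrics. The idea is that after a number of consecutive failures the breaker "opens" and rejects calls immediately, and after a timeout it lets one trial call through to see if the dependency has recovered.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical, illustrative implementation; parameter names are assumptions.
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the broken dependency.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You would wrap each call to a remote dependency, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for your own function, and catch `CircuitOpenError` to serve a fallback. Combined with chaos experiments, this lets you verify that one failing dependency does not drag the rest of the system down with it.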
More information:
- Chaos engineering at Microsoft Azure Search: https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/
- The Netflix Simian Army: http://techblog.netflix.com/2011/07/netflix-simian-army.html
- Principles of Chaos Engineering: http://www.principlesofchaos.org/