Context, problem, and forces

With cloud-native systems, we want to enable everyday companies and empower self-sufficient full-stack teams to rapidly, continuously, and confidently deliver these global scale systems. Following the reactive principles, these systems should be responsive, resilient, elastic, and message-driven. To this end, cloud-native systems are composed of many bounded isolated components that communicate via asynchronous messaging to increase responsiveness by delegating processing and achieve eventual consistency through the propagation of state change events. Stream processing has become the de facto standard in the cloud for implementing this message-driven inter-component communication.

To create responsive, resilient, isolated components our primary objective is to eliminate all synchronous inter-component communication. Synchronous, request, and response-based communication creates tightly coupled and brittle systems. The requesting component may not be coupled to the specific component that is servicing the request, but there must be some component available to respond to the request in a timely manner. Otherwise, at best, the request is fulfilled with high latency or, at worst, the request fails and the error propagates to the end user. Furthermore, the requester must discover a service instance that can fulfill the request. Service discovery adds significant complexity to systems based on synchronous communication.

Instead, we strive to limit synchronous communication to only intra-component communication with cloud-native databases and to the cloud-native streaming service. These are extremely high availability services with well-defined endpoints, which simplifies discovery and minimizes availability issues. With message-driven systems, location transparency between producers and consumers is accomplished via loosely coupled subscriptions. Through this mechanism, eventual consistency is achieved by publishing data change events, so that all interested consumers can eventually process the events. When the system is operating normally, eventual consistency happens in near real-time, but degrades gracefully when anomalies occur. In Chapter 4, Boundary Patterns, we discuss patterns that further shield end users from the anomalies.

It must be recognized that an event streaming system is an append-only, distributed, sharded database. As such, refer to the Cloud Native Databases Per Component pattern for the justifications for leveraging value-added cloud services and embracing disposable architecture in the context of event streaming as well. One thing that is different is that a database is owned by a single component, whereas an event stream is shared between components. But this does not mean that a single streaming cluster is shared across all components, as that would mean a catastrophic failure in that cluster would impact the whole system. Thus, we need to create bulkheads for our streams as well, by strategically segregating the system into cohesive groups of components, where each produce events to one stream in a topology of many independent streams.

We must also create a rational system of events as well. In the Trilateral API pattern, we will discuss how components should properly document the definitions of the events they produce and consume as part of their published interface. However, here we need to address event definitions at a more fundamental level. Streaming technology is part of the dumb pipes and smart endpoints generation. As such, stream consumers will ultimately receive events that they do not recognize and they must be able to treat these events in a consistent manner, so that they can be filtered out.