Context, problem, and forces

We are building reactive, cloud-native systems composed of bounded isolated components that rely on event streaming for inter-component communication. Therefore, a large portion of the system functionality does not communicate over RESTful interfaces. There are few, if any, tools for documenting asynchronous interfaces in a standardized way. Each component is owned by one team, and a team may be responsible for multiple components, but a single team rarely owns all of the components.

We need to recognize and acknowledge that reactive, cloud-native systems are different and require a different way of thinking about systems. We no longer communicate via just a synchronous API, such as REST or GraphQL. We now also have an asynchronous API for publishing events and another asynchronous API for consuming events. We actually strive to eliminate any inter-component communication via a synchronous API. As we will discuss in Chapter 4, Boundary Patterns, we are limiting the use of synchronous APIs to the interactions with a frontend and with external systems. This difference permeates how we develop, test, and deploy the system, and how we interact with the upstream and downstream teams involved in the system.

Bounded isolated components are loosely coupled in that they are unaware of their actual upstream and downstream counterparts, but these components are still dependent on the formats of the messages they exchange. These components form natural bulkheads, which implicitly shield them from failures in downstream components. As discussed in the Stream Circuit Breaker pattern, components still need to explicitly shield themselves from invalid messages produced by upstream components. And even though bounded isolated components are resilient to failures in other components, the system as a whole is not working properly when any component is misbehaving. Therefore, we still need to take proper steps to ensure we can deploy changes with confidence. We need to define how we will test these asynchronous interactions.
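To make the dependency on message formats concrete, the following is a minimal sketch of a consumer that explicitly checks each incoming event before processing it. The envelope fields (id, type, timestamp) and the function names validate_event and consume are assumptions made for illustration; they are not a format or API prescribed by this pattern.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical envelope fields; a real system would use its own agreed format.
REQUIRED_FIELDS = {"id": str, "type": str, "timestamp": int}


def validate_event(event: Any) -> List[str]:
    """Return a list of problems; an empty list means the event is well formed."""
    if not isinstance(event, dict):
        return ["event is not an object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems


def consume(batch: List[Any], handle: Callable[[dict], None]) -> List[Tuple[Any, List[str]]]:
    """Handle valid events and set invalid ones aside instead of failing the batch."""
    rejected = []
    for event in batch:
        problems = validate_event(event)
        if problems:
            rejected.append((event, problems))
        else:
            handle(event)
    return rejected
```

The same validate_event check could plausibly be reused in a test on the publishing side, so that a format change is caught before deployment rather than by a downstream consumer. What to do with the rejected events, such as publishing them as fault events, is left open here.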

Each of these APIs (synchronous, publishing, and consuming) has different deployment and scaling requirements. The synchronous API is the most traditional. The consuming API will be processing events from one or more streams, each with one or more shards. The publishing API, when following the Event-First variant of the Event Sourcing pattern, will publish directly from one of the other deployment units. However, when it follows the Database-First variant, it will have its own deployment, similar to the consuming API. In each of these cases, there are deployment alternatives, such as various container schedulers or function-as-a-service. We also have to ensure we have proper bulkheads at the cluster level, so that we are not sharing one monolithic cluster across too many components and certainly not across all components.