Components in distributed systems depend on each other for data and system updates, and in time-sensitive applications the updates need to happen as fast as possible. There are two ways these updates are handled in software architecture:
- Messages: Updates are sent directly from one component to another to trigger an action.
- Events: Updates indicate that an action occurred at a specific time. They are not directed to a specific recipient.
Both of these packages of information are generated for processing, manipulation, or storage.
What are events?
An event contains the details about an action that occurred at a specific time. For example, when you purchase something online, an event of that purchase includes the product, the payment, the delivery, and the time of the purchase. The purchase event occurred in the purchasing component of the system, but it also impacts the inventory, the payment processing, and the shipping components.
In event-driven architecture, all actions are defined and packaged as events to precisely identify the individual actions and how they're processed throughout the system.
Instead of processing updates one after the other in a serial fashion, event-driven architecture lets the components process events at their own pace. This architecture helps developers build fast and scalable systems.
Some components of the distributed system produce events as a result of a specific action done in that component. These components are called producers. When producers send these events, and the events are read or stored in sequence, these events represent a replayable log of changes in the system, also called a stream.
A single event includes information required by one or many other components in the system, also called consumers, to effect additional changes. Consumers can store, process, or react to these events. Many times, consumers also run processes that produce events for other components in the system, so being a producer is not mutually exclusive from being a consumer.
In more traditional message-driven systems, data and system updates are sent as messages directly from the producer to the consumer. The producer waits for acknowledgement that the consumer received the message before it continues with its processes. This can create issues that can degrade the efficiency of the system.
|If similar information is required by multiple recipients, the sender must send a message designed for each recipient independently.||The producer sends individual events designed for consumption by multiple consumers.|
|If the recipient delays acknowledging receipt, the producer can’t complete its process until it receives the acknowledgement.||The producer sends events to an event processing system that can acknowledge receipt and guarantee delivery to the consumers.|
|If there's a break in the connection between a producer and a recipient, the producer doesn’t know if the event was processed or if it needs to resend the event.||The event processing system can track communication between producers and consumers if there's a broken connection.|
|In clustered deployments, each producer node has to send messages to each recipient node.||Each producer node sends events to the event processing system, and each consumer node retrieves events from that same system.|
Event processing systems like Redpanda let you decouple the producer from the consumer. This allows for asynchronous event processing, event tracking, event manipulation, and event archiving.
Turning data into a product
When you decouple your systems and set up an event-processing system to collect and route events, you transform your event stream from individual actions into a warehouse of information about everything that happens in your application. This is called a data product.
Every event that occurs in your system can be analyzed, mined, and transformed to give you insight into your business and power to build new capabilities. You can:
- Replay events from the past and route them to another process in your application.
- Run transformations on the data in real-time or historically.
- Integrate with other event processing systems that use the Kafka API.
The event-processing system needs to be simple to run, and it needs to leverage the speed of your hardware to handle the events as efficiently as possible.
Consumers and topics
Topics are event streams in Kafka.
- A consumer group allows many topics to be consumed.
- A consumer within a group can subscribe to multiple topics.
- For each topic that a group subscribes to, the records are sent to one consumer within the group at least once.
- Consumer groups are separate. Each group receives each record from each subscribed topic at least once.
Topics are split into partitions.
- Every partition is handled by exactly one leader node in the cluster, so to fully use your cluster, you need at least the same number of partitions as you have nodes.
- The records within a partition are ordered.
- There is no ordering between records in different partitions in the same topic.
The partition is the unit of scale.
- Only one consumer in a group can receive records from a given partition at a specific time. This can change as the group is rebalanced.
- If a consumer cannot keep up, reduce the work for each consumer by adding additional consumers. You may also need to increase the number of partitions. Because partitions are assigned exclusively to a single consumer, you should have at least three times the number of partitions as consumers to allow for a balanced assignment.