How dumb do you want your pipes?

Introduction

When transitioning from a monolithic architecture to microservices, enabling communication between the services can be challenging. One strategy for getting this ‘right’ is to use smart endpoints and dumb pipes. The alternatives are to have either no messaging infrastructure at all or a full-blown enterprise service bus (ESB). The former likely means a lot of HTTP calls between microservices, making them brittle, less efficient, and more susceptible to tight coupling. The latter can become cumbersome to manage, making it unnecessarily hard to add or remove services. So the advice to use dumb pipes seems good.

There are different messaging solutions, and a valid question to ask is, ‘how dumb is enough?’ If all messages are treated equally, the endpoints might have to do the smart things themselves. In this blog post, I will compare Apache Kafka with Axon Server, focusing on the messaging aspects and how they affect the services using either one.

Plenty of resources exist to teach you more about Kafka, for example, Confluent Developer. You can learn more about Axon Server through a few free courses in the AxonIQ Academy. There’s also this YouTube video specifically focused on message routing with Axon Server. I will quickly go through the relevant basics of both solutions.

Message format

Message formatting is a key aspect of the message bus. The format can provide information about the typical use of a message bus and how it might be coupled to specific serialization formats. The payload for both Kafka and Axon Server is binary – allowing flexibility regarding which format to use. 

The payload format with Kafka

In Kafka, the whole message is called a record (when using Java), and the type of its value depends on the deserializer used; with a ByteArrayDeserializer, for instance, it is simply the raw bytes. With Kafka, especially in combination with Confluent Cloud, it’s pretty common to use a schema registry and have a single type of message for each topic. On the producing side, a serializer optionally registers a new schema and encodes the payload to the correct byte representation. On the consuming side, the deserializer can use information in the payload to fetch the correct schema and transform the bytes back into the correct representation.

Some things in the Kafka ecosystem, like Kafka Streams, Kafka Connect, and ksqlDB, may work less optimally when no schema registry is used or when multiple kinds of messages are put on the same topic. If Avro or Protobuf is used, the payload can be much smaller than the equivalent JSON. A small payload size helps with performance and reduces the storage needed.
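
To make the schema registry flow a bit more concrete, here is a minimal Python sketch of the framing that Confluent's serializers put around a payload: a zero 'magic' byte, the schema ID as a 4-byte big-endian integer, and then the encoded data. The `SCHEMAS` dict is a hypothetical stand-in for a real registry, `frame` and `unframe` are names made up for illustration, and the payload bytes are a placeholder rather than real Avro.

```python
import struct

MAGIC_BYTE = 0

# Hypothetical stand-in for a schema registry: schema ID -> schema name.
SCHEMAS = {42: "OrderPlaced"}


def frame(schema_id, encoded_payload):
    """Prefix the encoded payload with the magic byte and the schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + encoded_payload


def unframe(message):
    """Split a framed message into (schema_id, encoded_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, message[5:]


message = frame(42, b"placeholder-bytes")
schema_id, payload = unframe(message)
schema_name = SCHEMAS[schema_id]  # a real deserializer would fetch this remotely
```

Because the schema ID travels with every record, a consumer only needs the registry, not prior knowledge of which schema version a producer used.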

The metadata format with Kafka

With Kafka, the payload is stored as the `value` of a record. In addition to the value, a record has a `key`, which can also store information and is as flexible as the value. The key is optional, so it can also be left empty. In most cases, the key is also used for partitioning. Since ordering is only guaranteed within a partition, some caution is needed when ordering is important. For example, if the key was previously based on two properties, adding a third is risky. If both key layouts are in use at the same time, events that used to land in the same partition may be split across partitions.
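
The risk of changing a key can be sketched in a few lines of Python. This assumes a simplified partitioner (hash of the key modulo the partition count). Kafka's default partitioner uses a murmur2 hash, for which CRC32 is only a stand-in here, and `partition_for` and the key layouts are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Simplified stand-in for a key-based partitioner (not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6

# Old key layout: two properties. New layout: a third property appended.
old_keys = [f"customer-{i}:NL" for i in range(100)]
new_keys = [f"customer-{i}:NL:web" for i in range(100)]

# Old keys whose events land in a different partition under the new layout.
moved = [
    old
    for old, new in zip(old_keys, new_keys)
    if partition_for(old, NUM_PARTITIONS) != partition_for(new, NUM_PARTITIONS)
]
```

For every entry in `moved`, events written with the old key and events written with the new key sit in different partitions, so their relative order is no longer guaranteed.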

Each message also has an offset – based on the partition – and a timestamp. Optionally headers can be added for additional information, which can be used for things like tracing or supplying information about how to deserialize the data.

The core message format with Axon Server

As previously stated, the payload is binary in Axon Server. In most cases, Axon Server will be used with Axon Framework and Spring Boot. In that case, both XStream (XML) and Jackson (JSON) can be used through configuration. By implementing a Serializer, you are free to use other formats. Typically, the data is mapped to a Java POJO.

The metadata format with Axon Server

In addition to the payload, metadata can be added with Axon Server. This data also follows a binary format, which can utilize a serializer to turn it into a Java map. Command messages, domain event messages, and query messages have additional information apart from the metadata. Each command has a command name. A domain event has a sequence number, aggregate identifier, and type. A query message has a response type. These different types of messages play a crucial role in routing the messages.

Sending and receiving messages

Sending messages plays out a bit differently in Kafka and Axon Server. This section will take a closer look at how messages are typically sent and received with both solutions.

Sending messages with Kafka

With Kafka, there’s no distinction between different messages – all messages are handled the same way. Kafka is often used in event-driven architectures, where most or all messages are events. Messages can only be added to a specific partition, which is part of a topic.

A Kafka Producer creates the events, either directly or using an abstraction. The producer always fetches the topic metadata first, so it knows the leading broker for each partition. Several levels of feedback are available after sending a message, which makes it possible to choose between different delivery guarantees.

A large ecosystem surrounds Kafka. With Kafka Connect, many connectors are available to get data in or out of Kafka. With Kafka Streams, processing can be done within Kafka with abstractions like KTable, a table-like structure using the key of the message to get or set values like in a map. It also supports combining such tables, using tumbling windows to group events, and more.

Kafka supports transactions. With transactions, it’s possible to have different messages across topics and mark them all as committed if they all have been sent correctly. These transactions can’t be used for concurrency control, as a producer can’t ‘see’ what other producers are doing.

Producers and Consumers are decoupled with Kafka. A producer doesn’t know when a specific consumer reads an event, although it’s possible to retrieve such information if needed.

Receiving messages with Kafka

A Kafka Consumer client is used for receiving messages from Kafka. There are two main ways to consume them: either by specifying the exact partitions to consume or using a consumer group with one or multiple topics. Messages are typically consumed in small batches over the wire.

Messages are read from old to new. It’s possible to start from a specific point in time, from the beginning, or from the end. If there are stored offsets, it’s also possible to start there. With just Kafka, it’s impossible to fetch all the messages with a certain key, which is what you would need to rebuild aggregate state quickly.

The order within a partition is guaranteed. When a topic contains multiple partitions, a consumer might not process all the events in the order they were produced, as one partition might be processed before another.
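
A small Python sketch of that ordering behaviour, using made-up records and a fixed, assumed key-to-partition mapping so the example stays deterministic (no Kafka client is involved):

```python
# Hypothetical records in the order they were produced: (sequence, key, event).
produced = [
    (0, "order-1", "created"),
    (1, "order-2", "created"),
    (2, "order-1", "paid"),
    (3, "order-2", "paid"),
]

# Assumed key-to-partition mapping for a topic with two partitions.
partition_of = {"order-1": 0, "order-2": 1}

partitions = [[], []]
for record in produced:
    partitions[partition_of[record[1]]].append(record)

# A consumer that happens to drain partition 0 before partition 1.
consumed = partitions[0] + partitions[1]

# The production order is lost across partitions...
sequence_seen = [sequence for sequence, _, _ in consumed]

# ...but the events of a single key still arrive in order.
order_1_events = [event for _, key, event in consumed if key == "order-1"]
```

Here `sequence_seen` no longer matches the production order, while `order_1_events` still does, which is why the choice of key matters so much.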

Sending messages with Axon Server

Axon Server supports handling multiple kinds of messages. Events are stored durably, but this is not the case for commands and queries. In the context of Axon Server, it’s better to speak about sending messages instead of creating them.

In most cases, Axon Framework will be combined with Axon Server. This provides several abstractions to make it easier to use. One of those abstractions is the different kinds of message buses. All are bidirectional, able to receive and send messages of a certain kind. The three available buses are the Command Bus, the Event Bus, and the Query Bus.

When discussing events in the context of Axon Server, it’s important to distinguish between domain events and non-domain events. Domain events are part of an aggregate and are typically created by invoking a command handler. Non-domain events don’t belong to an aggregate and can be created and sent using an Event Bus directly.

A query can be sent to a Query Bus to get an answer. To dispatch a query, the sender specifies the response type it wants. This makes it possible to work with different response versions at the same time.

Receiving messages with Axon Server

We’ll touch on routing in a bit. For now, let’s discover how an application is supposed to react to different kinds of messages. In most cases, the Axon Framework will be used, providing several abstractions which make handling these different kinds of messages easier.

When the application receives a command message, it first builds up the aggregate. This means reading all the events belonging to the aggregate and applying them. Optionally, a cache or snapshot can be used to make this faster. If the command succeeds, one or multiple domain events will be created and sent via the Event Bus.
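
The command-handling flow can be sketched in plain Python as follows. This is not Axon Framework code: the `Account` aggregate, the event classes, and the list used as an event store are all hypothetical, and snapshots and caching are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoneyDeposited:
    account_id: str
    amount: int


@dataclass(frozen=True)
class MoneyWithdrawn:
    account_id: str
    amount: int


class Account:
    """Aggregate whose state is derived purely from its events."""

    def __init__(self, account_id):
        self.account_id = account_id
        self.balance = 0

    def apply(self, event):
        if isinstance(event, MoneyDeposited):
            self.balance += event.amount
        elif isinstance(event, MoneyWithdrawn):
            self.balance -= event.amount


def handle_withdraw(event_store, account_id, amount):
    # 1. Rebuild the aggregate from the events that belong to it.
    account = Account(account_id)
    for event in event_store:
        if event.account_id == account_id:
            account.apply(event)
    # 2. Validate the command against the rebuilt state.
    if amount > account.balance:
        raise ValueError("insufficient funds")
    # 3. On success, publish a new domain event.
    new_event = MoneyWithdrawn(account_id, amount)
    event_store.append(new_event)
    return new_event
```

Note that step 1 needs all events for one aggregate identifier, which is the kind of read that is hard to do efficiently with Kafka alone.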

When an application receives an event message, it can use the event to update a projection. This allows us to use the CQRS pattern. Besides building a projection, the application could do other things, like sending notifications.

When an application receives a query message, it can use the built-up projection to send an answer. It can use the response type to send the result back in a specific format.

Within a context, the ordering is kept. The events are not necessarily processed in the same order when using multiple segments.

Message routing

Message routing works quite differently in Axon Server and Kafka. The main differences exist because of the diversity of messages that Axon Server supports – compared to Kafka, which only has one kind of message.

Message routing in Kafka

With Kafka, routing is partly the client's responsibility. Clients must retrieve metadata before they can start sending or receiving messages, and they must connect to the correct broker instance for each partition they use.

This coordination also means clients need to handle some errors, for example, when a broker becomes unavailable and another broker is elected leader.

The client must also connect to the group coordinator when using a consumer group. The group coordinator ensures all the partitions belonging to the topics are consumed and will reassign partitions when needed. The coordinator is a server-side component.

Message routing in Axon Server

How a message is routed in Axon Server depends greatly on the kind of message. For commands and queries, an application must tell Axon Server which specific messages it can handle.

Routing commands
A command is only sent to one instance, as you don’t want to create the same events multiple times from one command. For performance reasons, when multiple instances can handle a specific command, commands belonging to the same aggregate are always sent to the same instance. The response is routed back to the application that sent the command.
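
As a rough illustration of this kind of routing, here is a Python sketch in which instances register the commands they can handle and a routing key (the aggregate identifier) picks one of them. `CommandRouter` is hypothetical and is not Axon Server's actual algorithm; hash-modulo is used only to keep the sketch short, and a real router would need something more stable when instances join or leave.

```python
import zlib


class CommandRouter:
    def __init__(self):
        # Command name -> set of instance names that registered a handler.
        self.handlers = {}

    def register(self, instance, command_name):
        self.handlers.setdefault(command_name, set()).add(instance)

    def route(self, command_name, routing_key):
        """Pick exactly one instance; the same routing key gives the same instance."""
        candidates = sorted(self.handlers.get(command_name, ()))
        if not candidates:
            raise LookupError(f"no handler registered for {command_name}")
        index = zlib.crc32(routing_key.encode("utf-8")) % len(candidates)
        return candidates[index]


router = CommandRouter()
router.register("orders-1", "PlaceOrder")
router.register("orders-2", "PlaceOrder")

# Commands for the same aggregate end up at the same instance.
target = router.route("PlaceOrder", "order-42")
```

Sending all commands for one aggregate to one instance means that instance can keep the aggregate cached instead of rebuilding it for every command.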

Routing queries
There are different kinds of queries that are covered in this article. The most common type is the point-to-point query. In this case, the query will be sent to all applications that can handle it, and the first response will be sent back. More advanced queries will use more complicated routing to get the results.

Routing events
Events are typically published via an aggregate. For the command to be handled successfully, publishing the events must succeed. Once an event is published, all the event subscribers can receive it.

Subscribers are responsible for keeping track of the position in the event stream up to which events were handled successfully. Using Axon Framework, this position can be stored in a relational database or, via the Mongo extension, in MongoDB. The abstraction used for this is a Token Store. As events are typically used to build a projection, both the projection and the tokens can be stored in the same database. This makes it easier to keep projections consistent.
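
Why storing the token next to the projection helps can be shown with a small Python sketch using SQLite. The table layout, the `balances` processor name, and the functions are assumptions for illustration, not Axon Framework's Token Store schema. The point is that the projection update and the token update share one transaction.

```python
import sqlite3

PROCESSOR = "balances"  # hypothetical processor name


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE token (processor TEXT PRIMARY KEY, position INTEGER)")
    db.execute("CREATE TABLE balance (account TEXT PRIMARY KEY, amount INTEGER)")
    db.execute("INSERT INTO token VALUES (?, -1)", (PROCESSOR,))
    db.commit()
    return db


def process(db, event_stream):
    """Apply (account, delta) events that come after the stored position."""
    (last,) = db.execute(
        "SELECT position FROM token WHERE processor = ?", (PROCESSOR,)
    ).fetchone()
    for position, (account, delta) in enumerate(event_stream):
        if position <= last:
            continue  # already handled before, skip
        with db:  # one transaction: projection and token commit together
            db.execute("INSERT OR IGNORE INTO balance VALUES (?, 0)", (account,))
            db.execute(
                "UPDATE balance SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
            db.execute(
                "UPDATE token SET position = ? WHERE processor = ?",
                (position, PROCESSOR),
            )
```

If the process crashes halfway, either both the projection row and the token were updated or neither was, so reading the stream again does not apply an event twice.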

Scaling messages

When scaling out, you’d like a group of applications to handle all the messages together, instead of each application having to handle every message. Axon Server and Kafka handle this differently.

Scalability with Kafka

The basic unit of data in Kafka is the partition. If a broker is struggling, moving partitions to another broker is possible. A cluster of brokers can be pretty large.

When scaling applications, the initial number of partitions for a topic might become a problem. When a consumer group has more consumers than the topic has partitions, some consumers won't get any data. While you can add partitions, doing so changes which partition a given key maps to, so the ordering guarantees for existing keys can be lost.
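
The idle-consumer effect is easy to see in a sketch. The `assign` function below is a plain round-robin assignment written for illustration; it is not one of Kafka's actual assignor implementations, and the consumer names are made up.

```python
def assign(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Two consumers: both get work.
two = assign(partitions, ["c1", "c2"])

# Four consumers on three partitions: one consumer stays idle.
four = assign(partitions, ["c1", "c2", "c3", "c4"])
idle = [consumer for consumer, owned in four.items() if not owned]
```

Since a partition is only consumed by one member of a group at a time, the partition count is an upper bound on how far a consumer group can scale out.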

Scalability with Axon Server

Right now, Axon Server doesn’t offer partitioning. No partitioning means that one of the server instances is the leader for a specific context. This can provide enough performance, but partitioning might become an option if needed.

For event processing, the event stream can be split into different segments. When starting the stream for the first time, the number of segments can be configured. It’s also possible to split or merge segments later on.

Contrary to Kafka, how the events are split between segments can differ for different processors. The downside of this approach is that when a processing group is split across instances, each instance gets all the messages. Axon Server offers an optimization when a processor doesn’t process certain types of events. If the client signals a kind of event is not used, those will be skipped server side. The big advantage is that we can split messages differently if needed.
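
Splitting a stream into segments can be sketched with a segment ID plus a bit mask: an event belongs to a segment when the masked hash of its sequencing key equals the segment ID. This is loosely modeled on how I understand segments to work, so treat the details as assumptions; the `Segment` class here is hypothetical and CRC32 stands in for whatever hash is really used.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: int
    mask: int  # always of the form 2**n - 1 in this sketch

    def matches(self, sequencing_key):
        hashed = zlib.crc32(sequencing_key.encode("utf-8"))
        return (hashed & self.mask) == self.segment_id

    def split(self):
        """Split into two segments that together cover the same events."""
        new_mask = (self.mask << 1) | 1
        return (
            Segment(self.segment_id, new_mask),
            Segment(self.segment_id + self.mask + 1, new_mask),
        )


# Two processors reading the same stream can use different segment counts.
processor_a = list(Segment(0, 0).split())                        # 2 segments
processor_b = [s for half in processor_a for s in half.split()]  # 4 segments

keys = [f"aggregate-{i}" for i in range(50)]
owners_a = {key: [s for s in processor_a if s.matches(key)] for key in keys}
owners_b = {key: [s for s in processor_b if s.matches(key)] for key in keys}
```

Each key has exactly one owning segment per processor, so events for one aggregate stay in order within a processor even though the two processors divide the stream differently.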

Conclusion

Now that we know more about both solutions, let's go back to how dumb we would like our pipes to be. In a lot of cases, a dumb pipe is fine, for example, when we simply want centralized application logging. For more advanced use cases, we may want a smarter pipe, for example, when we have message types that require different routing.

I’ve done some experiments trying to do smart things with Kafka. For example, I put something in front of it to handle concurrent writes, and I built something like the Command Bus or the Query Bus on top of it. Because of the limitations in the Kafka protocol, these approaches are not without tradeoffs. When using Axon Framework is an option, having slightly smarter pipes via Axon Server makes things much easier. On the other hand, if you aren't building an event sourcing or CQRS architecture, a dumber pipe might suffice.

You don’t necessarily have to choose, as with the Kafka Extension, combining both is easy. As always, what's best primarily depends on the use case. If you have questions about routing messages with Axon Server or Axon Framework, feel free to ask on our Discuss platform.

Gerard Klijs
Software Engineer. With over 10 years of experience as a backend engineer, Gerard Klijs is a contributor to several GraphQL libraries, and also the creator and maintainer of a Rust library to use Confluent Schema Registry. He has an interest in event sourcing and CQRS and likes sharing knowledge via blogs, talks, and demo projects.