Kafkaesque

Apr 12, 2026 kafka kafkaesque programming java claude

On building a mocking library for Apache Kafka.

Big slow tests

I’ve worked a lot with Apache Kafka in the last few years. If you wake up one morning and think “I know, I will use Kafka as the queue in my tiny homebrew project” then you are probably in for a world of pain. If, however, you are in a world of vast data flows with strict auditing and compliance requirements, then, well, yup, still a world of pain, but Kafka’s probably going to be easing it rather than exacerbating it.

I get the distinct impression that it’s a total bitch to manage, so ideally you want someone else to be doing that. Friendly wave to all the “infrastructure” and “platform” people out there mapping out golden paths with their fedora and bullwhip.

But as a developer wanting to get shit done, one of the real pains is that it can be a huge nuisance for integration testing. The usual options one chooses are:

Rewire your application to use a dummy client library
Embed Kafka directly into your integration suite (i.e. launch real Kafka brokers in-process)
Launch Kafka out of process, usually in Docker containers (usually using testcontainers)

They’re all a bit … annoying. Kafka brokers can be pretty slow to start up and shut down, so if you have 1000 test cases and it takes, optimistically, half a second to start up and another half second to shut down, and you want to completely isolate your test cases, you’re already looking at a test suite that takes over a quarter of an hour to run. So people tend not to do that.
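The back-of-envelope sum above can be spelled out in a few lines. This is only the arithmetic from the paragraph (1000 test cases, half a second each way); `SuiteOverhead` and `brokerOverhead` are made-up names for illustration.

```java
import java.time.Duration;

// Hypothetical helper: just the arithmetic, nothing Kafka-specific.
class SuiteOverhead {

    /** Total time spent starting and stopping a broker once per test case. */
    static Duration brokerOverhead(int testCases, Duration startup, Duration shutdown) {
        return startup.plus(shutdown).multipliedBy(testCases);
    }

    public static void main(String[] args) {
        Duration total = brokerOverhead(1000, Duration.ofMillis(500), Duration.ofMillis(500));
        // 1000 x (0.5 s + 0.5 s) = 1000 s
        System.out.println(total.toMinutes() + " min " + total.toSecondsPart() + " s");
        // prints "16 min 40 s"
    }
}
```

And that is before any of the tests have done any actual work.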

But if you don’t do that, you’re left with the hassle of making sure that none of your test events interact in adverse ways, and when people get flakey tests they tend to serialize them to avoid issues. Plus testing async behaviour is a little tricky so you’ll see a lot of hacky sleep(3000) style delays to make sure async processes have completed before test assertions are raised.
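To make the contrast with a fixed sleep concrete, here is a minimal sketch of the polling alternative using only the JDK. The names (`PollingWait`, `waitUntil`) are invented for illustration; a library such as Awaitility does the same job far more thoroughly.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Kafkaesque or Awaitility.
class PollingWait {

    /**
     * Checks the condition every pollInterval until it holds or the timeout expires.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);

        // Simulate an async process that finishes after roughly 100 ms
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done.set(true);
        });

        long start = System.nanoTime();
        boolean completed = waitUntil(done::get, Duration.ofSeconds(3), Duration.ofMillis(10));
        long elapsedMillis = Duration.ofNanos(System.nanoTime() - start).toMillis();

        // The wait ends shortly after the work does, not after a fixed 3 seconds
        System.out.println("completed=" + completed + " after ~" + elapsedMillis + "ms");
    }
}
```

The 3 second figure becomes an upper bound rather than a cost paid on every run, which is the whole point.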

It’s a huge mess.

Making a mockery of it

It’s not a coincidence that I’m a big fan of the WireMock testing tool/library. It solves a similar problem with HTTP based APIs. Probably you know it already. Your application depends on APIs, but you don’t want to run the whole of some horrifying third party legacy system just to spin them up in some integration test; you fake them a little, but they’re real enough for your application - they’re reached over the network, and speak real HTTP and send and receive the same kind of payloads.

Kafkaesque is my take on that, but for Kafka. I stress that this is “inspired by” WireMock, not “based on” because my library is a toy in comparison, but I think it scratches a similar itch.

Kafkaesque lets you spin up an in-memory wire-compatible service accessible over TCP on whatever port you choose (or let it choose an arbitrary free one), and then use that instead of real Kafka brokers. It’s fast because it doesn’t attempt to meet any of the durability or availability concerns; everything is managed in-memory and there’s no need to worry about leadership elections and the like because there’s only one process. Fast enough… that you really can spin up a server-per-test case without wincing. Your test payloads will be isolated from each other by actual processes, not ingenious prefixes or other test-data level contrivances.
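Letting the operating system pick an arbitrary free port is usually done by binding to port 0. The sketch below shows that general technique with the plain JDK. It is an illustration only, not a description of how Kafkaesque does it internally, and `FreePortDemo` is a made-up name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical demo class, not part of Kafkaesque.
class FreePortDemo {

    /** Binds to port 0, which asks the OS to assign any free ephemeral port. */
    static ServerSocket openOnFreePort() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats the address in the host:port form a Kafka client expects for bootstrap.servers. */
    static String bootstrapServers(ServerSocket socket) {
        return "localhost:" + socket.getLocalPort();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = openOnFreePort()) {
            // The port differs from run to run, so parallel tests don't collide
            System.out.println("Listening on " + bootstrapServers(socket));
        }
    }
}
```

Because the port is only known after binding, the test has to ask the server for its address and pass that on to the application, rather than hard-coding it.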

For example

I’ve built Kafkaesque with support for JUnit 5 and JUnit 4. JUnit 5 (Jupiter) is nicer, so I’ll use that for an example:

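import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.junit.jupiter.api.Test;
// ...plus imports for the Kafkaesque annotations and KafkaesqueServer

@Kafkaesque(topics = {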
    @KafkaesqueTopic(name = "orders"),
    @KafkaesqueTopic(name = "notifications")
})
class OrderNotificationServiceTest {

    @Test
    void shouldSendNotificationWhenOrderIsPlaced(
            final KafkaesqueServer kafkaesque,
            @KafkaesqueProducer final KafkaProducer<String, String> producer) throws Exception {

        // Start the application under test, pointed at our mock Kafka
        var application = new OrderNotificationService(kafkaesque.getBootstrapServers());
        application.start();

        // Simulate an incoming order
        producer.send(new ProducerRecord<>("orders", "order-123",
                """
                { "customer": "Alice", "item": "Kafka In Action", "quantity": 1}
                """)).get();

        // Verify that the service produced a notification in response
        await().atMost(Duration.ofSeconds(5)).untilAsserted(() -> {
            var notifications = kafkaesque.getRecordsByTopic("notifications");
            assertThat(notifications).hasSize(1);
            assertThat(notifications.getFirst().key()).isEqualTo("order-123");
            assertThat(notifications.getFirst().value()).contains("Alice");
        });

        application.stop();
    }
}

There’s quite a lot of boilerplate here, but it’s mostly fairly standard test boilerplate rather than anything specific to Kafkaesque. The @Kafkaesque annotation creates the server and sets up suitable topics ahead of time (although you can use the Admin client APIs to do this kind of thing directly if you want). In this test we’re magically injecting a producer into the test to put test data onto the topics, but again we could create a KafkaProducer directly from the Kafka client library if we wanted to.
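For the do-it-yourself route, a producer only needs the usual client configuration with bootstrap.servers pointed at the mock. Here is a minimal sketch of that configuration using the standard Kafka client property names. The `ProducerConfigSketch` class and the `localhost:9092` address are placeholders; in a test the address would come from `kafkaesque.getBootstrapServers()`.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the property names are standard Kafka client config.
class ProducerConfigSketch {

    /** Builds the minimal configuration for a String/String producer. */
    static Properties producerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address; substitute the mock server's bootstrap address in a real test
        Properties props = producerProperties("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));

        // With kafka-clients on the classpath, these properties can be passed straight in:
        //   try (var producer = new KafkaProducer<String, String>(props)) { ... }
    }
}
```

Nothing in there is test-specific, which is rather the point: it’s the same configuration a production producer would use, just with a different address.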

Significantly, the application under test doesn’t need to know that it’s not talking to a real Kafka instance. It just gets configured to talk to brokers by the same mechanisms that it would in production.

Room for improvement

It’s a good start, and it does work, but while I’m opening it up for public consumption now, I think there are a few more features needed to take it to the next level.

Kafkaesque needs to support older versions of Java. A lot of big organisations are deeply conservative about which runtime they’re on, and test suites don’t always get the love they need. I’m not willing to roll it all the way back to Java 8, because that’s a bit too painful, but I plan to support Java 11+ on the next (imminent) release.

On a similar note, a lot of Kafka clusters are not on a recent version. Upgrading a cluster can be hell, particularly if it’s fallen a long way behind, and they’re almost always only internally facing. As such, I need to provide support for much older versions of the client. Or at least test it with much older versions and declare whatever breaks!

WireMock provides an excellent stubbing DSL so that you can just declare up front what responses it will return in various circumstances. Kafkaesque could do that too (e.g. stub that when I publish event W in topic X then an event Y will appear in topic Z).

Similarly, WireMock provides a verification DSL so that you can declare what interactions the application under test was supposed to make with the webserver. Again, Kafkaesque could do this! I’d love to reduce that block of boilerplate in the example and swap it out for something more DSL-ish, while still using Awaitility (which is amazeballs by the way) under the hood to reduce people’s tendency to litter otherwise sane codebases with calls to sleep for multiple seconds. They seem so innocuous as a fix for flakey tests, but they often only make the test less flakey, and the test suite completion times balloon as they accumulate.

As a sketch, I’m imagining something like the following from those last two points:

@Kafkaesque(topics = {
    @KafkaesqueTopic(name = "orders"),
    @KafkaesqueTopic(name = "notifications")
})
class OrderNotificationServiceTest {

    @Test
    void shouldSendNotificationWhenOrderIsPlaced(final KafkaesqueServer kafkaesque) {

        // Start the application under test, pointed at our mock Kafka
        var application = new OrderNotificationService(kafkaesque.getBootstrapServers());
        application.start();
        
        // WARNING - NOT REAL CODE, THIS DOES NOT (YET) WORK!
        stub(publish(
                "Simulate an incoming order",
                "orders", 
                "order-123", 
                        """
                        { "customer": "Alice", "item": "Kafka In Action", "quantity": 1}
                        """));

        // WARNING - NOT REAL CODE, THIS DOES NOT (YET) WORK!
        verify(within(Duration.ofSeconds(5)))
                .published("notifications", exactly(1), 
                        event(keyMatching("order-123").bodyMatchingJson("customer.name", "Alice")));

        application.stop();
    }
}

Well, it’s a bit terser, anyway!

There are a bunch of other nice-to-have or essential-to-have-but-haven’t-yet-done features on the todo list so if this all sounds interesting but not-quite-there, then keep an eye on the project.

I, Claude-ius

I used Claude Code a lot on this project. For personal projects I prefer to be hands-on, but this is more of a project where I just wish it already existed. Since it doesn’t, I was more open to handing some of the fun to Claude. Moreover I’ve had an attempt at this project in the past - at that point I was trying to implement the wire protocol directly, and found that a lot of the documentation for that was ambiguous or even wrong. So I parked it with the intention of restarting and basing all the wire-protocol code on the Apache Kafka client library instead of re-creating it myself. Good call, but a lot less interesting, so I kept procrastinating. Claude looked like (and was) a good way to break the cycle of procrastination there.

I’m using Claude at work a lot at the moment, and it’s an impressive tool. It seems to produce particularly good results when you break its tasks up into manageable, tightly focused chunks, and this happens to align well with the way I like to work on projects anyway. Over the last couple of weekends (including the long Easter one) I’ve been steering it to do chunks of work while I mostly take coffee breaks and think about the next step. It’s gone well enough that I upgraded to the Max account and feel like I’ll get my money’s worth out of it.

What’s particularly notable to me is that it’s writing good quality Java. Sure, there are some gaps, and I’m sure any real users will flush out a whole tonne of bugs when they start using this library in anger, but I’m pretty happy with what it produced. It needed some steers with things like package structure, but more along the lines of “don’t you think this package is getting a bit chonky?” rather than needing to lay down exactly how I proposed breaking it up. Checkstyle helped a lot with some of the coarser requirements, and I will look at increasing the breadth of linting rules, maybe including things like cyclomatic complexity checks, for future iterations.

Meanwhile I should dig out another one of my long-procrastinated projects and give Claude more leeway. It will be interesting to see if it goes off in some insane direction or stays on track. Given my propensity toward yak shaving I guess I sympathise even if it does decide it needs to polish a few arbitrary turds along the way, or delete the entire codebase and rage quit!