I’m experimenting with a new series of posts, called “Last Month in Nakadi”. In the Nakadi project, we maintain a changelog that we update on each release. Each entry in the file is a one-line summary of a change that was implemented, but that alone is not always sufficient to understand what happened. A fair amount of discussion and context still stays hidden inside Zalando, but we are working on changing that too.
Therefore, I will try, once a month, to provide some context on the changes that we released the month before. I hope that users of Nakadi, and people interested in deploying their own Nakadi-based service, will find this summary useful. Let’s start then, with what we released last month, February 2018.
Released on the 15th of February, this version includes one bug fix and one performance improvement.
Fix: Problem JSON for authorization issues
A user of Nakadi reported that Nakadi did not return a correct Problem JSON response when authorization failed. This release fixes that.
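For readers unfamiliar with the format, here is a minimal sketch of what a Problem JSON response (RFC 7807) for a failed authorization check could look like. The function name, the detail message, and the choice of fields are assumptions made for illustration; this is not Nakadi's actual implementation or its exact response body.

```python
import json

# Media type defined by RFC 7807 for problem details.
PROBLEM_CONTENT_TYPE = "application/problem+json"


def access_denied_problem(detail):
    """Build a (status, headers, body) triple for a failed authorization check.

    Hypothetical helper: the field values are illustrative, not Nakadi's.
    """
    problem = {
        "type": "about:blank",  # RFC 7807 default when no specific type is defined
        "title": "Forbidden",   # with "about:blank", the title is the HTTP status phrase
        "status": 403,
        "detail": detail,
    }
    headers = {"Content-Type": PROBLEM_CONTENT_TYPE}
    return 403, headers, json.dumps(problem)


# Hypothetical detail message, for illustration only.
status, headers, body = access_denied_problem(
    "Access on READ event-type:orders denied"
)
```

The point of the format is that a client can parse every error the same way, including authorization failures, instead of special-casing them.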
Improvement: subscription rebalance
We found that, when rebalancing a subscription, Nakadi calls ZooKeeper several times, which is costly. This improvement reduces the number of ZooKeeper calls made during a rebalance, which makes rebalances faster.
Released on the 22nd of February, this version brings a new feature: the ability to allow a set of applications to get read access to all event types, overriding individual event types’ authorization policies, for archival purposes.
At Zalando, we maintain a data lake, where data is stored and made available to authorized users for analysis. One of the preferred ways to get data into the data lake is to push it to our deployment of Nakadi. The data lake ingestion applications then consume the events and save them there. Over time, we noticed that event type owners, when setting or updating their event types’ authorization policies, would occasionally forget to whitelist the data lake applications, causing delays in data ingestion. We noticed a second issue as well: should the data lake team change the applications it uses to ingest data (it actually uses several applications, working together), it would have to contact the owners of all event types from which data is ingested. That’s a lot of people, and a huge burden.
So, we decided to allow these applications to bypass the event types’ authorization policies, so that event type owners cannot accidentally block the data lake’s read access. In a future release, we could add a way for event type owners to indicate that they do not want their data ingested into the data lake.
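The bypass described above can be sketched as a read check that consults a deployment-wide list of archival applications before the event type's own policy. This is only an illustration under stated assumptions: the names `EventType`, `can_read`, and `archival_readers`, as well as the application identifiers, are hypothetical and do not reflect Nakadi's actual code or configuration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class EventType:
    """Hypothetical, simplified event type with a list of allowed readers."""
    name: str
    readers: Set[str] = field(default_factory=set)


def can_read(app_id: str, event_type: EventType,
             archival_readers: FrozenSet[str]) -> bool:
    """Return True if app_id may read from event_type.

    Applications in archival_readers get read access to every event type,
    regardless of the event type's own authorization policy.
    """
    if app_id in archival_readers:
        return True
    return app_id in event_type.readers


# Hypothetical identifiers, for illustration only.
archival_readers = frozenset({"data-lake-ingestion"})
orders = EventType(name="orders", readers={"order-processor"})

allowed_archiver = can_read("data-lake-ingestion", orders, archival_readers)
allowed_reader = can_read("order-processor", orders, archival_readers)
allowed_stranger = can_read("some-other-app", orders, archival_readers)
```

With this shape, the data lake team can change its ingestion applications by updating a single list, and event type owners cannot lock those applications out by mistake.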
We also added an optional warning header, sent when an event type is created or updated. We use it to remind our users that their data may be archived, even if the archiving application is not whitelisted for their event type. The message is up to you, and you can also choose to send no message at all.
And that’s it for February. If you would like to contribute to Nakadi, please feel free to browse the issues on GitHub, especially those marked with the “help wanted” tag. If you would like to implement a large new feature, please open an issue first to discuss it, so we can all agree on what it should look like. We very much welcome all sorts of contributions: not just code, but also documentation, help with the website, etc.