In my previous post, we learned what aggregation is and how it can be used. Now, let’s talk about some potential problems. It sounds like an easy task – just do some simple math, update the stored value, and voilà, right?
The first question you should ask is whether the data in the events you are aggregating is critical or not. What will happen if it is lost due to a technical issue, like a power outage? Sometimes you perform aggregation for analytical purposes only, just to observe basic trends, and partial data loss in that case may be acceptable. In such scenarios you have plenty of options, such as in-memory databases or even in-memory data structures in your own code (lists, arrays, hash maps, and so on), which help maintain good performance.
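As a rough sketch of that in-memory approach (the subscriber IDs and byte counts below are made up for illustration), a plain dictionary is often all you need:

```python
from collections import defaultdict

# Hypothetical usage events: (subscriber_id, bytes_used) pairs.
events = [
    ("sub-001", 1_200),
    ("sub-002", 500),
    ("sub-001", 300),
]

# Plain in-memory aggregation: the totals live only in RAM,
# so a crash or power outage loses everything accumulated so far.
totals = defaultdict(int)
for subscriber_id, bytes_used in events:
    totals[subscriber_id] += bytes_used

print(dict(totals))  # {'sub-001': 1500, 'sub-002': 500}
```

It is fast precisely because nothing ever touches the disk, which is also why it only works when losing the accumulated totals is acceptable.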
But what should you do if the data is critical and event loss is absolutely unacceptable? In that case, you must use persistent storage that will survive a power outage: the information has to be written to disk, and the file system has to acknowledge that the write succeeded before the event can be considered processed.
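Here is a minimal sketch of such a durable write, assuming a hypothetical JSON-lines journal file; the key point is flushing and calling fsync so the file system confirms the write before we move on:

```python
import json
import os

def persist_event(path: str, event: dict) -> None:
    """Append an event to a journal and wait for the disk to confirm it."""
    line = json.dumps(event) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # block until the file system confirms the write

persist_event("usage.journal", {"subscriber_id": "sub-001", "bytes_used": 1200})
```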
It’s not a big deal when you’re just appending (inserting) new data and using high-performance NVMe drives. But what happens when you need to update an existing record? You have to read it first, make some changes, and then write the result back. Every event now requires a lookup, so you’ll want to index your records to keep performance at an optimal level. Sometimes you also need table relationships in your aggregation logic, which makes the process even more complex.
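A small sketch of that read-modify-write cycle, using SQLite and a hypothetical usage_totals table (the upsert syntax requires SQLite 3.24 or newer); the primary key doubles as the index that keeps each lookup cheap:

```python
import sqlite3

conn = sqlite3.connect("aggregates.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS usage_totals (
        subscriber_id TEXT PRIMARY KEY,  -- indexed lookup key
        total_bytes   INTEGER NOT NULL
    )
""")

def aggregate(subscriber_id: str, bytes_used: int) -> None:
    # Read-modify-write collapsed into a single indexed upsert:
    # insert a new row, or add to the existing total if one is already stored.
    conn.execute(
        """
        INSERT INTO usage_totals (subscriber_id, total_bytes)
        VALUES (?, ?)
        ON CONFLICT(subscriber_id) DO UPDATE
        SET total_bytes = total_bytes + excluded.total_bytes
        """,
        (subscriber_id, bytes_used),
    )
    conn.commit()

aggregate("sub-001", 1_200)
aggregate("sub-001", 300)
```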
This is where two factors come into play: the size of the stored data and the expected throughput (how many records per second must be updated). In the event mediation world, a single database file can grow to many gigabytes, with expected throughput in the tens of thousands of transactions per second. You need to make a wise choice about which database to use and how you’re going to scale it.
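One common scaling lever, sketched below against the same hypothetical usage_totals table from the earlier example, is to batch many updates into a single durable commit instead of paying for a commit (and its fsync) on every event:

```python
def aggregate_batch(conn, events, batch_size=1_000):
    """Apply many aggregation updates per commit instead of one commit per event."""
    cur = conn.cursor()
    for i, (subscriber_id, bytes_used) in enumerate(events, start=1):
        cur.execute(
            """
            INSERT INTO usage_totals (subscriber_id, total_bytes)
            VALUES (?, ?)
            ON CONFLICT(subscriber_id) DO UPDATE
            SET total_bytes = total_bytes + excluded.total_bytes
            """,
            (subscriber_id, bytes_used),
        )
        if i % batch_size == 0:
            conn.commit()  # amortize the cost of one durable commit across the batch
    conn.commit()
```

The trade-off is that an outage can cost you at most one uncommitted batch, so the batch size becomes a knob between throughput and how much reprocessing you are willing to do on recovery.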
With a modern mediation platform, you can easily reach 1 million aggregated transactions per second over a total storage volume of 10 terabytes. And that’s not the limit!