The BaselineTopology concept in Apache Ignite 2.4

When the Ignite project first appeared at the Apache Software Foundation, it was positioned as a pure in-memory solution: a distributed cache that loaded data from a traditional database into memory to reduce access time. But release 2.1 already introduced a built-in persistence module (Native Persistence), which makes it fair to classify Ignite as a full-fledged distributed database. Since then, Ignite no longer depends on external systems for persistent data storage, and a whole set of configuration and administration pitfalls that users repeatedly stumbled over has disappeared.
However, persistent mode brings its own scenarios and new questions. How do we prevent unresolvable data conflicts in a split-brain situation? Can we skip rebalancing partitions if a node leaving the cluster no longer means that its data is lost? How do we automate extra steps such as cluster activation? BaselineTopology (BLT) comes to the rescue.
Consider a running Ignite cluster with persistence enabled. We will perform a few simple manipulations on it, in the following order:
1. Stop the cluster and start node group A.
2. Update some keys in the cache.
3. Stop group A and start group B.
4. Apply different updates to the same keys.

Since Ignite works in database mode, the updates are not lost when the nodes of the second group are stopped: they become available again as soon as we restart that group. So once the cluster is restored to its initial state, different nodes may hold different values for the same key.
Without doing anything exotic, simply by stopping and starting nodes, we have brought the data in the cluster into an undefined state that cannot be resolved automatically.
Preventing this situation is just one of the tasks of BLT.
The idea is that in persistence mode the cluster goes through an additional startup stage: activation.
On the very first activation, the first BaselineTopology is created and saved to disk; it contains information about all the nodes present in the cluster at the time of activation.
This information also includes a hash computed from the identifiers of the online nodes. If some nodes are missing from the topology during a subsequent activation (for example, the cluster was restarted and one node was taken out for maintenance), the hash is recalculated, and the previous value is saved in the activation history inside the same BLT.
Thus, BaselineTopology maintains a chain of hashes describing the composition of the cluster at the time of each activation.
At steps 1 and 3, after each node group starts, the user has to explicitly activate the incomplete cluster, and each online node updates its BLT locally, adding a new hash to it. All nodes within a group compute the same hash, but the hashes differ between groups.
You can probably guess what happens next. If a node tries to join a "foreign" group, the cluster detects that the node was activated independently of the nodes of that group, and the node is refused entry.
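The hash chain and the join check can be sketched with a small toy model. This is only an illustration under simplifying assumptions, not Ignite's actual implementation: the class name `BaselineHashChainSketch`, the use of `String.hashCode()` over sorted node IDs, and the "history must be a prefix" rule are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a per-node activation history; not Ignite's real data structure. */
final class BaselineHashChainSketch {
    /** Hashes of the cluster composition, one per activation, oldest first. */
    private final List<Integer> history = new ArrayList<>();

    /** Copies the history of {@code other} to model nodes that share a common past. */
    BaselineHashChainSketch(BaselineHashChainSketch other) {
        if (other != null)
            history.addAll(other.history);
    }

    /** Records an activation performed with the given set of online node IDs. */
    void activate(Collection<String> onlineNodeIds) {
        // Sort the IDs so that every node in the group computes the same hash.
        history.add(String.join(",", new TreeSet<>(onlineNodeIds)).hashCode());
    }

    /** A joining node is accepted only if its history is a prefix of ours. */
    boolean canAccept(BaselineHashChainSketch joining) {
        List<Integer> theirs = joining.history;
        return theirs.size() <= history.size()
            && history.subList(0, theirs.size()).equals(theirs);
    }

    public static void main(String[] args) {
        BaselineHashChainSketch full = new BaselineHashChainSketch(null);
        full.activate(Arrays.asList("A1", "A2", "B1", "B2")); // first activation

        BaselineHashChainSketch groupA = new BaselineHashChainSketch(full);
        groupA.activate(Arrays.asList("A1", "A2")); // step 1: group A activated alone

        BaselineHashChainSketch groupB = new BaselineHashChainSketch(full);
        groupB.activate(Arrays.asList("B1", "B2")); // step 3: group B activated alone

        System.out.println("Group B node may join group A: " + groupA.canAccept(groupB));
        System.out.println("Node offline since first activation may join: " + groupA.canAccept(full));
    }
}
```

In this model a node from group B is rejected by group A because the two histories diverge after the first hash, while a node that simply stayed offline since the first activation is accepted because its shorter history matches.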
It is worth noting that this validation mechanism does not fully protect against conflicts in a split-brain situation. If the cluster is split into two halves such that each half retains at least one copy of a partition, and neither half is reactivated, conflicting changes to the same data can still occur in the two halves. BLT does not refute the CAP theorem, but it does protect against conflicts caused by obvious administrative errors.
Besides preventing data conflicts, BLT makes a couple of optional but pleasant bonuses possible.
Bonus No. 1: one less manual step. The activation mentioned above used to be performed manually after every cluster restart; there were no out-of-the-box automation tools. With BLT in place, the cluster can decide on activation by itself.
Although an Ignite cluster is an elastic system in which nodes can be added and removed dynamically, BLT assumes that in database mode the user maintains a stable cluster composition.

When the cluster is first activated, the newly created BaselineTopology remembers which nodes should be present in the topology. After a restart, each node checks the status of the other BLT nodes. Once all of them are online, the cluster activates automatically.
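The auto-activation rule described above can be modeled in a few lines. This is a hedged sketch, not Ignite code: `AutoActivationSketch` and its methods are hypothetical names, and nodes are represented only by string identifiers.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Toy model of baseline-driven auto-activation; not Ignite's real implementation. */
final class AutoActivationSketch {
    /** Nodes remembered by the baseline at first activation. */
    private final Set<String> baselineIds;
    /** Nodes that are currently online. */
    private final Set<String> onlineIds = new HashSet<>();
    private boolean active;

    AutoActivationSketch(Collection<String> baselineIds) {
        this.baselineIds = new HashSet<>(baselineIds);
    }

    /** Called when a node joins; activates once every baseline node is online. */
    void onNodeJoined(String nodeId) {
        onlineIds.add(nodeId);
        if (!active && onlineIds.containsAll(baselineIds))
            active = true;
    }

    boolean isActive() {
        return active;
    }

    public static void main(String[] args) {
        AutoActivationSketch cluster = new AutoActivationSketch(Arrays.asList("n1", "n2", "n3"));
        cluster.onNodeJoined("n1");
        cluster.onNodeJoined("n2");
        System.out.println("Active with two of three nodes: " + cluster.isActive());
        cluster.onNodeJoined("n3");
        System.out.println("Active with all baseline nodes: " + cluster.isActive());
    }
}
```

A node that is not part of the baseline does not affect the decision in this model: only the remembered baseline members have to be online.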
Bonus No. 2: savings on network traffic. The idea, again, rests on the assumption that the topology stays stable for a long time. Previously, a node leaving the topology even for 10 minutes triggered rebalancing of cache partitions to maintain the number of backups. But why spend network resources and slow down the cluster if the problem with the node is resolved within minutes and the node comes back online? BaselineTopology optimizes exactly this behavior.
Now the cluster assumes by default that the problem node will soon return to operation. During this time some caches work with fewer backups, but this does not interrupt or slow down the service.
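One way to picture this behavior: partition ownership is computed from the baseline rather than from the set of nodes that happen to be online, so a temporary node failure changes only the number of live copies, not the assignment. The sketch below assumes a simple round-robin assignment purely for illustration (Ignite's real affinity function is different), and `BaselineAffinitySketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: partition owners depend on the baseline, not on who is online. */
final class BaselineAffinitySketch {
    /** Assigns a partition to {@code copies} baseline nodes (primary plus backups), round-robin. */
    static List<String> owners(int partition, List<String> baseline, int copies) {
        List<String> res = new ArrayList<>();
        for (int i = 0; i < Math.min(copies, baseline.size()); i++)
            res.add(baseline.get((partition + i) % baseline.size()));
        return res;
    }

    /** Owners that are currently online; the assignment itself is not recomputed. */
    static List<String> liveOwners(int partition, List<String> baseline, int copies, Set<String> online) {
        List<String> res = new ArrayList<>(owners(partition, baseline, copies));
        res.retainAll(online);
        return res;
    }

    public static void main(String[] args) {
        List<String> baseline = Arrays.asList("n1", "n2", "n3");
        Set<String> online = new HashSet<>(Arrays.asList("n1", "n3")); // n2 is temporarily down

        System.out.println("Owners of partition 0: " + owners(0, baseline, 2));
        System.out.println("Live owners of partition 0: " + liveOwners(0, baseline, 2, online));
    }
}
```

While `n2` is down, partition 0 keeps the same owners and is simply served by one copy instead of two; no data is moved over the network.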
Management of BaselineTopology
We already know one way: BaselineTopology is created automatically when the cluster is first activated. All server nodes that were online at the moment of activation are included in the BLT.
Manual administration of BLT is done with the control script from the Ignite distribution; you can read more about it on the documentation page dedicated to cluster activation.
The script provides a very simple API and supports only three operations: adding a node, removing a node, and setting a new BaselineTopology.
While adding nodes is a fairly simple operation with no special pitfalls, removing an active node from the BLT is a more delicate task. Performing it under load is fraught with race conditions and, in the worst case, a hang of the entire cluster. Therefore, removal comes with an additional condition: the node being removed must be offline. If you try to remove an online node, the control script returns an error and the operation is not started.
So removing a node from the BLT still requires one manual operation: stopping the node. However, this is clearly not the main usage scenario, so the extra effort is not too great.
The Java interface for managing BLT is even simpler and provides a single method that sets the BaselineTopology from a list of nodes.
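The three operations and the offline-only removal rule can be illustrated with a toy model. This is a sketch under stated assumptions and not the control script's implementation: `BaselineAdminSketch` and its methods are hypothetical, and applying the same guard to the "set" operation is an assumption of this example.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the three baseline operations; not the control script's real code. */
final class BaselineAdminSketch {
    private final Set<String> baseline = new TreeSet<>();
    private final Set<String> online = new HashSet<>();

    void nodeStarted(String id) { online.add(id); }

    void nodeStopped(String id) { online.remove(id); }

    /** Adding a node needs no special precondition. */
    void add(String id) { baseline.add(id); }

    /** Removal is refused while the node is online. */
    void remove(String id) {
        if (online.contains(id))
            throw new IllegalStateException("Node " + id + " is online; stop it first");
        baseline.remove(id);
    }

    /** Replaces the whole baseline; dropping an online member is refused (assumption of this sketch). */
    void set(Collection<String> ids) {
        for (String id : baseline)
            if (!ids.contains(id) && online.contains(id))
                throw new IllegalStateException("Node " + id + " is online; stop it first");
        baseline.clear();
        baseline.addAll(ids);
    }

    /** Returns a copy of the current baseline. */
    Set<String> baseline() { return new TreeSet<>(baseline); }

    public static void main(String[] args) {
        BaselineAdminSketch admin = new BaselineAdminSketch();
        admin.nodeStarted("n1");
        admin.nodeStarted("n2");
        admin.add("n1");
        admin.add("n2");
        try {
            admin.remove("n2"); // refused: n2 is still online
        } catch (IllegalStateException e) {
            System.out.println("Refused: " + e.getMessage());
        }
        admin.nodeStopped("n2"); // the one manual step: stop the node
        admin.remove("n2");
        System.out.println("Baseline: " + admin.baseline());
    }
}
```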
An example of changing BaselineTopology using the Java API:
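import java.util.ArrayList;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCluster;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterNode;

Ignite ignite = /* ... */;
IgniteCluster cluster = ignite.cluster();

// Get a mutable copy of the current BaselineTopology (null if the cluster was never activated).
Collection<BaselineNode> curTop = new ArrayList<>(cluster.currentBaselineTopology());

for (ClusterNode node : cluster.topology(cluster.topologyVersion())) {
    // If we want this node to be in BaselineTopology
    // (shouldAdd(ClusterNode) is a user-defined function).
    if (shouldAdd(node))
        curTop.add(node);
}

// Update BaselineTopology.
cluster.setBaselineTopology(curTop);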
Ensuring data integrity is the most important task that any data store must solve. For distributed DBMSs, of which Apache Ignite is one, solving this problem becomes much harder.
The BaselineTopology concept covers some of the real-world scenarios in which data integrity can be violated.
Another priority for Ignite is performance, and here too BLT saves significant resources and improves the system's response time.
Native Persistence appeared in the project quite recently, and it will undoubtedly evolve, becoming more reliable, faster, and more convenient to use. The BaselineTopology concept will evolve along with it.