Costs of coordination in teams
This is a brief digression in the current series of articles on how to avoid introducing services for different entities. An interesting conversation over dinner led to thoughts that I decided to write down.
Amdahl's Law
In 196? Gene Amdahl presented an argument against parallel computing. He argued that the achievable speedup is limited, because only part of a task can be parallelized. The size of the remaining "sequential part" varies from task to task, but it is always there. This argument became known as Amdahl's law.
If you plot the speedup of a task against the number of parallel processors allocated to it, you will see the following:
The curve approaches an asymptote set by the part that cannot be parallelized (the "sequential part"), so there is an upper limit on the maximum speedup.
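For reference, Amdahl's law is usually written as below. The symbols are chosen here for illustration and do not come from the article: p is the fraction of the task that can be parallelized, N is the number of processors, and S(N) is the resulting speedup.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, if 90% of a task can be parallelized (p = 0.9), the speedup can never exceed 10, no matter how many processors are added.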
Neil Gunther later extended Amdahl's law into the Universal Scalability Law (USL). It uses two parameters: one for "contention" (which is similar to the sequential part) and a second for "incoherence". Incoherence relates to the time spent restoring consistency, that is, a common view of the world across the different processors.
Within a single CPU, reconciliation costs arise from caching. When one core modifies a cache line, it tells the other cores to evict that line from their caches. If they all need the same line, they spend time reloading it from main memory. (This is a slightly simplified description, but even in a more precise formulation there is still a reconciliation cost.)
Across the nodes of a database, reconciliation costs come from consensus algorithms and from preserving the order of the data. The penalty is paid either when data is changed (as in transactional databases) or when data is read (as in eventually consistent stores).
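The two parameters described above appear in the commonly cited form of the USL as follows. The symbols are the conventional ones rather than anything named in the article: N is the number of processors, α is the contention parameter, β is the incoherence (reconciliation) parameter, and C(N) is the relative capacity compared with a single processor.

```latex
C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}
```

With β = 0 the expression reduces to Amdahl's law with a sequential part of α. The β term grows roughly with the square of N, which is why reconciliation costs eventually dominate.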
The effect of USL
If you plot the USL against the number of processors, you get a green line like this:
The purple line shows what Amdahl's law would have predicted.
Note that the green line reaches a peak and then declines. This means there is a certain number of nodes at which performance is at its maximum. Add more processors and performance drops. I have seen this happen in real load testing.
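To make the peak concrete, here is a minimal sketch in Python. It assumes the commonly cited form of the USL, in which relative throughput is N / (1 + α(N - 1) + βN(N - 1)), with α for contention and β for reconciliation costs. The function names and parameter values are illustrative assumptions, not measurements from the load tests mentioned above.

```python
import math


def usl_throughput(n, alpha, beta):
    """Relative throughput of n processors under the assumed USL form."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Processor count where the curve peaks (continuous approximation).

    Setting the derivative of usl_throughput to zero gives
    n = sqrt((1 - alpha) / beta).
    """
    return math.sqrt((1 - alpha) / beta)


# Hypothetical parameters, chosen only to make the peak visible.
ALPHA = 0.05   # contention
BETA = 0.001   # reconciliation (coherence) cost

print(f"peak at about {usl_peak(ALPHA, BETA):.1f} processors")
for n in (1, 8, 16, 31, 64, 128):
    print(f"{n:>4} processors -> {usl_throughput(n, ALPHA, BETA):.2f}x")
```

With these made-up values the throughput rises until roughly 31 processors and falls after that, so 64 processors deliver less than 31 do.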
People often want to add processors and get more performance out of them. There are two ways to make that work:
Reduce the sequential part
Reduce the reconciliation costs
USL in human teams?
Let's try an analogy. If the computational "task" is a project, then the number of people on the project plays the role of the number of "processors" doing the work.
In this case, the sequential part is the portion of the work that can only be done one step after another. That may be a topic for a future article, but for now the nature of the sequential part is not what interests us.
The analogy with reconciliation costs looks direct. Whatever time team members spend restoring a common view of the world is the reconciliation cost.
For five people in one room, these costs are minimal: five minutes at the whiteboard with a marker, once a week or so.
For a large team spread across several time zones, the penalty can grow and become formalized: documents and walkthroughs, presentations to the team and so on.
In some architectures, consistency matters less. Imagine a team with people on three continents, each working on a single service that consumes data in a strictly defined format and produces data in a strictly defined format. They do not need to stay consistent about changes in how they work, but they do need to stay consistent about any changes to the formats.
Sometimes tools and languages can change the reconciliation costs. One of the arguments in favor of static typing is that it helps people work together in a team. In effect, the types in the code are a mechanism for broadcasting changes to the model of the world. In a dynamically typed language, we either need secondary artifacts (unit tests or chat messages), or we need to create boundaries where some parts of the team very rarely have to restore consistency with others.
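A small sketch of the idea of types as a broadcast mechanism, using Python type hints as a stand-in for static typing. The `Invoice` model and its field are hypothetical, and the checking itself is done by an external tool such as mypy, not by Python at run time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Invoice:
    # Hypothetical model. Suppose this field used to be `amount: float`
    # and one team member changed it to integer cents.
    amount_cents: int


def total_cents(invoices: Sequence[Invoice]) -> int:
    # Code that still reads `invoice.amount` would be reported by a static
    # checker such as mypy, so the change reaches everyone who touches it.
    return sum(invoice.amount_cents for invoice in invoices)


print(total_cents([Invoice(1250), Invoice(399)]))
```

The change to the model is announced by the tooling to every caller at once, instead of being passed along person by person.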
All these methods aim to reduce the reconciliation costs. Recall that excessive scaling causes throughput to drop. So if you have high reconciliation costs and too many people, the team as a whole moves more slowly. I have seen teams where it seemed we could cut half the people and work twice as fast. USL and reconciliation costs now help explain why that happens: it is not just about getting rid of dead weight. It is about reducing the overhead of sharing mental models.
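As a rough illustration of the "half the people" observation, the sketch below applies the commonly cited USL form, N / (1 + α(N - 1) + βN(N - 1)), to a team, treating people as processors. The parameter values are invented for the example and are not derived from any real team.

```python
def team_throughput(people, contention, coherence):
    """Relative throughput of a team under the USL, with people as processors."""
    return people / (
        1 + contention * (people - 1) + coherence * people * (people - 1)
    )


# Hypothetical values for a team with high reconciliation costs.
CONTENTION = 0.1
COHERENCE = 0.02

full_team = team_throughput(40, CONTENTION, COHERENCE)
half_team = team_throughput(20, CONTENTION, COHERENCE)

print(f"40 people: {full_team:.2f}x one person")
print(f"20 people: {half_team:.2f}x one person")
print(f"halving the team multiplies throughput by {half_team / full_team:.2f}")
```

With these made-up numbers the half-sized team delivers roughly 1.7 times the throughput of the full one. The larger the reconciliation parameter, the closer that factor gets to two.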
In "The cycle of fear" I referred to code bases where developers knew that large-scale changes were needed but were afraid of accidentally causing harm. This means the team was overscaled and never reached consistency. Once lost, consistency seems to be very hard to recover. So you cannot ignore the reconciliation costs.
USL and microservices
In my opinion, USL explains the interest in microservices. By dividing a large system into smaller and smaller parts that are deployed independently of each other, you reduce the sequential part of the work. In a large system with many contributors, the sequential part depends on the effort needed to integrate, test and deploy. The promise of microservices is that they need no integration work, no integration testing and no waiting for a synchronized deployment.
But reconciliation costs mean that you may not get the speedup you hoped for. The analogy may be a bit strained here, but I think you can treat interface changes between microservices as requiring reconciliation between teams. If there are too many such changes, you will not get the benefits you wanted from microservices.
What to do about it?
My suggestion: look at your architecture, language, tools and team. Think about where time is lost to reconciliation when people change the system's model of the world.
Look for splits: mismatches between the internal boundaries of the system and the divisions within the team.
Use your environment to communicate changes, so that reconciliation is broadcast to everyone rather than done one-to-one.
Look at your team's communications. How much time and effort does it take to ensure consistency? Could you make smaller changes and reduce the need for it?