When you need speed and scaling: a server of distributed iOS devices
Anyone who writes UI tests for iOS knows the problem of test run time. Badoo runs more than 1,400 end-to-end tests for its iOS applications on every regression run. That is more than 40 machine-hours of tests, which complete in 30 real minutes.
Nikolay Abalov from Badoo shared how the team cut test execution from an hour and a half to 30 minutes; how moving to a device server untangled the tests from the iOS infrastructure; and how that simplified parallel runs and made both the tests and the infrastructure easier to support and scale.
You will learn how to run tests in parallel with tools such as fbsimctl, and how separating tests from infrastructure can simplify the adoption, support, and scaling of your tests.
The talk was recorded on video, and this is the text version we prepared for Habr. From here on, the story is told in the first person:
Hello everyone, today I will talk about scaling testing on iOS. My name is Nikolay, I work at Badoo, mostly on iOS infrastructure. Before that I spent 3 years at 2GIS doing development and automation; in particular, I wrote Winium.Mobile, a WebDriver implementation for Windows Phone. Badoo hired me to work on Windows Phone automation, but after a while the business decided to suspend development for that platform, and I was offered interesting iOS automation tasks instead - which is what I'll talk about today.
What will we talk about? The plan is as follows:
Informal statement of the problem, introduction to the tools used: how and why.
Parallel testing on iOS and how it evolved (in particular, in our company, since we started dealing with it back in 2015).
Device server is the main part of the report. Our new model of parallelizing tests.
The results we achieved with the help of the server.
If you do not have 1,500 tests, you may not need a device server at all, but there are still interesting things you can take away from this, and that section covers them. They apply even if you have 10-25 tests, and they will still buy you either speed or extra stability.
And, finally, the summing up.
First, a little about who uses what. Our stack is slightly non-standard, because we use Calabash and WebDriverAgent at the same time: Calabash gives us speed and backdoors when automating our own application, while WebDriverAgent gives full access to the system and other applications. WebDriverAgent is a WebDriver implementation for iOS from Facebook, used inside Appium. Calabash is an embedded server for automation. We write tests in human-readable form using Cucumber, so we have a kind of pseudo-BDD in the company. And because we use Cucumber and Calabash, we inherited Ruby; all the code is written in it, there is a lot of it, so we keep writing Ruby. To run tests in parallel we use parallel_cucumber, a tool written by one of my colleagues at Badoo.
Let's start with what we had. When I began preparing this talk, there were 1,200 tests. By the time I finished, there were 1,300. By the time I got here, there were already 1,400. These are end-to-end tests, not unit or integration tests. They amount to 35-40 hours of machine time on a single simulator. They used to take an hour and a half of real time; I'll tell you how we got that down to 30 minutes.
In our company the workflow involves branches, reviews, and test runs on those branches. Developers open about 10 pull requests a day against the main repository of our application, and it also contains components shared with other applications, so sometimes there are more than ten. That makes at least 30 test runs a day: developers push, then realize they've introduced bugs, push again, and each push runs the full regression - simply because we can run it. On the same infrastructure we run side projects such as Liveshot, which takes screenshots of the application in the main user scenarios in all languages, so translators can check that the translation is correct, fits on the screen, and so on. All of this currently adds up to about one and a half thousand hours of machine time.
First of all, we want developers and testers to trust the automation and rely on it to reduce manual regression. For that, the automation must be fast and, above all, stable and reliable. If tests take an hour and a half, the developer gets tired of waiting for results, starts another task, and loses focus. And when some tests then fail, he is not at all happy to come back, switch context, and deal with it. If tests are unreliable, people eventually see them only as an obstacle: they keep failing even though there are no bugs in the code - flaky tests, random interference. These two points unfold into the following requirements:
Tests should take 30 minutes or less.
They must be stable.
They must scale, so that we can add another hundred tests and still fit in half an hour.
Infrastructure should be easily maintained and developed.
Everything should run the same way on simulators and on physical devices.
In general, we run tests on simulators rather than physical devices, because it is faster, more stable, and simpler. Physical devices are used only for tests that genuinely require them - the camera, push notifications, and the like.
How can you meet these requirements and do everything well? The answer is very simple: remove two-thirds of the tests! This fits into 30 minutes (only a third of the tests remain), scales easily (you can always remove more tests), and improves reliability (because you remove the most unreliable tests first). That's all from me. Questions?
But seriously, every joke has some truth in it. If you have a lot of tests, you should review them and understand which ones provide real value. We had a different task, though, so we decided to see what else could be done.
The first approach is filtering tests based on coverage or components, i.e. selecting the relevant tests based on which files changed in the application. I won't cover that here, but it is one of the problems we are solving right now.
Another option is speeding up and stabilizing the tests themselves. You take a specific test, see which steps take the most time, and check whether they can be optimized. If some steps are frequently unstable, you deal with them, because that reduces test retries and everything runs faster.
And, finally, a completely different task is to parallelize the tests, distribute them across a large number of simulators, and provide a scalable and stable infrastructure to support that parallelism.
In this article we will talk mainly about the last two points and at the end, in tips & tricks, we will touch on the point about speed and stabilization.
Parallel testing for iOS
Let's start with the history of parallel testing for iOS in general and at Badoo in particular. First, some simple arithmetic (strictly speaking, the units in the formula don't quite match, but it conveys the idea):

We had 1,300 tests, which take about 40 hours on one simulator. Then Satish, my lead, comes along and says he needs it done in half an hour. Something had to be invented. An X appears in the formula: how many simulators do we need so that everything passes in half an hour? 40 hours / 0.5 hours = 80 simulators. And immediately the question arises: where do you put 80 simulators? They don't fit anywhere.
There are several options. You can go to clouds like SauceLabs, Xamarin Test Cloud, or AWS Device Farm. Or you can build everything yourself and do it well. Given that this article exists, we built it ourselves. We decided so because a cloud at that scale would be quite expensive, and there was also the episode when iOS 10 came out and Appium took almost a month to support it. That meant we could not automatically test iOS 10 in SauceLabs for a month, which did not suit us at all. Besides, all the clouds are closed, and you cannot influence them.
So, we decided to do everything in-house. We started somewhere in 2015; back then Xcode could not run more than one simulator. As it turned out, it cannot run more than one simulator per user per machine - if you have many users, you can run as many simulators as you like. My colleague Tim Baverstoke came up with a model that we lived with for quite a long time.
There is an agent (TeamCity, a Jenkins node, or the like); it starts parallel_cucumber, which simply goes to remote machines over ssh. The picture shows two machines with two users each. All the files the tests need are copied to the remote machines over ssh and run there, and the tests then launch the simulator locally, on their user's desktop. To make this work, you had to go to each machine and create, say, 5 users if you wanted 5 simulators, set one user to log in automatically, open screen-sharing sessions for the rest so they always had a desktop, and configure the ssh daemon so it had access to processes on the desktop. In this simple way we started running tests in parallel.

But this picture has several problems. First, the tests drive the simulator from the same machine the simulator runs on, so they must always run on Macs and they eat resources that could be spent running simulators - you fit fewer simulators per machine and each one becomes more expensive. Second, you have to visit every machine and configure users. And then you run into the global ulimit: five users start a lot of processes, and at some point the system runs out of file descriptors; a test tries to open a file, fails, and from then on everything starts falling apart.
In 2016-2017 we decided to move to a slightly different model. We watched a talk by Lawrence Lomax from Facebook, where fbsimctl was introduced and Facebook's infrastructure was partly described; there was also a talk by Viktor Koronevich about this model. The picture differs little from the previous one - we just got rid of the extra users - but that was a big step forward, because now there is only one desktop, fewer processes run, and simulators became cheaper. In this picture there are three simulators rather than two, since resources were freed up to run an extra one. We lived with this model for a very long time, until mid-October 2017, when we started moving to our remote device server.
Here is what the hardware looked like: on the left, a box with MacBooks. Why we ran all the tests on them is a separate long story. Running tests on MacBooks stuffed into a metal box turned out not to be a good idea: around lunchtime they started overheating, because heat dissipates poorly when they lie flat on a surface. Tests became unstable, and simulators started failing at boot.
We solved it simply: we stood the laptops up in a "tent", the airflow area increased, and the stability of the infrastructure unexpectedly improved.
So sometimes, instead of working on software, you go around flipping laptops.
But look at that picture: a mess of wires, adapters, general chaos. And that was just the hardware side, which was actually the good part. On the software side, the tests were completely interwoven with the infrastructure, and it was impossible to live like that.
We identified the following problems:
The tests were tightly coupled to the infrastructure: they launched the simulators and managed their entire life cycle.
This made scaling hard, because adding a new node meant configuring it for both the tests and the simulators. For example, if you wanted to update Xcode, you had to add workarounds directly to the tests, because they drove different versions of Xcode - there were heaps of ifs just to launch a simulator.
The tests were tied to the machine where the simulator lives, and that costs a pretty penny, since they had to run on Macs instead of *nix, which is cheaper.
And it was always very easy to reach inside the simulator. In some tests we went into the simulator's file system and deleted or changed files, and everything was fine - until it was done in three different ways in three different tests, and then a fourth test unexpectedly started failing whenever it was unlucky enough to run after those three.
And the last point: resources were not pooled. There were, for example, four TeamCity agents, each with five machines attached, and tests could run only on their own five machines. There was no centralized resource management, so when a single job arrived it ran on five machines while the other 15 sat idle. Because of this, builds took a very long time.
The new model
We decided to move to a beautiful new model.
We moved all the tests onto one machine, where the TeamCity agent runs. That machine can now be on *nix, or even on Windows if you want. It talks over HTTP to a thing we call the device server. All the simulators and physical devices live somewhere over there, and the tests run here: they request a device over HTTP and then work with it. The scheme is very simple - just two elements in the diagram.
In reality, of course, there is still ssh and more behind the server. But that no longer bothers anyone: the people writing tests are above this line in the scheme, they get a well-defined interface for working with a local or remote simulator, and life is good for them. I now work below the line, where everything is as it was. We have to live with this.
What does it give?
First, the division of responsibility. At some point, test automation has to be treated as ordinary development, using the same principles and approaches that developers use.
We got a strictly defined interface: you cannot just reach into the simulator directly; you have to open a ticket against the device server, and we will figure out how to do it optimally without breaking other tests.
The test environment became cheaper, because it now runs on *nix, which is much cheaper to operate than Macs.
And resource sharing appeared: there is a single layer that everyone talks to, and it can schedule the machines behind it, i.e. share resources between agents.
The top picture shows how it was before. On the left are arbitrary time units, say tens of minutes. There are two agents with 7 simulators attached to each; at time 0 a build arrives and takes 40 minutes. Twenty minutes later another arrives and takes just as long. Everything seems fine - but notice the gray squares. They mean we lost money, because we did not use the available resources.
Instead you can do this: the first build arrives, sees all the free simulators, spreads across them, and the tests speed up twofold. We did nothing extra for this. In reality it happens often, because developers rarely push their branches in the same minute. Sometimes they do, and you get "checkers", "pyramids", and the like. Still, in most cases you get a free 2x speedup simply by putting a centralized resource manager in place.
Other reasons to go to this:
Black-boxing: the device server is now a black box. When you write tests, you think only about the tests and assume the black box always works. If it does not, you simply go and knock on the door of whoever is responsible - that is, me. And I have to fix it. Actually not just me: several people work on the infrastructure.
You cannot tamper with the simulator's internals.
You do not have to install a million utilities on the machine to get things running - just one utility that hides all the work behind the device server.
It became easier to update the infrastructure, which we'll get to near the end.
A reasonable question: why not Selenium Grid? First, we had a lot of legacy: 1,500 tests, 130 thousand lines of code across platforms. And all of it was orchestrated by parallel_cucumber, which managed the simulator's life cycle outside the test: a dedicated system booted the simulator, waited until it was fully ready, and handed it to the test. To avoid rewriting everything, we decided to try to get by without Selenium Grid.
We also have a lot of non-standard actions and use WebDriver quite rarely: most tests are on Calabash, and WebDriver is only auxiliary. That is, we do not use Selenium in most cases.
And, of course, we wanted everything to be flexible and easy to prototype, because the whole project started as an idea we decided to validate: we implemented it in a month, it worked, and it became the main solution in our company. By the way, at first we wrote it in Ruby, and later rewrote the device server in Kotlin - the tests stayed in Ruby, and the server is in Kotlin.
Now more about the device server itself and how it works. When we first started looking into this, we used the following tools:
xcrun simctl and fbsimctl - command-line utilities for controlling simulators (the first is official from Apple; the second is from Facebook and is a bit more convenient to use);
WebDriverAgent, also from Facebook, to drive applications from outside the process under test - for example, when a push notification arrives or something similar;
ideviceinstaller, to install the application on physical devices so that we can then automate it there.
By the time we started writing the device server, we did some research. It turned out that fbsimctl could by then do everything xcrun simctl and ideviceinstaller could, so we simply dropped them, leaving only fbsimctl and WebDriverAgent. Already a simplification. Then we thought: why write anything at all - surely Facebook has it all ready. Indeed, fbsimctl can work as a server. You can run it like this:
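The slide with the exact command is not reproduced here, but it was something along these lines (the flags vary between fbsimctl versions, so treat this as a sketch and check `fbsimctl help` on your setup; the UDID is a placeholder):

```shell
# Boot a simulator by UDID and start an HTTP command server on port 8090
fbsimctl <udid> boot listen --http 8090
```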
This boots the simulator and starts a server listening for commands.
When you stop the server, it automatically shuts down the simulator.
What commands can you send? For example, use curl to send list, and it will print full information about the device:
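A hypothetical query, assuming the server was started on port 8090 (the exact endpoint path depends on the fbsimctl version):

```shell
# Ask the running fbsimctl server to describe the device
curl http://localhost:8090/list
```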
And it is all JSON, so it is easy to parse from code and easy to work with. They implemented a huge set of commands that let you do almost anything with the simulator.
For example, approve grants permissions for the camera, location, and notifications. The open command opens deep links in the application. It would seem you do not need to write anything - just take fbsimctl. But it turned out that some commands were missing:
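Roughly, the gaps were around creating and booting fresh simulators and launching XCUI test bundles remotely. In CLI form it would look something like this (command names and arguments are illustrative approximations of fbsimctl's interface, not exact syntax):

```shell
# Commands we needed but could not invoke through the remote API at the time:
fbsimctl create "iPhone 6"     # create a brand-new simulator
fbsimctl <udid> boot           # boot it on the remote machine
fbsimctl <udid> launch_xctest <path-to-WebDriverAgent.xctest>  # run XCUI tests
```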
It is easy to guess that without them you cannot start a new simulator: someone has to go to the machine in advance and boot one. And most importantly, you cannot run tests on the simulator. Without these commands the server cannot be used remotely, so we decided to compile a complete list of the additional requirements we needed.
The first is creating and booting simulators on demand. Liveshot can ask for an iPhone X at any moment, and then an iPhone 5S, while most of the tests run on the iPhone 6s. We must be able to create the required number of simulators of each type on demand.
We also needed a way to run WebDriverAgent or other XCUI tests on simulators and physical devices, to drive the automation itself.
And we wanted to hide the matching of requirements entirely. If your tests want to check something on iOS 8 for backwards compatibility, they should not need to know which machine to go to for such a device. They just request iOS 8 from the device server, and if a machine with it exists, the server finds it itself, prepares the device, and returns it. fbsimctl had nothing like this.
Finally, various extra actions such as clearing cookies in tests, which saves a minute per test, and other tricks that we'll cover at the very end.
And the last point is pre-booting a pool of simulators. The idea was that since the device server now lives separately from the tests, we could boot all the simulators in advance, so that when tests arrive a simulator is ready to go immediately and we save time. In the end we did not do it, because booting simulators turned out to be very fast already. More on that at the very end - consider this a spoiler.
The picture is just an example of the interface: we wrote a wrapper, a client for working with a remote device. The dots stand for various pass-through methods that we simply duplicated; the rest are our own methods, such as fast device reset, cookie cleanup, and collecting various diagnostics.
The whole scheme looks like this: there is a Test Runner, which runs the tests and prepares the environment; a Device Provider, a client for requesting devices from the Device Server; Remote Device, a wrapper over a remote device; and the Device Server itself. Everything behind it is hidden from the tests: background threads for cleaning the disk and other important housekeeping, plus fbsimctl and WebDriverAgent.
How does it all work? From the tests, or from the Test Runner, we request a device with certain capabilities, for example an iPhone 6. The request goes to the Device Provider, which forwards it to the device server; the server finds a suitable machine, starts a background thread on it to prepare the simulator, and immediately returns a reference to the tests - a token, a promise that the device will eventually be created and booted. With this token you can go back to the Device Server and ask for the device. The token is turned into an instance of the RemoteDevice class, which the tests can then work with.
All this happens almost instantly, while in the background the simulator starts booting in parallel via fbsimctl. We now boot simulators in headless mode. If you remember the first hardware picture, you could see many simulator windows on it - back then we did not boot them headless. Now they boot invisibly; you simply wait until the simulator is fully loaded, for example until the SpringBoard entry appears in syslog, plus other heuristics for determining simulator readiness.
Once it is booted, we launch an XCTest that actually brings up WebDriverAgent, and start polling its healthCheck, because WebDriverAgent sometimes fails to come up, especially when the system is heavily loaded. In parallel, a loop waits for the device to reach the "ready" state - effectively the same healthCheck. Once the device is fully booted and ready for testing, we exit the loop.
Now you can install an application on it simply by sending a request to fbsimctl - elementary. You can also create a driver; the request is proxied to WebDriverAgent and creates a session. After that you can run the tests.
The tests are a small part of the whole scheme; inside them you can keep talking to the device server to do things like clearing cookies, fetching video, starting a recording, and so on. At the end the device is released, everything finishes, all resources are cleaned up, caches reset, and so on. Strictly speaking, you do not even have to release the device: the device server does it itself, because tests sometimes crash together with the Test Runner and obviously do not free their devices. This scheme is heavily simplified; it omits many pieces and background jobs we run so that the server can work for a whole month without problems or reboots.
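As a rough illustration of that life cycle from the client's point of view - the host, port, endpoints, and payloads below are invented for the sketch (the real device-server API is on GitHub), and `jq` is assumed to be installed:

```shell
# 1. Request a device; the server immediately returns a token (a promise).
TOKEN=$(curl -s -X POST http://device-server:4567/devices \
  -d '{"model": "iPhone 6"}' | jq -r '.ref')

# 2. Poll until the background preparation is done and the device is ready.
until curl -s "http://device-server:4567/devices/$TOKEN/state" | grep -q ready; do
  sleep 1
done

# 3. Install the app; after that a WebDriverAgent session can be created
#    and the tests can run.
curl -s -X POST "http://device-server:4567/devices/$TOKEN/install" -d @app.json

# 4. Release the device when finished (the server also reaps leaked devices).
curl -s -X POST "http://device-server:4567/devices/$TOKEN/release"
```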
Results and Next Steps
The most interesting part: the results. They are simple. We went from 30 machines to 60 - virtual machines, not physical ones. Most importantly, we cut the time from an hour and a half to 30 minutes. Which raises a question: if there are twice as many machines, why did the time drop threefold?
In fact, it is simple. The resource-sharing picture I showed is the first reason: it gives an extra speed boost in most cases, since developers start jobs at different times.
The second point is the separation of tests and infrastructure. Once we did it, we finally understood how everything works and could optimize each part separately, adding a bit more speed to the system. Separation of concerns is a very important idea, because when everything is interwoven, you cannot reason about the system as a whole.
Updates became easier. For example, when I first joined the company, an Xcode update took more than a week, even with the few machines we had then. The last update, to Xcode 9.x, took literally a day and a half, most of which was spent copying files. We barely participated - the automation did the work.
We greatly simplified the Test Runner, which used to contain rsync, ssh, and other logic. All of that is now thrown away, and it runs somewhere on *nix, in Docker containers.
Next steps: preparing the device server for open source (after the talk it was published on GitHub), and we are thinking about removing ssh, because it requires extra setup on the machines and in most cases complicates the logic and maintenance of the whole system. But even now you can take the device server, allow ssh to all the machines, and the tests will happily run against them.
Tips & tricks
Now the most important thing is all sorts of tricks and just useful things that we found, creating a device server and this infrastructure.
The first is the simplest. As you remember, we had MacBook Pros; all tests ran on laptops. Now we run them on Mac Pros.
Here are the two configurations - essentially the top versions of each device. On a MacBook we could stably run 6 simulators in parallel; try to boot more at once and the simulators start failing, because they load the CPU heavily and lock each other out. On a Mac Pro you can run 18. The math is easy: instead of 4 cores there are 12, three times as many, so 6 × 3 = 18 simulators. In fact you can start a few more, but you have to spread the launches out in time; you cannot start them all within one minute. Though there is a catch with those 18 simulators - it is not that simple, as you'll see.
And here are the prices. I do not remember what that is in rubles, but clearly they cost a lot. Each simulator on a MacBook Pro comes out at around £400, and on a Mac Pro at almost £330. That is already about £70 saved per simulator.
Besides, those MacBooks had to be mounted in a particular way; their chargers attach with magnets and had to be taped on, because they sometimes fell off. And we had to buy adapters to connect Ethernet, because that many devices in a metal box simply do not work reliably over Wi-Fi. The adapter costs about £30; divide by 6 and that is another £5 per device. But if you do not need this level of parallelism - you have 20 tests and 5 simulators are enough - it is honestly easier to buy a MacBook: you can find one in any store, while a top-end Mac Pro has to be ordered and waited for. By the way, ours came out a bit cheaper, because we bought in bulk and got a discount. You can also buy a Mac Pro with minimal memory and upgrade it yourself, saving even more.
But there is one trick with the Mac Pro: we had to split each one into three virtual machines and install ESXi. That is bare-metal virtualization - a hypervisor installed directly on the machine instead of a host OS; it is the host itself, so it can run three virtual machines. If you install ordinary virtualization on top of macOS, such as Parallels, you can run only 2 virtual machines due to Apple's licensing restrictions. We had to split the machines because CoreSimulator, the main service controlling simulators, has internal locks: more than 6 simulators simply will not boot at once - they start queueing up waiting for something, and the total boot time for 18 simulators becomes unacceptable. By the way, ESXi costs £0 - it is always nice when something costs nothing and works well.
Why didn't we do pooling? Partly because we sped up simulator reset. Say a test has failed and you want to fully clean the simulator, so that the next test does not fail because of mysterious leftover files in the file system. The simplest solution is to shut the simulator down, erase it, and boot it again.
Very simple, one line, but it takes 18 seconds. A year or six months ago it took almost a minute - thank you, Apple, for optimizing it. But you can be smarter: boot the simulator and copy its working directories to a backup folder. Then, to reset, shut the simulator down, delete the working directory, copy the backup back, and boot.
The result is 8 seconds: the reset got more than twice as fast, and nothing complicated was done - in the Ruby code it is literally two lines. The example is in bash, so that it is easy to translate into other languages.
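Sketched in bash, both variants look like this. The CoreSimulator paths are the standard ones on macOS; the UDID is a placeholder:

```shell
UDID=<your-simulator-udid>
SIM_DIR="$HOME/Library/Developer/CoreSimulator/Devices/$UDID"

# Slow path (~18 s): full erase
xcrun simctl shutdown "$UDID"
xcrun simctl erase "$UDID"
xcrun simctl boot "$UDID"

# Faster path (~8 s): restore the data directory from a backup.
# Once, right after a clean boot:
cp -R "$SIM_DIR/data" "$SIM_DIR/data.backup"
# Then on every reset:
xcrun simctl shutdown "$UDID"
rm -rf "$SIM_DIR/data"
cp -R "$SIM_DIR/data.backup" "$SIM_DIR/data"
xcrun simctl boot "$UDID"
```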
The next trick. We have the Bumble app - it looks like Badoo but has a slightly different, much more interesting concept. It requires logging in through Facebook. Since every test takes a fresh user from the pool, we had to log the previous one out. To do that, we used WebDriverAgent to open Safari, go to Facebook, and tap Sign out. Seems fine, but it takes almost a minute in every test. A hundred tests - a hundred extra minutes.
Besides, Facebook likes to run A/B tests, so locators and button texts can change; suddenly a batch of tests fails and everyone is extremely unhappy. So instead we use fbsimctl's list_apps, which lists all installed applications.
In its output we find MobileSafari:
It has a DataContainer path, and inside it there is a binary file with cookies:
We simply delete it - it takes 20 ms. The tests got 100 minutes faster and more stable, because they can no longer fail because of Facebook. So sometimes you do not need parallelization at all: you can find optimization spots and shave off 100 minutes almost for free. In the code it is two lines.
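The whole trick in shell form. list_apps is a real fbsimctl command, but the JSON field names in the jq filter and the exact cookie-file path are approximations - verify them against your fbsimctl version and iOS version (jq is assumed to be installed):

```shell
# Find MobileSafari's data container in the list_apps JSON output
CONTAINER=$(fbsimctl "$UDID" list_apps --json \
  | jq -r 'select(.bundle.bundle_id == "com.apple.mobilesafari") | .data_container')

# Delete Safari's binary cookie store (~20 ms instead of ~1 min of UI clicking)
rm -f "$CONTAINER/Library/Cookies/Cookies.binarycookies"
```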
Next: how we prepare the host machines to run the simulators.
The first example is familiar to anyone who has run Appium: disabling the hardware keyboard. When you type text in the simulator, it has a habit of connecting the computer's hardware keyboard and completely hiding the virtual one. Appium uses the virtual keyboard to enter text, so after locally debugging a test that types something, the remaining tests may start failing because the keyboard is gone. This setting disables the hardware keyboard, and we apply it before bringing up every test node.
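The command we run against the simulator's standard preferences domain:

```shell
# Stop the simulator from connecting the Mac's hardware keyboard
defaults write com.apple.iphonesimulator ConnectHardwareKeyboard -bool false
```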
The next one matters more to us, because the application depends on geolocation, and very often the tests need it disabled from the start. You can set LocationMode to 3101. Why 3101? There used to be an article about it in Apple's documentation, but at some point they deleted it. Now it is just a magic constant in the code that we all pray to, hoping it will not break - because as soon as it does, all our users will end up in San Francisco, since that is the location fbsimctl sets on boot. On the bright side, we will find out about it quickly, because everyone will be in San Francisco.
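It is applied the same way, through the simulator's preferences domain:

```shell
# The magic constant: start simulators with location services disabled
defaults write com.apple.iphonesimulator LocationMode 3101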
Next is disabling the simulator's chrome, the frame around it with various buttons. For autotests it is not needed. Disabling it used to let us fit more simulators side by side on screen to watch how things go in parallel. We no longer do that, because everything is headless now: no matter how long you stare at the machine, you will not see the simulators. If you really need to, you can stream video from the simulator you are interested in.
There is also a set of other options that can be toggled on and off. Of these I will mention only SlowMotionAnimation, because it gave me a very interesting second or third day at work. I ran the tests and they all started failing on timeouts: elements were not found, even though the inspector showed they were there. It turned out that at that moment I had launched Chrome and pressed Cmd+T to open a new tab. The simulator window happened to be active and intercepted the shortcut, and for the simulator Cmd+T means slowing all animations down tenfold for debugging. This option should always be disabled automatically if you run tests on machines that people have access to, because they can accidentally break the tests by slowing down the animations.
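Taken together, these host-preparation tweaks boil down to a handful of `defaults write` calls against the simulator's preference domain. A minimal sketch, with the caveat that apart from the keys named in the talk (LocationMode, SlowMotionAnimation, and the hardware-keyboard toggle) any exact key names are assumptions to check against your Xcode version:

```python
# Simulator preferences to apply before bringing up each test node.
SIM_PREFS = {
    "ConnectHardwareKeyboard": "0",  # keep the virtual keyboard visible for Appium
    "LocationMode": "3101",          # undocumented magic constant: geolocation off
    "SlowMotionAnimation": "0",      # don't let a stray Cmd+T slow animations 10x
}

def defaults_commands(domain="com.apple.iphonesimulator"):
    """Build the `defaults write` invocations to run (e.g. via subprocess)
    on each host before booting simulators."""
    return [["defaults", "write", domain, key, value]
            for key, value in SIM_PREFS.items()]
```

Running these once per host, before any simulator boots, keeps a human sitting at the machine from silently changing the test environment.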
Probably the most interesting part for me, since I did it not long ago, was managing all this infrastructure. 60 virtual hosts (actually 64, plus 6 TeamCity agents) cannot be rolled out by hand. We found the xcversion utility, now part of fastlane: a Ruby gem that can be used as a command-line tool and partially automates the installation of Xcode. Then we took Ansible and wrote playbooks to roll out the correct version of fbsimctl and Xcode everywhere and to generate the configs for the device server itself, plus playbooks for removing and updating simulators. When we move to iOS 11, we keep iOS 10 around; but as soon as the testing team says it no longer needs automated testing on iOS 10, we simply run Ansible and clean up the old simulators. Otherwise they take up a lot of disk space.
How does it work? If you simply call xcversion on each of the 60 machines, it takes a very long time, because each one goes to the Apple website and downloads the full image. To update the machines in the park, you pick one working machine and run xcversion install for the needed Xcode version, without actually installing or deleting anything: the installer package just lands in the cache. The same can be done for any simulator version. The packages end up in ~/Library/Caches/XcodeInstall. Then you upload everything to Ceph, or, if you do not have it, start some web server in that directory. I am used to Python, so I run Python's HTTP server on the machines.
Now, on any other developer or tester machine, you can run xcversion install and point it at the server you started. It downloads the .xip from that machine (on a fast local network this happens virtually instantly), unpacks the package, accepts the license and generally does everything for you. You end up with a fully working Xcode that can run simulators and tests. Unfortunately, simulators are not handled as conveniently: you have to curl or wget the package from that server into the same cache directory on your local machine and then run xcversion simulators --install. We put these calls into the Ansible scripts and updated 60 machines in a day; most of the time went into copying files over the network. On top of that we were moving offices at the time, so some machines were switched off, and we re-ran Ansible two or three times to catch up with the machines that were absent during the move.
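The download-once, install-everywhere flow can be sketched as below. The cache path comes from the talk; the `--url` flag of `xcversion install` and the .xip file name are assumptions to verify against the xcode-install documentation:

```python
from pathlib import PurePosixPath

# Where xcversion caches downloaded installer packages.
CACHE_DIR = PurePosixPath("~/Library/Caches/XcodeInstall")

def serve_cache_cmd(port=8000):
    """Command to run on the one machine that already holds the .xip in its
    cache: expose the cache directory over HTTP to the local network."""
    return ["python3", "-m", "http.server", str(port), "--directory", str(CACHE_DIR)]

def install_cmd(version, cache_host, port=8000):
    """Command for every other host: install Xcode from the LAN cache instead
    of downloading it again from Apple (file name assumed)."""
    url = f"http://{cache_host}:{port}/Xcode_{version}.xip"
    return ["xcversion", "install", version, f"--url={url}"]
```

With Ansible driving `install_cmd` on every host, the only slow step left is copying the archive over the local network.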
To sum up the first part: I think priorities matter. First you should get stability and reliability of the tests, and only then speed. If you chase speed alone and start parallelizing everything, the tests will run quickly, but nobody will ever look at them; everyone will just hit restart until everything happens to pass, or ignore the tests entirely and push to master.
Next point: automation is development like any other, so you can take the patterns that have already been invented before us and use them. If your infrastructure is currently tightly coupled to the tests and you plan to scale, this is a good moment to separate them first and scale afterwards.
And the last point: when the task is to speed up the tests, the first idea that comes to mind is to add more simulators so everything gets faster by some factor. In fact, very often you do not need to add anything; instead, carefully analyze the code and optimize with a couple of lines, as in the cookie example. That beats parallelization: two lines of code saved us 100 minutes, while parallelization means writing a lot of code and then maintaining the hardware side of the infrastructure, which is far more expensive in money and resources.
Those who found this Heisenbug talk interesting may also be interested in the next Heisenbug: it will be held in Moscow on December 6-7, and the conference site already has descriptions of a number of talks (incidentally, the call for papers is still open).