Overview of Deep Domain Adaptation Basic Methods (Part 2)

In the first part, we got acquainted with domain adaptation methods based on deep learning: we covered the basic datasets as well as discrepancy-based and non-generative adversarial-based approaches. Those methods work well for a number of tasks. This time we will analyze the most complex and promising adversarial-based family, generative models, as well as the algorithms that show the best results on the VisDA data (adaptation from synthetic data to real photos).

Generative Models
This approach is based on the ability of GANs to generate data from a required distribution. Thanks to this property, you can obtain the necessary amount of synthetic data and use it for training. The main idea of the methods from the generative-models family is to use the source domain to generate data that is as similar as possible to the representatives of the target domain. The new synthetic data then carries the same labels as the source-domain samples from which it was obtained, and the model for the target domain is simply trained on this generated data.
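To make the idea concrete, here is a minimal training-loop sketch in PyTorch. All names here (`G` for an already trained source-to-target generator, `clf` for the target-domain classifier, `source_loader`) are illustrative assumptions, not taken from any particular paper:

```python
import torch

# A sketch: train a target-domain classifier on source images translated into
# the target style; the labels are inherited from the source samples.
def train_on_generated(G, clf, source_loader, optimizer, criterion):
    for x_s, y_s in source_loader:
        with torch.no_grad():
            x_fake_t = G(x_s)        # source image rendered in the target style
        logits = clf(x_fake_t)       # the label y_s travels with the image
        loss = criterion(logits, y_s)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```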
The CyCADA: Cycle-Consistent Adversarial Domain Adaptation method (Code), presented at ICML 2018, is a characteristic representative of the generative-models family. It combines several successful approaches from GANs and from domain adaptation. An important part of it is the cycle-consistency loss, first presented in the CycleGAN paper. The idea of the cycle-consistency loss is that an image translated from the source domain to the target one and then transformed back should be close to the initial image. In addition, CyCADA includes pixel-level and feature-level adaptation, as well as a semantic loss that preserves the structure of the generated image.

Let $f_T$ and $f_S$ be the networks for the target and source domains respectively, $X_T$ and $X_S$ the target and source domains, $Y_S$ the labels on the source domain, $G_{S \to T}$ and $G_{T \to S}$ the generators from the source domain to the target one and vice versa, and $D_T$ and $D_S$ the discriminators belonging to the target and source domains respectively. Then the loss function minimized in CyCADA is the sum of the following six loss functions:

- $L_{task}(f_T, G_{S \to T}(X_S), Y_S)$: the classification loss of the model $f_T$ on the generated data, with the labels taken from the source domain.
- $L_{GAN}(G_{S \to T}, D_T, X_T, X_S)$: the adversarial loss for training the generator $G_{S \to T}$.
- $L_{GAN}(G_{T \to S}, D_S, X_S, X_T)$: the adversarial loss for training the generator $G_{T \to S}$.
- $L_{cyc}(G_{S \to T}, G_{T \to S}, X_S, X_T)$ (cycle-consistency loss): an $L_1$ loss ensuring that images passed through $G_{S \to T}$ and then $G_{T \to S}$ (and vice versa) stay close to the originals.
- $L_{GAN}(f_T, D_{feat}, f_S(G_{S \to T}(X_S)), X_T)$: the adversarial loss on the feature representations produced by $f_S$ and $f_T$, as used in ADDA.
- $L_{sem}(G_{S \to T}, G_{T \to S}, X_S, X_T, f_S)$ (semantic consistency loss): the loss responsible for $f_S$ working similarly on images obtained from $G_{S \to T}$ and from $G_{T \to S}$.
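To make the structure of this objective concrete, below is a minimal PyTorch-style sketch of the generator-side part of the sum. The loss weights `lambdas`, the feature discriminator `D_feat`, and the `f_t.features()` method are assumptions for illustration; this is a sketch, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

# Sketch of the generator-side CyCADA objective, assuming generators G_st, G_ts,
# discriminators D_t, D_s, D_feat, a frozen source model f_s and a target model f_t.
def cycada_generator_loss(x_s, y_s, x_t, G_st, G_ts, D_t, D_s, D_feat,
                          f_s, f_t, lambdas):
    fake_t = G_st(x_s)  # source image rendered in the target style
    fake_s = G_ts(x_t)  # target image rendered in the source style

    # (1) task loss: f_t learns on translated images with the source labels
    l_task = F.cross_entropy(f_t(fake_t), y_s)

    # (2), (3) image-level adversarial losses: both generators try to fool
    # the corresponding discriminators into answering "real"
    logit_t, logit_s = D_t(fake_t), D_s(fake_s)
    l_gan = (F.binary_cross_entropy_with_logits(logit_t, torch.ones_like(logit_t))
             + F.binary_cross_entropy_with_logits(logit_s, torch.ones_like(logit_s)))

    # (4) cycle-consistency: translating there and back recovers the input (L1)
    l_cyc = F.l1_loss(G_ts(fake_t), x_s) + F.l1_loss(G_st(fake_s), x_t)

    # (5) ADDA-style adversarial loss on the feature representations of f_t
    logit_f = D_feat(f_t.features(x_t))
    l_feat = F.binary_cross_entropy_with_logits(logit_f, torch.ones_like(logit_f))

    # (6) semantic consistency: the frozen f_s must label the translations
    # the same way it labels the original images
    with torch.no_grad():
        pseudo_s = f_s(x_s).argmax(dim=1)
        pseudo_t = f_s(x_t).argmax(dim=1)
    l_sem = (F.cross_entropy(f_s(fake_t), pseudo_s)
             + F.cross_entropy(f_s(fake_s), pseudo_t))

    return (lambdas['task'] * l_task + l_gan + lambdas['cyc'] * l_cyc
            + lambdas['feat'] * l_feat + lambdas['sem'] * l_sem)
```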
Results of the method on benchmarks:

- On the digit domains (adaptation to MNIST): 95.7%.
- On the GTA 5 -> Cityscapes segmentation task: mean IoU = 39.5%.
Within the Generate To Adapt: Aligning Domains using Generative Adversarial Networks approach (Code), a generator $G$ is trained so that its output produces images close to the source domain. Such a $G$ makes it possible to transform data from the target domain and apply to it a classifier trained on the labeled data of the source domain.

To train such a generator, the authors use a modified discriminator $D$ from the AC-GAN paper. The peculiarity of this $D$ is that it not only answers 1 if data from the source domain arrived at its input and 0 otherwise, but, in the case of a positive answer, it also classifies the input into the classes of the source domain.

Denote by $F$ a convolutional network that produces a vector representation of an image, and by $C$ a classifier that works on the vector produced by $F$. Training and inference schemes of the algorithm:

The training procedure consists of several components (a sketch of the discriminator update follows this list):

- The discriminator $D$ learns to determine the domain for all data it receives from $G$; for the source domain a classification loss is added on top of that, as described above.
- On the data from the source domain, $G$ is trained with a combination of an adversarial loss and a classification loss so as to generate results that look like the source domain and are correctly classified by $D$.
- $F$ and $C$ learn to classify data from the source domain. In addition, $F$ is updated with another classification loss so as to increase the quality of classification by $D$.
- With the help of an adversarial loss, $F$ learns to fool $D$ on data from the target domain.
- The authors found empirically that before feeding the vector from $F$ into $G$, it makes sense to concatenate it with normal noise and a one-hot class vector (target data has no labels, so a substitute class encoding is used for it).
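As a sketch of the first component, here is a minimal AC-GAN-style discriminator update. It assumes `D` returns a pair (real/fake logit, class logits); the names and signatures are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn.functional as F

# D(x) -> (domain_logit, class_logits): real source images should receive
# domain_logit ~ 1 and the correct class; generated images get domain_logit ~ 0.
def discriminator_loss(D, x_src, y_src, x_generated):
    real_logit, real_cls = D(x_src)
    fake_logit, _ = D(x_generated.detach())  # do not backprop into the generator

    l_adv = (F.binary_cross_entropy_with_logits(real_logit,
                                                torch.ones_like(real_logit))
             + F.binary_cross_entropy_with_logits(fake_logit,
                                                  torch.zeros_like(fake_logit)))
    l_cls = F.cross_entropy(real_cls, y_src)  # auxiliary classification head
    return l_adv + l_cls
```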
The results of the method on benchmarks:

- On the digit domains USPS -> MNIST: 90.8%.
- On the Office dataset, the average quality of adaptation over the pairs of the Amazon and Webcam domains: 86.5%.
- The average quality over 12 categories without the Unknown class: 76.7%.
The paper From source to target and back: symmetric bi-directional adaptive GAN (Code) presented the SBADA-GAN model, which is quite similar to CyCADA and whose objective function, just as in CyCADA, consists of 6 terms. In the authors' notation, $G_{S \to T}$ and $G_{T \to S}$ are the generators from the source domain to the target one and vice versa, $D_S$ and $D_T$ are the discriminators that distinguish real data from generated data in the source and target domains respectively, and $C_S$ and $C_T$ are the classifiers trained on the data from the source domain and on its versions transformed into the target domain.

Like CyCADA, SBADA-GAN uses the ideas from CycleGAN, a consistency loss, and pseudo-labels for the data generated in the target domain, composing its objective function from the corresponding components. The distinctive features of SBADA-GAN include (a sketch of the second one follows this list):

- The image plus noise is fed to the generator input.
- At test time, a linear combination is used of the predictions of the target model and of the source model applied to the result of the transformation $G_{T \to S}$.
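A minimal sketch of that test-time rule, assuming classifiers `C_t` and `C_s`, a trained generator `G_ts`, and a mixing weight `alpha` (all names are illustrative):

```python
import torch

# Blend target-domain and source-domain predictions at test time.
def sbada_gan_predict(x_t, C_t, C_s, G_ts, alpha=0.5):
    with torch.no_grad():
        p_target = torch.softmax(C_t(x_t), dim=1)        # target-domain classifier
        p_source = torch.softmax(C_s(G_ts(x_t)), dim=1)  # classify the back-translation
    return alpha * p_target + (1.0 - alpha) * p_source
```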
The SBADA-GAN training scheme:

The authors of SBADA-GAN conducted more experiments than the authors of CyCADA and obtained the following results:

- On the USPS -> MNIST domains: 95.0%.
- On the MNIST -> SVHN domains: 61.1%.
- On the Synth Signs -> GTSRB road signs: 97.7%.

From the family of generative models, the following important articles are also worth a look:
- Unsupervised Cross-Domain Image Generation;
- Unsupervised Image-to-Image Translation Networks.
Visual Domain Adaptation Challenge
As part of workshops at the ECCV and ICCV conferences, the Visual Domain Adaptation Challenge competition on domain adaptation is held. Its participants are asked to train a classifier on synthetic data and adapt it to unlabeled data from ImageNet.
The algorithm introduced in Self-ensembling for visual domain adaptation (Code) won VisDA-2017. This method is built on the idea of self-ensembling: there is a teacher model and a student model. At each iteration, the input image is passed through both of these networks. The student is trained on the sum of a classification loss and a consistency loss, where the classification loss is the usual cross-entropy with the known class label, and the consistency loss is the mean squared difference between the teacher's and the student's predictions. The weights of the teacher network are computed as an exponential moving average of the weights of the student network. The main details of this training procedure (a sketch follows this list):

- The data from the source domain $X_S$ is labeled with classes $Y_S$; the data from the target domain $X_T$ has no labels.
- Before being fed to the neural networks, the input images undergo various strong augmentations: Gaussian noise, affine transformations, etc.
- Both networks use strong regularization methods (for example, dropout).
- Let $z_i$ be the output of the student network and $\tilde{z}_i$ the output of the teacher network. If the input came from the target domain, only the consistency loss between $z_i$ and $\tilde{z}_i$ is computed, and the cross-entropy loss is set to 0.
- For the stability of training, confidence thresholding is applied: if the teacher's prediction confidence is below a threshold (0.9), the consistency loss is set to 0.
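A condensed sketch of one training step under the assumptions above; the EMA decay, the threshold value, and the toy `augment` function are illustrative choices, not the authors' exact settings:

```python
import torch
import torch.nn.functional as F

def augment(x, noise_std=0.1):
    # A stand-in for the strong augmentations (noise, affine transforms, ...)
    return x + noise_std * torch.randn_like(x)

def self_ensembling_step(student, teacher, x, y=None, threshold=0.9):
    # Each network sees its own independently augmented view of the batch.
    logits_student = student(augment(x))
    with torch.no_grad():
        logits_teacher = teacher(augment(x))

    p_student = torch.softmax(logits_student, dim=1)
    p_teacher = torch.softmax(logits_teacher, dim=1)

    # Consistency loss: mean squared difference of the predictions, zeroed
    # for samples where the teacher is not confident enough.
    conf_mask = (p_teacher.max(dim=1).values > threshold).float()
    loss = (((p_student - p_teacher) ** 2).mean(dim=1) * conf_mask).mean()

    # Source batches additionally get cross-entropy with the known labels.
    if y is not None:
        loss = loss + F.cross_entropy(logits_student, y)
    return loss

@torch.no_grad()
def update_teacher(student, teacher, ema_decay=0.99):
    # Teacher weights are an exponential moving average of the student weights.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(ema_decay).add_(p_s, alpha=1.0 - ema_decay)
```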
The scheme of the described procedure:

The algorithm achieved high performance on the main datasets, although for each task the authors separately selected the set of augmentations:

- USPS -> MNIST: ???%.
- MNIST -> SVHN: 97.0%.
- Synth Numbers -> SVHN: ???%.
- On the Synth Signs -> GTSRB road signs: ???%.
- The average quality over 12 categories without the Unknown class: 92.8%. It is important to note that this result was obtained with an ensemble of 5 models and with test-time augmentation.
The VisDA-2018 competition was held this year as part of the ECCV-2018 conference. This time a 13th class was added, Unknown, which collects everything that does not fall into the other 12 classes. In addition, there was a separate competition on the detection of objects belonging to these 12 classes. In both nominations the Chinese team JD AI Research won. In the classification competition they achieved a result of 92.3% (the average quality over the 13 categories). There is no publication with a detailed description of their method yet; there is only a presentation from the workshop.

Among the features of their algorithm, one can note (a sketch follows below):

- The use of pseudo-labels for the data from the target domain and additional training of the classifier on them together with the data from the source domain.
- The use of the SE-ResNeXt-101 convolutional network, AM-Softmax, and Noise-adaptation loss and Generalized cross-entropy loss layers for the data from the target domain.

The scheme of the algorithm from the presentation:
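Generalized cross-entropy is a known recipe for learning with noisy labels such as pseudo-labels (Zhang and Sabuncu, 2018). Below is a minimal sketch of it together with a naive pseudo-labeling step; this is an illustration of the ingredients named on the slides, not the team's published code:

```python
import torch

@torch.no_grad()
def pseudo_labels(model, x_t):
    # Pseudo-labels for unlabeled target data: the model's own argmax predictions.
    return model(x_t).argmax(dim=1)

def generalized_cross_entropy(logits, targets, q=0.7):
    # L_q = (1 - p_y^q) / q: tends to cross-entropy as q -> 0 and to MAE as
    # q -> 1, which makes it more robust to noisy (pseudo-)labels.
    probs = torch.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.clamp_min(1e-7) ** q) / q).mean()
```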
Conclusion

For the most part, we discussed adaptation methods built on the adversarial-based approach. However, the last two VisDA contests were won by algorithms that are not related to it and that use pseudo-label training and modifications of more classical deep learning methods. In my opinion, this is because methods based on GANs are still only at the beginning of their development and remain quite unstable. But every year new results appear that improve the performance of GANs. In addition, the interest of the scientific community in the field of domain adaptation is mainly focused on adversarial-based methods, and new articles mostly investigate this approach. Therefore, it is likely that algorithms based on GANs will gradually come to the forefront in adaptation problems.

Research into non-adversarial-based approaches also continues. Here are some interesting articles from this area:

- Associative Domain Adaptation;
- Asymmetric Tri-training for Unsupervised Domain Adaptation.

Discrepancy-based methods can be considered "historical", but many of their ideas are used in the newest methods: MMD, pseudo-labels, metric learning, etc. In addition, in simple adaptation tasks it sometimes makes sense to apply these methods because of their relative ease of training and the better interpretability of their results.

In conclusion, I want to note that domain adaptation methods are still looking for their place in applied areas, but the number of promising tasks requiring adaptation is gradually growing. For example, domain adaptation is actively used in training the modules of autonomous cars: since collecting real data on city streets for autopilot training is expensive and time-consuming, synthetic data (SYNTHIA and GTA 5 are examples) is used, in particular, to solve the problem of segmenting what the car's camera "sees".

Obtaining high-quality deep-learning models in Computer Vision largely depends on the availability of large labeled datasets for training. Labeling almost always requires a lot of time and money, which significantly lengthens the development cycle of models and, as a result, of the products based on them.

Domain adaptation methods are aimed at solving this problem and can potentially contribute to a breakthrough in many applied problems and in artificial intelligence in general. Transferring knowledge from one domain to another is a genuinely difficult and interesting task that is currently being actively researched. If you suffer from a lack of data in your tasks and can emulate the data or find similar domains, I recommend trying domain adaptation methods!