Adversarial machine learning as a diagnostic method

Hello everyone!

Continuing our series on deep learning, we have long wanted to talk about why neural networks see sheep everywhere. The topic also comes up in chapter 9 of François Chollet's book.

That led us to the excellent research by Positive Technologies presented on Habr, as well as to the fine work of two MIT researchers who argue that "adversarial machine learning" is not only a nuisance and a problem, but also an excellent diagnostic tool.

Details are under the cut.

Over the past few years, adversarial examples have attracted considerable attention in the deep learning community. In this article we would like to review the phenomenon in general terms and discuss how it fits into the broader context of machine learning reliability.

Adversarial perturbations: an intriguing phenomenon

To delineate the scope of our discussion, let us start with a few examples of such adversarial perturbations. We think most researchers working in ML have come across images like these:

[image: a pig, correctly classified on the left; the same pig with an imperceptible adversarial perturbation, classified as an airliner, on the right]

On the left is a pig, correctly classified as a pig by a modern convolutional neural network. It is enough to make a minimal change to the picture (all pixels lie in the [0, 1] range, and each one changes by no more than a tiny amount) for the network to return the class "airliner" with high confidence. Attacks of this kind on trained classifiers have been known since at least 2004 (ref.), and the first work on adversarial perturbations of image classifiers dates back to 2006 (ref.). The phenomenon began to attract far more attention around 2013, when it turned out that neural networks are vulnerable to such attacks (see here and here). Since then, many researchers have proposed ways to construct adversarial examples, as well as ways to make classifiers more robust against such pathological perturbations.
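
To make the constraint concrete, here is a minimal sketch in PyTorch (not the authors' code; the model choice, the skipped preprocessing and the ε value are our own illustrative assumptions): the perturbed image must stay inside [0, 1], and every pixel may move at most ε away from the original.

```python
import torch
from torchvision.models import inception_v3

# Illustrative sketch: apply a given perturbation `delta` to an image `x`
# (a 1x3x299x299 tensor with values in [0, 1]) under an l_inf budget `eps`.
# In newer torchvision, use inception_v3(weights="IMAGENET1K_V1") instead.
model = inception_v3(pretrained=True).eval()

EPS = 0.005  # illustrative budget; the exact value from the pig example is not reproduced here


def perturb(x, delta, eps=EPS):
    # Keep every pixel within eps of the original and inside the valid [0, 1] range.
    return (x + delta.clamp(-eps, eps)).clamp(0.0, 1.0)


def top1(x):
    # In a real pipeline the [0, 1] image would be normalized with the
    # ImageNet mean/std before the forward pass; omitted here for brevity.
    with torch.no_grad():
        return model(x).argmax(dim=1).item()

# Usage: for an original image x and a crafted delta,
# compare top1(x) with top1(perturb(x, delta)).
```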

However, it is important to remember that one does not need to go anywhere near neural networks to observe such adversarial examples.

How robust are adversarial examples?

A situation in which a computer confuses a pig with an airliner may seem alarming at first. Note, however, that the classifier used here (the Inception-v3 network) is not as fragile as it might appear. Although the network is confidently wrong when classifying the distorted pig, this happens only for specially crafted perturbations. The network is much more resistant to random perturbations of comparable magnitude. So the key question is precisely whether adversarial perturbations expose a genuine fragility of these networks. If the damage critically depends on controlling every input pixel, then such adversarial examples do not look like a serious problem for image classification under realistic conditions.
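
To illustrate the contrast between random and crafted noise, here is a hedged sketch (the function name and sampling scheme are our own, not from the cited work) that estimates how often uniformly random ℓ∞ noise of a given magnitude actually changes a model's top-1 prediction:

```python
import torch


def random_flip_rate(model, x, eps, trials=100):
    """Fraction of uniformly random l_inf perturbations of size `eps` that
    change the model's top-1 prediction on the single image `x` (in [0, 1])."""
    model.eval()
    flips = 0
    with torch.no_grad():
        clean = model(x).argmax(dim=1)
        for _ in range(trials):
            delta = torch.empty_like(x).uniform_(-eps, eps)
            noisy = (x + delta).clamp(0.0, 1.0)
            flips += int(model(noisy).argmax(dim=1).item() != clean.item())
    return flips / trials
```

For small budgets this rate is typically close to zero, whereas a crafted perturbation of the same magnitude (see the attack sketch below) flips the label reliably.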

Recent research suggests otherwise: adversarial perturbations can be made robust to various channel effects in concrete physical scenarios. For example, adversarial examples can be printed on an ordinary office printer so that photos of them taken with a smartphone camera are still classified incorrectly. One can also make stickers that cause neural networks to misclassify various real-world scenes (see, for example, link1, link2 and link3). Finally, researchers recently produced a 3D-printed object that the standard Inception network mistakes for a rifle from almost any viewing angle.

Crafting attacks that cause misclassification

How does one create such adversarial perturbations? There are many approaches, but optimization lets us reduce these different methods to a common formulation. As is well known, training a classifier is often formulated as finding model parameters $\theta$ that minimize the average loss over the training data:

$$\theta^{\star} = \arg\min_{\theta} \; \mathbb{E}_{(x, y) \sim D} \big[ \mathrm{loss}(\theta, x, y) \big]$$

Therefore, to cause a misclassification for a fixed model $\theta$, a natural approach is to look for a perturbation $\delta$ from an allowed set $\Delta$ such that the loss at the perturbed input is as large as possible:

$$\delta^{\star} = \arg\max_{\delta \in \Delta} \; \mathrm{loss}(\theta, x + \delta, y)$$

Starting from this formulation, many methods for constructing adversarial inputs can be viewed as different optimization algorithms (single gradient steps, projected gradient descent, and so on) for different constraint sets (small perturbations in some ℓp norm, changes to a small number of pixels, and so on). A number of examples are given in the following articles: link1, link2, link3, link4 and link5.
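
As a concrete illustration of the second formula, here is a hedged sketch of projected gradient descent under an ℓ∞ constraint (a generic implementation written for this article, not the code from any of the linked papers; the budget, step size and iteration count are arbitrary defaults):

```python
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=0.005, step=0.001, iters=40):
    """Approximately solve  max_{||delta||_inf <= eps} loss(theta, x + delta, y)
    by projected gradient ascent on the cross-entropy loss."""
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y)
        loss.backward()
        with torch.no_grad():
            # Ascend along the gradient sign, then project back onto the l_inf ball.
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta.detach()).clamp(0.0, 1.0)
```

A single-step method corresponds to iters=1 with a larger step, and other constraint sets fit the same template by swapping in a different projection.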

As noted above, many successful methods for generating adversarial examples work against a fixed target classifier. This raises an important question: do these perturbations affect only the specific model they were crafted for? Interestingly, they do not. With many perturbation methods, the resulting adversarial examples transfer to classifiers trained with a different random seed or even with a different model architecture. Moreover, it is possible to craft adversarial examples with only limited access to the target model (this is sometimes called a "black-box attack"). See, for example, the following five articles: link1, link2, link3, link4 and link5.
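
A hedged sketch of such a transfer check (the model pairing is an arbitrary example, and pgd_linf is the sketch from the previous section): craft the perturbation against one "source" network and evaluate it on a "target" network that the attack never queried.

```python
import torch
from torchvision.models import inception_v3, resnet50

# Arbitrary example models; any two independently trained classifiers would do.
# (In newer torchvision, pass weights=... instead of pretrained=True.)
source = resnet50(pretrained=True).eval()
target = inception_v3(pretrained=True).eval()


def transfers(x, y, eps=0.005):
    """x: a 1x3x299x299 image in [0, 1]; y: its true ImageNet label."""
    x_adv = pgd_linf(source, x, y, eps=eps)   # gradients come from `source` only
    with torch.no_grad():
        fools_source = source(x_adv).argmax(dim=1).item() != y.item()
        fools_target = target(x_adv).argmax(dim=1).item() != y.item()
    return fools_source, fools_target
```

If fools_target comes out True noticeably more often than random noise would explain, the perturbation has transferred.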

Not just images

Adversarial examples are found not only in image classification. Similar phenomena are known in speech recognition, in question-answering systems, in reinforcement learning, and in other tasks. And as you already know, the study of adversarial examples goes back more than ten years:

Timeline of adversarial machine learning (the beginning). The full timeline is shown in Fig. 6 of this study.

In addition, security-related applications are a natural setting for studying the adversarial aspects of machine learning. If an attacker can trick a classifier into treating malicious input (say, spam or a virus) as harmless, then a spam detector or antivirus scanner built on machine learning becomes ineffective. It should be emphasized that these considerations are not purely academic. For example, the Google Safebrowsing team published a multi-year study back in 2011 of how attackers tried to evade their malware detection systems. Also see this article about adversarial examples in the context of spam filtering in Gmail.

Not just security

Virtually all recent work on adversarial examples is framed in terms of security. That is a valid point of view, but we believe such examples should be considered in a wider context.

Reliability

First of all, adversarial examples raise the question of the reliability of the whole system. Before we can reasonably argue about the security properties of a classifier, we need to make sure that it achieves good classification accuracy in a robust way. After all, if we are going to deploy our trained models in real-world scenarios, they need to remain reliable when the distribution of the underlying data shifts, regardless of whether those shifts are caused by adversarial interference or merely by natural fluctuations.

In this context, adversarial examples are a useful diagnostic tool for assessing the reliability of machine learning systems. In particular, working with adversarial examples lets us go beyond the standard evaluation protocol, in which a trained classifier is run on a carefully curated (and usually static) test set.

This can lead to surprising conclusions. For example, it turns out that adversarial examples can be created easily, without resorting to sophisticated optimization methods at all. In recent work we show that state-of-the-art image classifiers are surprisingly vulnerable to small pathological translations and rotations of the input (see here and here for other work on this topic).

Therefore, even if one does not care about, say, ℓ∞-bounded perturbations, reliability problems caused by rotations and translations still arise. More broadly, we need to understand the reliability properties of our classifiers before they can be integrated into larger systems as truly dependable components.
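
A hedged sketch of this kind of "spatial" robustness check (the grid ranges are illustrative, and this is not the evaluation code from the cited papers): search over a small grid of rotations and translations and report whether any of them flips the prediction.

```python
import torch
import torchvision.transforms.functional as TF


def fooled_by_spatial(model, x, y, max_deg=10, max_shift=3):
    """Grid-search small rotations (in degrees) and pixel translations of the
    image `x` and report whether any of them causes a misclassification."""
    model.eval()
    with torch.no_grad():
        for deg in range(-max_deg, max_deg + 1, 2):
            for dx in range(-max_shift, max_shift + 1):
                for dy in range(-max_shift, max_shift + 1):
                    xt = TF.affine(x, angle=float(deg), translate=[dx, dy],
                                   scale=1.0, shear=[0.0])
                    if model(xt).argmax(dim=1).item() != y.item():
                        return True  # a tiny rotation/translation flips the label
    return False
```

Measuring the fraction of test images for which this returns True gives a simple "spatial robustness" counterpart to the usual clean accuracy.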

What classifiers actually learn

To understand how a trained classifier works, we need to find examples on which it clearly succeeds or clearly fails. Adversarial examples illustrate that trained neural networks often do not match our intuitive understanding of what it means to "learn" a particular concept. This matters especially in deep learning, where claims are often made about biologically plausible algorithms and about networks whose performance rivals that of humans (see, for example, here and here). Adversarial examples immediately cast doubt on this in a variety of contexts:

- In image classification, minimally changing a handful of pixels or slightly rotating the picture hardly prevents a person from assigning the correct category. Nevertheless, such changes completely break state-of-the-art classifiers. Placing objects in an unusual context (for example, a sheep in a tree) also quickly shows that a neural network does not interpret the scene the way a person does.

- Inserting the right words into a text passage can seriously confuse a question-answering system, even though, from a human point of view, the meaning of the text is unchanged by such insertions.

- In this article, carefully chosen text examples expose the limits of Google Translate.

In all three cases, adversarial examples help probe the strength of our current models and highlight situations where these models behave completely differently from how a person would.

Security

Finally, adversarial examples do pose a real danger in those areas where machine learning already reaches a certain accuracy on "benign" inputs. Just a few years ago, tasks such as image classification were still handled quite poorly, so the security question seemed secondary. After all, the security of a machine learning system only becomes essential once that system handles "benign" input well enough; otherwise we cannot trust its predictions anyway.

By now the accuracy of such classifiers has improved considerably across many domains, and deploying them in situations where security is critical is only a matter of time. If we want to do this responsibly, it is important to study their properties specifically in a security context. But the security question needs a holistic approach: certain features (for example, a set of pixels) are much easier to fake than others, such as different sensory modalities, categorical attributes, or metadata. In the end, when security matters, it is best to rely on precisely those signals that are hard or nearly impossible to tamper with.

Conclusions (is it too early to draw them?)

Despite the impressive progress in machine learning that we have seen in recent years, it is necessary to keep in mind the limits of the tools at our disposal. There is a wide variety of concerns (for example, around fairness, privacy, or feedback effects), and reliability is among the most pressing. Human perception and cognition are robust to all sorts of background disturbances in the environment; adversarial examples demonstrate that neural networks are still very far from comparable robustness.

So we are convinced of the importance of studying adversarial examples. Their relevance to machine learning is far from limited to security; they can also serve as a diagnostic standard for evaluating trained models. The adversarial-example approach compares favorably with standard evaluation procedures and static test sets in that it reveals potentially non-obvious defects. If we want to understand the reliability of modern machine learning, it is important to examine the latest achievements from the point of view of an attacker (armed with well-chosen adversarial examples).

As long as our classifiers fail under even minimal shifts between the training and test distributions, we will not be able to achieve satisfactory guarantees of reliability. Ultimately, we aim to build models that are not only robust, but also consistent with our intuitive notion of what it means to "learn" a task. Then they will be safe, dependable, and easy to deploy in a wide variety of environments.