Introduction to the task of recognizing emotions

Emotion recognition is a hot topic in artificial intelligence. Among the most interesting application areas are: driver state monitoring, marketing research, video analytics systems for smart cities, human-machine interaction, monitoring of students taking online courses, wearable devices, etc.
 
This year, the MDG company devoted its annual machine learning school to this topic. In this article I will try to give a brief overview of the problem of recognizing a person's emotional state, and I will also describe the approaches to solving it.
 
Introduction to the task of recognizing emotions
In the 1970s, Paul Ekman, studying the non-verbal behavior of isolated tribes in Papua New Guinea, found that a number of emotions, namely anger, fear, sadness, disgust, contempt, surprise, and joy, are universal and can be understood by anyone, regardless of their culture.
 
People are able to express a wide range of emotions. It is believed that these can be described as combinations of the basic emotions (for example, nostalgia is something between sadness and joy). But such a categorical approach is not always convenient, because it does not allow quantifying the strength of an emotion. Therefore, alongside discrete models of emotions, a number of continuous ones have been developed. Russell's model uses a two-dimensional basis in which each emotion is characterized by its sign (valence) and its intensity (arousal). Owing to its simplicity, Russell's model has become increasingly popular in the context of automatic classification of facial expressions.
 

 
So, we have found out that if you do not try to hide your emotional state, your current emotions can be assessed from your facial expressions. Moreover, using modern achievements in deep learning, one could even try to build a lie detector in the spirit of the TV series "Lie to Me", whose scientific basis was precisely the work of Paul Ekman. However, this task is not so simple. As research by neuroscientist Lisa Feldman Barrett has shown, when recognizing emotions people actively use contextual information: voice, actions, the situation. Take a look at the pictures below and see for yourself: using the face region alone, a correct prediction cannot be made. In this regard, to solve this problem it is necessary to use both additional modalities and information about how the signals change over time.
 

 
Here we will consider approaches to the analysis of only two modalities, audio and video, since these signals can be obtained in a contactless way. To tackle the task, you first need data. Here is a list of the largest publicly available emotion databases known to me. Images and videos in these databases were labeled manually, some using Amazon Mechanical Turk.
 
 
 
 
Title | Data | Annotation | Year of release
--- | --- | --- | ---
OMG-Emotion challenge | audio/video | 6 categories, valence/arousal | 2018
EmotiW challenge | audio/video | 6 categories | 2018
AffectNet | images | 7 categories, valence/arousal | 2017
AFEW-VA | video | valence/arousal | 2017
EmotioNet challenge | images | 16 categories | 2017
EmoReact | audio/video | 17 categories | 2016
Classical approach to the problem of classifying emotions
 
The easiest way to determine the emotion from a face image is based on the classification of key points (facial landmarks), whose coordinates can be obtained with various algorithms such as PDM, CML, AAM, DPM, or CNN. Usually 5 to 68 points are marked, tied to the positions of the eyebrows, eyes, lips, nose, and jaw, which makes it possible to partially capture the facial expression. The normalized coordinates of the points can be fed directly into a classifier (for example, SVM or Random Forest) to obtain a baseline solution. Naturally, the faces must first be aligned.
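This baseline can be sketched in a few lines. The sketch below assumes the 68 landmark points per face have already been extracted (e.g., with dlib); the data and labels here are synthetic stand-ins, not a real dataset.

```python
import numpy as np
from sklearn.svm import SVC

def normalize_landmarks(points):
    """Center the landmarks and scale them to unit norm so that
    face position and size do not affect the classifier."""
    points = points - points.mean(axis=0)
    return (points / np.linalg.norm(points)).ravel()

# Toy data: 68 (x, y) landmark points per face (synthetic here,
# in practice produced by a landmark detector such as dlib).
rng = np.random.default_rng(0)
X = np.stack([normalize_landmarks(rng.random((68, 2))) for _ in range(100)])
y = rng.integers(0, 7, size=100)  # 7 basic emotion categories

# Feed normalized coordinates directly into an SVM classifier.
clf = SVC(kernel="rbf").fit(X, y)
predictions = clf.predict(X[:3])
```

With real landmarks, the same pipeline gives the baseline described above; Random Forest can be swapped in for the SVM with one line.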
 

 
Using the coordinates alone, without the visual component, loses a significant amount of useful information, so various descriptors are computed at these points to improve the system: LBP, HOG, SIFT, LATCH, etc. After concatenating the descriptors and reducing the dimensionality with PCA, the resulting feature vector can be used to classify emotions.
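The "descriptors at landmarks, then PCA" recipe can be illustrated as follows. To keep the example self-contained, a toy HOG-like orientation histogram is computed in place of a real HOG/SIFT descriptor, and the images and landmark positions are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

def patch_descriptor(image, x, y, size=8, bins=8):
    """A toy HOG-like descriptor: a histogram of gradient
    orientations inside a small patch around a landmark."""
    patch = image[y - size:y + size, x - size:x + size]
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx).ravel()
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / (hist.sum() + 1e-8)

rng = np.random.default_rng(1)
images = rng.random((50, 96, 96))                   # synthetic aligned faces
landmarks = rng.integers(16, 80, size=(50, 68, 2))  # synthetic 68-point layouts

# Concatenate per-landmark descriptors, then reduce the dimension with PCA.
features = np.stack([
    np.concatenate([patch_descriptor(img, x, y) for x, y in pts])
    for img, pts in zip(images, landmarks)
])
reduced = PCA(n_components=32).fit_transform(features)
```

The reduced vectors would then go into the same SVM/Random Forest classifier as before.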
 

 
 
However, this approach is now considered obsolete, since deep convolutional networks are known to be the best choice for analyzing visual data.
 
Classification of emotions with the use of deep learning
 
To build a neural network classifier, it is enough to take some network with a basic architecture pre-trained on ImageNet and retrain its last few layers. This way you can get a good baseline solution for classifying all sorts of data, but given the specifics of the problem, networks pre-trained on the large-scale task of face recognition will be more suitable.
 
So, building an emotion classifier for individual images is simple enough, but, as we found out, single snapshots do not quite accurately reflect the true emotions a person is experiencing in a given situation. Therefore, to increase the accuracy of the system, it is necessary to analyze sequences of frames. This can be done in two ways. The first way is to feed the high-level features produced by a CNN that classifies each individual frame into a recurrent network (for example, an LSTM) to capture the temporal component.
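A minimal sketch of this CNN-plus-LSTM scheme is given below. The per-frame CNN here is a tiny stand-in; in a real system it would be the pre-trained face network from the previous section, and the feature dimensions are arbitrary choices.

```python
import torch
from torch import nn

class CNNLSTMClassifier(nn.Module):
    """Per-frame CNN features fed into an LSTM over time."""
    def __init__(self, n_classes=7, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # toy per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, clips):                     # clips: (batch, time, C, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)              # last hidden state summarizes the clip
        return self.head(h[-1])

model = CNNLSTMClassifier()
out = model(torch.randn(2, 16, 3, 64, 64))        # 2 clips of 16 RGB frames
```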
 

 
 
The second way is to feed a sequence of frames sampled from the video with some step directly into a 3D-CNN. Such CNNs use convolutions with three degrees of freedom, transforming the four-dimensional input into three-dimensional feature maps.
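A minimal 3D-CNN sketch, with arbitrary layer sizes chosen for illustration: `Conv3d` slides over time as well as height and width, so a (channels, frames, H, W) input is turned into spatio-temporal feature maps.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),                          # pools over time and space jointly
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(32, 7),                         # 7 emotion classes
)

clip = torch.randn(2, 3, 16, 64, 64)          # batch of 16-frame RGB clips
logits = model(clip)
```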
 

 
 
In fact, these two approaches can be combined in a single architecture by constructing a monster like this one.
 

 
 
Classification of emotions by speech
 
Visual data allow predicting the sign of an emotion with high accuracy, but for estimating its intensity it is preferable to use speech signals. Analyzing audio is a bit harder due to the strong variation in utterance length and in speakers' voices. Usually it is not the raw waveform that is used but various feature sets, for example: F0, MFCC, LPC, i-vectors, etc. In the task of recognizing emotions from speech, the open-source library OpenSMILE, which contains a rich set of algorithms for analyzing speech and music signals, has proved itself well. After extraction, the features can be fed to an SVM or LSTM for classification.
 
Recently, however, convolutional neural networks have begun to penetrate the field of sound analysis, displacing the established approaches. To apply them, the sound is represented as a spectrogram on a linear or mel scale, after which the resulting spectrograms are treated as ordinary two-dimensional images. The problem of the arbitrary size of the spectrograms along the time axis is elegantly solved with statistical pooling or by including a recurrent network in the architecture.
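The statistical pooling trick can be sketched as follows (layer sizes are arbitrary): the mean and standard deviation of the CNN features are taken over the time axis, so the output does not depend on the spectrogram length.

```python
import torch
from torch import nn

class SpectrogramCNN(nn.Module):
    """CNN over (mel, time) spectrograms; statistics pooling over
    the time axis makes it independent of utterance length."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * 2, n_classes)

    def forward(self, spec):                  # spec: (batch, 1, mels, time)
        f = self.conv(spec).mean(dim=2)       # average over mel axis -> (b, 32, time)
        stats = torch.cat([f.mean(dim=2), f.std(dim=2)], dim=1)  # statistics pooling
        return self.head(stats)

model = SpectrogramCNN()
short = model(torch.randn(2, 1, 64, 120))
long = model(torch.randn(2, 1, 64, 305))      # same output shape despite a longer clip
```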
 

 
 
Audiovisual recognition of emotions
 
So, we have considered a number of approaches to analyzing the audio and video modalities; the final stage remains: combining the classifiers into a final solution. The simplest way is to combine their estimates directly; in this case it is enough to take the maximum or the average. A more complex option is to combine the modalities at the embedding level. SVM is often used for this, but this is not always correct, since embeddings can have different norms. For this reason, more advanced algorithms have been developed, for example Multiple Kernel Learning and ModDrop.
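The simplest fusion variants, averaging and maximum over the per-modality scores, look like this; the probability estimates below are made-up numbers standing in for real classifier outputs.

```python
import numpy as np

# Made-up class-probability estimates from independent classifiers.
p_video = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.4, 0.3]])
p_audio = np.array([[0.5, 0.3, 0.2],
                    [0.1, 0.7, 0.2]])

p_mean = (p_video + p_audio) / 2           # averaging fusion
p_max = np.maximum(p_video, p_audio)       # maximum fusion

pred_mean = p_mean.argmax(axis=1)          # final class per sample
pred_max = p_max.argmax(axis=1)
```

Embedding-level fusion would instead concatenate the per-modality feature vectors (after normalization) before a joint classifier.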
 
And, of course, it is worth mentioning the class of so-called end-to-end solutions, which learn directly from raw data coming from several sensors, without any preliminary processing.
 
Overall, the task of automatic emotion recognition is still far from solved. Judging by the results of last year's Emotion Recognition in the Wild (EmotiW) challenge, the best solutions reach an accuracy of about 60%. I hope the information presented in this article will be enough for you to try to build your own emotion recognition system.