Eye tracking, emotions and VR: technology convergence and current research

Virtual reality, emotion recognition and eye tracking - three independently developing fields of knowledge and commercially attractive new technology markets - have in recent years increasingly been viewed through the lens of convergence, a fusion and synthesis of approaches aimed at creating products of a new generation. There is hardly anything surprising in this natural rapprochement: the first results can already be discussed, with caution but also with considerable user enthusiasm (incidentally, Steven Spielberg's recent film "Ready Player One" quite literally visualizes many of the expected scenarios). Let's discuss this in more detail.
 
 
Tobii has entered into a collaboration with Qualcomm, as reported earlier; Oculus, the developer of virtual reality systems owned by Facebook, teamed up with The Eye Tribe, a well-known low-cost eye-tracker startup (a demonstration of an Oculus headset with a built-in eye tracker can be viewed here); and Apple, in its traditionally secretive manner, acquired SMI, a large German eye-tracking company (although the motive and the reasoning behind this particular choice are still unknown).
 
 
Integrating the game-controller function into virtual reality allows the user to control objects with their gaze: in the simplest case, to grab something or select a menu item, it is enough to look at the object for a few extra seconds (the fixation-duration threshold, or dwell-time, method). However, this gives rise to the so-called Midas touch problem: the user may only want to examine an object more closely, yet ends up involuntarily activating some programmed function they never intended to use.
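Below is a minimal sketch of the dwell-time (fixation-duration threshold) selection logic described above; the sample format, object identifiers and the one-second threshold are illustrative assumptions, not any particular SDK's API.

```python
# Minimal dwell-time selection sketch: an object is "selected" once the gaze
# has rested on it continuously for longer than the threshold.

DWELL_THRESHOLD_S = 1.0  # assumed dwell time, in seconds

def dwell_select(gaze_samples):
    """gaze_samples: iterable of (timestamp_s, object_id or None)."""
    current_obj, dwell_start = None, None
    for t, obj in gaze_samples:
        if obj != current_obj:              # gaze moved to another object
            current_obj, dwell_start = obj, t
            continue
        if obj is not None and t - dwell_start >= DWELL_THRESHOLD_S:
            yield obj                       # threshold reached: select it
            dwell_start = t                 # reset to avoid repeated triggers
```

The Midas touch problem is visible right in this loop: the code cannot tell an intentional selection from a long, merely curious look.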
 
 
Virtual King Midas
 
 
To solve the Midas problem, scientists from the Hong Kong University of Science and Technology (Pi & Shi, 2017) suggest using a dynamic fixation-duration threshold based on a probabilistic model. The model takes the subject's previous selections into account and computes the selection probability for each candidate object; objects with a high probability are assigned a shorter fixation-duration threshold than objects with a low probability. The proposed algorithm proved faster than the fixed fixation-duration threshold. It should be noted that the system was designed for gaze typing, so it is applicable only in interfaces where the set of objects does not change over time.
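A simplified sketch of how such a dynamic threshold could be wired up (the linear interpolation and the constants are assumptions for illustration, not Pi and Shi's exact model):

```python
# The more probable a target is, given the user's previous selections
# (e.g. a character language model in gaze typing), the shorter the
# fixation duration required to select it.

BASE_DWELL_S = 1.0   # dwell required for an unlikely target
MIN_DWELL_S = 0.3    # dwell required for a very likely target

def dwell_threshold(p_select):
    """Map a predicted selection probability (0..1) to a dwell threshold."""
    p = max(0.0, min(1.0, p_select))
    return BASE_DWELL_S - (BASE_DWELL_S - MIN_DWELL_S) * p

# Example: a letter the model rates at p = 0.8 needs only
# dwell_threshold(0.8) == 0.44 s of fixation instead of the full second.
```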
 
 
A group of developers from the Technical University of Munich (Schenk, Dreiser, Rigoll, & Dorr, 2017) proposed GazeEverywhere, a system for gaze-only interaction with computer interfaces, which incorporates their earlier SPOCK development (Schenk, Tiefenbacher, Rigoll, & Dorr, 2016) to solve the Midas problem. SPOCK is a two-step object selection method: once a fixation on an object exceeds the established threshold, two circles appear above and below the object and slowly move in opposite directions; the subject's task is to follow one of them with their eyes (it is the detection of this smooth pursuit eye movement that activates the selected object). The GazeEverywhere system is also notable for including an online recalibration algorithm, which significantly improves the accuracy of gaze localization.
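The underlying idea of the confirmation step can be sketched as follows: the selection is accepted only if the gaze trace actually follows one of the moving circles. The correlation measure and threshold below are assumptions, not the published SPOCK implementation.

```python
import numpy as np

CORR_THRESHOLD = 0.8  # assumed minimum correlation to accept a pursuit

def follows_target(gaze_xy, target_xy):
    """True if the gaze trace follows the moving circle's trajectory.

    gaze_xy, target_xy: arrays of shape (n, 2) sampled over the same window.
    Correlation is computed on per-sample displacement vectors, so it works
    for any movement direction of the confirmation circle.
    """
    gaze_v = np.diff(np.asarray(gaze_xy, float), axis=0).ravel()
    target_v = np.diff(np.asarray(target_xy, float), axis=0).ravel()
    if np.std(gaze_v) == 0 or np.std(target_v) == 0:
        return False  # no movement means no smooth pursuit
    return np.corrcoef(gaze_v, target_v)[0, 1] >= CORR_THRESHOLD
```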
 
 
As an alternative to the fixation-duration threshold method for selecting objects and solving the Midas problem, a group of scientists from Weimar (Huckauf, Goettel, Heinbockel, & Urbina, 2005) propose using anti-saccades as the selection action (an anti-saccade is an eye movement made in the direction opposite to the target). This method allows faster object selection than the fixation-duration threshold, but is less accurate. Its use outside the laboratory also raises questions: after all, looking at the desired object is far more natural for a person than deliberately looking away from it.
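A rough sketch of how an anti-saccade confirmation could be detected (the amplitude threshold and units are assumptions, not the authors' procedure):

```python
import numpy as np

def is_anti_saccade(gaze_before, gaze_after, target_pos, min_amplitude=2.0):
    """Heuristic anti-saccade check relative to an on-screen target.

    The saccade must be large enough and point away from the target, i.e.
    its projection onto the gaze-to-target direction is negative. Units are
    arbitrary (e.g. degrees of visual angle).
    """
    saccade = np.asarray(gaze_after, float) - np.asarray(gaze_before, float)
    to_target = np.asarray(target_pos, float) - np.asarray(gaze_before, float)
    if np.linalg.norm(saccade) < min_amplitude:
        return False  # too small to count as a deliberate saccade
    return float(np.dot(saccade, to_target)) < 0  # moved away from the target
```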
 
 
Wu et al. (Wu, Wang, Lin, & Zhou, 2017) developed an even simpler way of dealing with the Midas problem: as the selection marker they proposed a wink with one eye (that is, an explicit pattern in which one eye is open and the other is closed). However, integrating such a selection method into dynamic gameplay is hardly feasible, if only because of its discomfort.
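The wink pattern itself is easy to express in code; the eye-openness scale and thresholds below are assumptions:

```python
OPEN_T, CLOSED_T = 0.7, 0.2  # assumed thresholds on a 0 (closed) .. 1 (open) scale

def is_wink(left_openness, right_openness):
    """Explicit selection pattern: exactly one eye open, the other closed."""
    left_open, right_open = left_openness > OPEN_T, right_openness > OPEN_T
    left_closed, right_closed = left_openness < CLOSED_T, right_openness < CLOSED_T
    return (left_open and right_closed) or (right_open and left_closed)
```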
 
 
Pfeuffer et al. (Pfeuffer, Mayer, Mardanbegi, & Gellersen, 2017) propose moving away from the idea of an interface in which selection and manipulation of objects are carried out through eye movements alone, and instead suggest the Gaze + pinch interaction technique. Here an object is selected by localizing the gaze on it in combination with a "pinch" gesture of the fingers (of one hand or of both). The Gaze + pinch technique has advantages both over "virtual hands", because it allows virtual objects to be manipulated at a distance, and over controllers, because it frees the user's hands and makes it possible to extend the functionality by adding other gestures.
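The division of roles can be sketched like this (the frame-loop API with gaze hits and pinch events is purely illustrative, not the authors' implementation):

```python
class GazePinchSelector:
    """Gaze chooses the target; the pinch gesture confirms and holds it."""

    def __init__(self):
        self.held_object = None

    def update(self, gazed_object, pinch_down, pinch_up):
        if pinch_down and gazed_object is not None:
            self.held_object = gazed_object   # select what the user looks at
        if pinch_up:
            self.held_object = None           # releasing the pinch ends it
        return self.held_object               # hand motion manipulates this
```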
 
 
Emotions in VR
 
 
For successful virtual communication, social networking and multiplayer games in VR, user avatars must first of all be capable of natural emotional expression. Two types of solutions for adding emotionality to VR can currently be distinguished: installing additional devices, and purely software-based approaches.
 
 
For example, the MindMaze Mask project offers a solution in the form of sensors placed inside the helmet that measure the electrical activity of the facial muscles. Its creators have also demonstrated an algorithm that, based on the current electrical activity of the muscles, is able to predict facial expressions for high-quality avatar rendering. The FACEteq development (emotion sensing in VR) is built on similar principles.
 
The startup Veeso supplemented virtual reality glasses with an additional camera that records the movements of the lower half of the face. By combining images of the lower half of the face and of the eye area, they recognize the facial expression of the whole face in order to transfer it to a virtual avatar.
 
 
The team from the Georgia Institute of Technology, in partnership with Google (Hickson, Dufour, Sud, Kwatra, & Essa, 2017), proposed a more elegant solution and developed a technology for recognizing the user's emotions in virtual reality glasses without adding any extra hardware to the frame. Their algorithm is based on a convolutional neural network and recognizes facial expressions ("anger", "joy", "surprise", "neutral expression", "closed eyes") on average 74% of the time. For training, the developers used their own dataset of eye images from the infrared cameras of the built-in eye tracker, recorded while the participants (23 in total) expressed one emotion or another in virtual reality glasses. In addition, the dataset included the emotional expression of each participant's entire face, recorded with a regular camera. As the facial expression coding scheme they chose the well-known FACS system based on the recognition of action units (although they used only the action units of the upper half of the face). Each action unit was recognized by the proposed algorithm in 63.7% of cases without personalization, and in 70.2% of cases with personalization.
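For illustration, a minimal convolutional classifier over infrared eye-region crops might look like the sketch below; the architecture, input size and class list are assumptions for the sketch, not the published Eyemotion network.

```python
import torch
import torch.nn as nn

CLASSES = ["anger", "joy", "surprise", "neutral", "closed_eyes"]

class EyeExpressionNet(nn.Module):
    """Small CNN over single-channel (infrared) eye-region images."""

    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):             # x: (batch, 1, H, W) eye images
        h = self.features(x).flatten(1)
        return self.classifier(h)     # logits over the expression classes

# logits = EyeExpressionNet()(torch.randn(8, 1, 64, 96))  # example batch
```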
 
 
Similar work on detecting emotions from images of the eye area alone also exists outside of virtual reality (Priya & Muralidhar, 2017; Vinotha, Arun, & Arun, 2013). For example, Priya and Muralidhar (2017) likewise built a system based on action units and derived an algorithm that recognizes seven emotional facial expressions (joy, sadness, anger, fear, disgust, surprise and a neutral expression) from seven landmarks (three along the eyebrow, two at the corners of the eye and two more at the centers of the upper and lower eyelids), achieving an accuracy of 78.2%.
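A rough sketch of such a landmark-based pipeline (the geometric features and the classifier below are assumptions, not the published system):

```python
import numpy as np
from sklearn.svm import SVC

def eye_features(landmarks):
    """Geometric features from 7 eye-region points.

    landmarks: array of shape (7, 2) - three points along the eyebrow,
    two eye corners, and the centers of the upper and lower eyelids.
    """
    pts = np.asarray(landmarks, float)
    brow, corners, lids = pts[:3], pts[3:5], pts[5:7]
    eye_width = np.linalg.norm(corners[0] - corners[1])
    eye_opening = np.linalg.norm(lids[0] - lids[1]) / eye_width
    brow_height = (lids[0, 1] - brow[:, 1].mean()) / eye_width  # brow raise
    brow_slope = np.polyfit(brow[:, 0], brow[:, 1], 1)[0]
    return np.array([eye_opening, brow_height, brow_slope])

# A classifier over these features would then map them to the seven
# emotion labels, e.g.:
# clf = SVC().fit(np.stack([eye_features(l) for l in X_train]), y_train)
```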
 
 
Research and experiments around the world continue, and the Neurodata Lab eye-tracking working group will keep telling you about them. Stay with us.
 
 
Bibliography:
 
 
Hickson, S., Dufour, N., Sud, A., Kwatra, V., & Essa, I. (2017). Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. arXiv preprint.
 
Huckauf, A., Goettel, T., Heinbockel, M., & Urbina, M. (2005). What you do not look at is what you get: Anti-saccades can reduce the Midas touch problem. In Proceedings of the 2nd Symposium on Applied Perception in Graphics and Visualization (APGV 2005), A Coruña, Spain (p. 170).
 
Pfeuffer, K., Mayer, B., Mardanbegi, D., & Gellersen, H. (2017). Gaze + pinch interaction in virtual reality. In Proceedings of the 5th Symposium on Spatial User Interaction (SUI '17), 99-108.
 
Pi, J., & Shi, B. E. (2017). Probabilistic adjustment of dwell time for eye typing. In Proceedings of the International Conference on Human System Interactions (HSI 2017), 251-257.
 
Priya, V. R., & Muralidhar, A. (2017). Facial Emotion Recognition Using Eye. International Journal of Applied Engineering Research, 12(September), 5655-5659.
 
Schenk, S., Dreiser, M., Rigoll, G., & Dorr, M. (2017). GazeEverywhere: Enabling gaze-only user interaction on an unmodified desktop PC in everyday scenarios. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17), 3034-3044.
 
Schenk, S., Tiefenbacher, P., Rigoll, G., & Dorr, M. (2016). SPOCK. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '16), 2681-2687.
 
Vinotha, S. R., Arun, R., & Arun, T. (2013). Emotion recognition from human eye expression. International Journal of Research in Computer and Communication Technology, 2(4), 158-164.
 
Wu, T., Wang, P., Lin, Y., & Zhou, C. (2017). A robust noninvasive eye control approach for disabled people based on Kinect 2.0 sensor. IEEE Sensors Letters, 1(4), 1-4.
 
 
Author of the material:
Maria Konstantinova, research associate at Neurodata Lab, biologist, physiologist, specialist in the visual sensory system, oculography and oculomotor behavior.