The path to the contactless lie detector, or How to arrange a hackathon at maximum speed
Once upon a time, Steve Jobs and Steve Wozniak shut themselves in a garage and rolled out the first Mac. It would be great if you could always lock programmers in a garage and get an MVP with great potential. But if you add to the programmers a couple of people ready to evaluate user experience and look for something innovative, the chances of success grow.
Our team of five had a concrete idea with which we decided to take over the world, just a little. We chose FACS because this method has solid scientific validity compared to, for example, cruder approaches. Accordingly, the task was split into:
- Training a network that predicts 68 facial landmarks
- Normalizing/filtering the face image
- An algorithm that detects facial movement in dynamics
The training, by the way, was done on a Radeon RX580 with the help of PlaidML, which I already covered in my previous article. Big thanks here go to the imgaug library, which lets you apply affine transformations to an image and to points on it (in our case, the landmarks) simultaneously.
A few augmented images:
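The key trick imgaug provides, warping the image and its keypoints with one and the same transform, can be sketched in plain NumPy (the landmark array below is synthetic, and the real pipeline used imgaug's own augmenters rather than this hand-rolled matrix):

```python
import numpy as np

def make_rotation(theta_deg, center):
    """Build a 2x3 affine matrix rotating points around `center`."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    cx, cy = center
    # p' = R(p - center) + center, written as a single 2x3 matrix
    return np.array([[c, -s, cx - c * cx + s * cy],
                     [s,  c, cy - s * cx - c * cy]])

def transform_landmarks(landmarks, M):
    """Apply a 2x3 affine matrix to an (N, 2) array of landmark points."""
    pts = np.hstack([landmarks, np.ones((len(landmarks), 1))])  # homogeneous coords
    return pts @ M.T

# 68 fake landmarks inside a 128x128 image
landmarks = np.random.uniform(20, 108, size=(68, 2))
M = make_rotation(15, center=(64, 64))
aug_landmarks = transform_landmarks(landmarks, M)
# The same M would be used to warp the image itself, keeping image and
# landmarks consistent -- exactly what imgaug automates for you.
```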
To determine the direction of gaze, we initially used a classical computer-vision algorithm that searched for the pupil in the eye region using HOG features. But it soon became clear that the pupil is often not visible, and that gaze direction is described not only by the pupil but also by the position of the eyelids. Because of these difficulties, the solution was moved to a neural-network approach. We cut and annotated the dataset ourselves: we ran the images through the first algorithm and then manually corrected its mistakes.
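A toy version of the classical pupil-finding step might look like this (here a simple darkest-region centroid stands in for the HOG-based detector the team actually used, and the eye crop is synthetic):

```python
import numpy as np

def pupil_center(eye_gray):
    """Locate the pupil in a grayscale eye crop as the centroid of the
    darkest pixels. A simplified stand-in for a HOG-based detector."""
    thresh = np.percentile(eye_gray, 5)   # keep the darkest ~5% of pixels
    ys, xs = np.nonzero(eye_gray <= thresh)
    return xs.mean(), ys.mean()           # (x, y) in crop coordinates

# Synthetic eye crop: bright background with a dark disc at (20, 12)
eye = np.full((24, 40), 200, dtype=np.uint8)
yy, xx = np.ogrid[:24, :40]
eye[(xx - 20) ** 2 + (yy - 12) ** 2 <= 16] = 30
```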
The first developments began in the summer and lived in a dirty Python script:
Blink-frequency detection fell out of aggregating parts of the two algorithms described above: the eyelid landmarks coming together and the gaze pointing downward.
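One standard way to turn eyelid landmarks into a blink signal (a sketch of the idea, not necessarily the exact formula the team used) is the eye aspect ratio: the eye's height divided by its width collapses toward zero when the lids close.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks of one eye in the usual 68-point layout.
    Ratio of the two vertical lid distances to the horizontal eye width."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, thresh=0.2, min_frames=2):
    """Count drops of the EAR below `thresh` lasting >= min_frames frames."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:   # blink still in progress at end of sequence
        blinks += 1
    return blinks
```

Dividing the blink count by the clip duration then gives the blinking frequency.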
Pulse detection from the video stream was based on the idea that blood particles absorb the green component of light, plus algorithms for tracking and extracting the regions of interest (skin).
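A minimal sketch of this idea: average the green channel over the skin ROI in every frame, then pick the dominant frequency in the plausible heart-rate band. The ROI tracking is omitted here and the input signal is synthetic; the real pipeline surely filters more carefully.

```python
import numpy as np

def estimate_pulse(green_means, fps, lo=0.7, hi=4.0):
    """Estimate heart rate (bpm) from per-frame mean green values of a skin ROI.

    The dominant spectral peak in the 0.7-4 Hz band (~42-240 bpm) is
    taken as the pulse frequency.
    """
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()                         # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)     # plausible heart-rate band
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0                       # Hz -> beats per minute

# Synthetic check: a 1.2 Hz (72 bpm) oscillation sampled at 30 fps, plus noise
np.random.seed(0)
t = np.arange(0, 10, 1 / 30)
signal = 0.5 * np.sin(2 * np.pi * 1.2 * t) + np.random.normal(0, 0.05, t.size)
```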
The mask looks eerie, of course:
In fact, building the blocks described above came down to implementing state-of-the-art algorithms, with modifications to improve accuracy in our particular case. Fortunately, there is arxiv.org.
Difficulties appeared in the logic of normalizing the face image and in the algorithms for evaluating the obtained data. For example, face recognition actively uses the Active Appearance Model: the face, using the points found, is stretched onto a common face texture. But what matters to us is the relative position of the points! One option is to filter out faces that are turned too far, or to stretch onto the texture only by "anchors", key points that do not reflect muscle movement (for example, a point on the bridge of the nose and the edge of the face). This problem remains one of the main ones and prevents us from obtaining reliable data if the face is rotated too far (and we can compute the rotation angle, too!). The permissible range today is ±20° along both axes; otherwise the face is simply not processed.
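A very rough way to gate frames by head rotation is to estimate yaw from landmark symmetry. This is purely illustrative: the indices follow the common 68-point layout, and the project's actual pose estimator may work quite differently.

```python
import numpy as np

def rough_yaw_deg(landmarks):
    """Crude yaw estimate from a (68, 2) landmark array: compare the nose
    tip's horizontal position to the jaw extremes. 0 means roughly frontal."""
    left, right, nose = landmarks[0], landmarks[16], landmarks[30]
    d_left = nose[0] - left[0]      # nose-to-left-jaw distance
    d_right = right[0] - nose[0]    # nose-to-right-jaw distance
    asym = (d_left - d_right) / (d_left + d_right)   # -1 .. 1
    return float(np.degrees(np.arcsin(np.clip(asym, -1.0, 1.0))))

def accept_face(landmarks, limit_deg=20.0):
    """The ±20° gate from the article: skip frames where the face is too turned."""
    return abs(rough_yaw_deg(landmarks)) <= limit_deg
```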
Of course, there are other problems:
- Detecting landmarks when the person is wearing glasses
- Extracting the baseline when the person is grimacing
- Detecting the pulse under strongly flickering illumination
Oh yes, and what is "baseline behavior"? It is the fundamental concept in processing emotions by FACS methods. The baseline-extraction algorithm is probably one of the most important pieces of know-how to come out of our hackathon.
Besides the algorithms, there was another important point we could not forget about: performance. And the performance ceiling is not even a PC, but an ordinary laptop. As a result, all the algorithms are as lightweight as possible, and the networks go through iterative size reduction while maintaining acceptable accuracy.
The result is 30-40% CPU load on an Intel i5 at 15-20 fps. Clearly, this leaves a certain margin that will disappear as additional modules are added.
Plans: to also detect
- Dryness in the throat
- Discoloration of the face
- Respiratory rate
- Intensity of body movements
- Patterns in body posture
- Trembling in the voice
What else can we do?
As a fan of computer vision and ML, I have told you a little about the algorithms used in our software. But for this application, the capabilities above are, by themselves, rather a pleasant addition. The most important part is the developed system for determining a person's psychotype. What is the point? Unfortunately, my colleagues (friends!) worked on that part, and I would not be able to explain where it all comes from. But for a minimal understanding, consider the order of working with the resulting software:
1. HR sets the qualities that are particularly needed for the position in question.
2. HR conducts an interview, asking some questions from the prepared database (during the interview, HR sees additional information about the candidate's emotions and stress level).
3. During or after the interview, HR fills in the answers to the questions and the behavioral patterns shown.
4. Through the developed matrices, the software builds an infographic reflecting how well the required and detected qualities match.
5. After the interview, a recording remains that lets you return to the interview at any time and re-evaluate any given moment.
14 days x 12 hours + 3 developers + 2 specialists in lie detection = a ready MVP. The immersion was total, to the point that at lunch we watched the series Lie to Me; I strongly recommend it.
So as not to be unsubstantiated, I attach an example of how the application works now: a promo video of the big solution "Anne" toward which we are moving.