We play Mortal Kombat with TensorFlow.js

While experimenting with improvements to the prediction model in Guess.js, I started looking closely at deep learning: at recurrent neural networks (RNNs), and LSTMs in particular, because of their "unreasonable effectiveness" in the domain where Guess.js operates. At the same time, I started playing with convolutional neural networks (CNNs), which are also often used for time series. CNNs are commonly used for image classification, recognition and detection.

[Image: controlling MK.js using TensorFlow.js]

The source code for this article and for MK.js is on my GitHub. I have not published the training data set, but you can build your own and train the model as described below!

Having played with CNNs, I remembered an experiment I ran a few years ago, when browser vendors released the getUserMedia API. In it, the user's camera served as a controller for a small JavaScript clone of Mortal Kombat 3. You can find that game in its GitHub repository. As part of the experiment, I implemented a basic pose detection algorithm that classifies an image into the following classes:

• Punch with the left or right hand
• Kick with the left or right foot
• Step left or right
• Squat
• None of the above

The algorithm is so simple that I can explain it in a few sentences:

The algorithm first photographs the background. As soon as the user appears in the frame, it computes the difference between the background and the current frame with the user; this is how it locates the user's figure. The next step is to render the user's body in white on a black background. After that, it builds vertical and horizontal histograms by summing the pixel values along each column and row. Based on these histograms, the algorithm determines the current body position.
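The steps above can be sketched in a few lines. This is only an illustration, not the code of the original experiment: the function names, the flat grayscale frame layout and the `threshold` parameter are all assumptions made for the example.

```typescript
// Hypothetical helpers (not from the original project).
// Frames are flat arrays of grayscale values (0-255), stored row by row.

interface Histograms {
  rows: number[];    // number of foreground pixels in each row
  columns: number[]; // number of foreground pixels in each column
}

// Marks a pixel as foreground (1) when it differs from the background
// by more than `threshold`; everything else becomes 0.
function silhouette(background: Uint8Array, frame: Uint8Array, threshold: number): Uint8Array {
  if (background.length !== frame.length) {
    throw new Error("background and frame must have the same size");
  }
  const mask = new Uint8Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    mask[i] = Math.abs(frame[i] - background[i]) > threshold ? 1 : 0;
  }
  return mask;
}

// Sums the mask along every row and every column.
function histograms(mask: Uint8Array, width: number, height: number): Histograms {
  const rows = new Array<number>(height).fill(0);
  const columns = new Array<number>(width).fill(0);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const value = mask[y * width + x];
      rows[y] += value;
      columns[x] += value;
    }
  }
  return { rows, columns };
}
```

A classifier built on top of this would compare the shape of the two histograms against the expected profile of each pose; that part is omitted here because the original text does not describe it.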
The video shows how the program works. The source code is on GitHub.

Although the tiny MK clone worked, the algorithm is far from perfect. It requires a frame with only the background. For it to work properly, the background must keep the same color for the whole run of the program. This restriction means that changes in lighting, shadows and the like interfere and produce inaccurate results. Finally, the algorithm does not recognize actions; it only classifies a new frame as one body position from a predefined set.

Now, thanks to progress in web APIs, namely WebGL, I decided to return to this task using TensorFlow.js.

Introduction

In this article, I share my experience building a body position classification algorithm with TensorFlow.js and MobileNet. We will cover the following topics:

• Collecting training data for image classification
• Data augmentation with imgaug
• Transfer learning with MobileNet
• Binary classification and N-ary classification
• Training a TensorFlow.js image classification model in Node.js and using it in the browser
• A few words about classifying actions with LSTM

In this article, we reduce the problem to determining the body position from a single frame, as opposed to recognizing an action from a sequence of frames. We will develop a supervised deep learning model that, given an image from the user's webcam, determines the person's movement: a punch, a kick or neither.

By the end of the article we will be able to build a model for playing MK.js:
To get the most out of the article, the reader should be familiar with basic programming concepts and JavaScript. A basic understanding of deep learning is also helpful, but not necessary.

Data collection

The accuracy of a deep learning model depends largely on the quality of the data. We should aim to collect an extensive data set that resembles what the model will see in production.

Our model should be able to recognize punches and kicks. This means that we must collect images in three categories:
• Punches
• Kicks
• Others

For this experiment, two volunteers helped me collect photos (@Lili_vs and @Gsamokovarov). We recorded 5 QuickTime videos on my MacBook Pro, each containing 2-4 punches and 2-4 kicks.

Then we use ffmpeg to extract individual frames from the videos and save them as jpg images:

    ffmpeg -i video.mov $filename%03d.jpg

To run the above command, you first need to install ffmpeg on your computer.

To train a model, we must provide input data and the corresponding output data, but at this stage we only have a bunch of images of three people in different poses. To structure the data, we need to sort the frames into three categories: punches, kicks and others. We create a separate directory for each category and move all the relevant images into it.

This way, each directory should contain about 200 images, similar to those shown below:
The files in each directory are numbered: the first is 1.jpg, the second is 2.jpg, etc.

If we train the model on only 600 photographs taken in the same environment with the same people, we will not reach a very high level of accuracy. To get the most out of our data, it is better to generate some extra samples using data augmentation.

Data augmentation

Data augmentation is a technique that increases the number of data points by synthesizing new points from an existing set. Augmentation is usually used to increase the size and variety of the training set. We pass the original images through a pipeline of transformations that produces new images. The transformations must not be too aggressive: a punch should only produce other punches.

Acceptable transformations include rotation, color inversion, blur, etc. There are excellent open source tools for data augmentation. At the time of writing there were not many options in JavaScript, so I used a library implemented in Python: imgaug. It has a set of augmenters that can be applied probabilistically.

Here is the data augmentation logic for this experiment:
    import numpy as np
    import imgaug as ia
    from imgaug import augmenters as iaa
    # misc.imresize, misc.imsave and ndimage.imread only exist in old SciPy releases
    from scipy import misc, ndimage

    np.random.seed(44)
    ia.seed(44)

    def main():
        for i in range(1, 191):
            draw_single_sequential_images(str(i), "others", "others-aug")
        for i in range(1, 191):
            draw_single_sequential_images(str(i), "hits", "hits-aug")
        for i in range(1, 191):
            draw_single_sequential_images(str(i), "kicks", "kicks-aug")

    def draw_single_sequential_images(filename, path, aug_path):
        # Resize to 56 rows by 100 columns (100x56 pixels)
        image = misc.imresize(ndimage.imread(path + "/" + filename + ".jpg"), (56, 100))
        sometimes = lambda aug: iaa.Sometimes(0.5, aug)  # probability assumed to be 0.5
        seq = iaa.Sequential(
            [
                iaa.Fliplr(0.5),  # horizontally flip 50% of all images
                # crop images by -5% to 10% of their height/width
                sometimes(iaa.CropAndPad(
                    percent=(-0.05, 0.1),
                    pad_mode=ia.ALL,
                    pad_cval=(0, 255)
                )),
                sometimes(iaa.Affine(
                    scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},  # scale images to 80-120% of their size, individually per axis
                    translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},  # translate by -10 to +10 percent (per axis)
                    rotate=(-5, 5),  # rotate by -5 to +5 degrees
                    shear=(-5, 5),  # shear by -5 to +5 degrees
                    order=[0, 1],  # use nearest neighbor or bilinear interpolation (fast)
                    cval=(0, 255),  # if mode is constant, use a cval between 0 and 255
                    mode=ia.ALL  # use any of scikit-image's warping modes
                )),
                iaa.Grayscale(alpha=(0.0, 1.0)),
                iaa.Invert(0.05, per_channel=False),  # invert color channels (probability assumed to be 0.05)
                # execute 0 to 5 of the following (less important) augmenters per image;
                # don't execute all of them, as that would often be way too strong
                iaa.SomeOf((0, 5),
                    [
                        iaa.OneOf([
                            iaa.GaussianBlur((0, 2.0)),  # blur images with a sigma between 0 and 2.0
                            iaa.AverageBlur(k=(2, 5)),  # blur image using local means with kernel sizes between 2 and 5
                            iaa.MedianBlur(k=(3, 5)),  # blur image using local medians with kernel sizes between 3 and 5
                        ]),
                        iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),  # sharpen images (lower lightness bound assumed)
                        iaa.Emboss(alpha=(0, 1.0), strength=(0, 2.0)),  # emboss images
                        iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05 * 255), per_channel=0.5),  # add noise to images (scale range assumed)
                        iaa.Add((-10, 10), per_channel=0.5),  # change brightness of images (by -10 to 10 of original value)
                        iaa.AddToHueAndSaturation((-20, 20)),  # change hue and saturation
                        # either change the brightness of the whole image (sometimes
                        # per channel) or change the brightness of subareas
                        iaa.OneOf([
                            iaa.Multiply((0.9, 1.1), per_channel=0.5),
                            iaa.FrequencyNoiseAlpha(
                                exponent=(-2, 0),
                                first=iaa.Multiply((0.9, 1.1), per_channel=True),
                                second=iaa.ContrastNormalization((0.9, 1.1))
                            )
                        ]),
                        iaa.ContrastNormalization((0.5, 2.0), per_channel=0.5),  # improve or worsen the contrast (lower bound assumed)
                    ],
                    random_order=True
                )
            ],
            random_order=True
        )

        # Make 16 copies of the image and augment each copy independently
        im = np.zeros((16, 56, 100, 3), dtype=np.uint8)
        for c in range(0, 16):
            im[c] = image
        grid = seq.augment_images(im)

        for index in range(len(grid)):
            misc.imsave(aug_path + "/" + filename + "_" + str(index) + ".jpg", grid[index])

    if __name__ == "__main__":
        main()

This script has a main method with three for loops, one for each category of images. In each iteration of each loop, we call draw_single_sequential_images: the first argument is the file name, the second is the path, and the third is the directory where the result is saved.

After that, we read the image from disk and apply a number of transformations to it. Most of the transformations are documented in the snippet above, so we will not repeat them here.

For each image, the script creates 16 other images. Here is an example of what they look like:
Note that in the script above we scale the images to 100x56 pixels. We do this to reduce the amount of data and, accordingly, the number of computations our model performs during training and evaluation.

Building a model

Now let's build a model for classification!

Since we are dealing with images, we use a convolutional neural network (CNN). This network architecture is known to be well suited for image recognition, object detection and classification.

Transfer learning

The image below shows VGG-16, a popular CNN used for image classification.

The VGG-16 network recognizes 1000 image classes. It has 16 layers (not counting the pooling layers and the output). Such a multi-layer network is hard to train in practice: it requires a large data set and many hours of training.

The hidden layers of a trained CNN recognize various elements of the images from the training set, starting with edges and moving on to more complex elements such as shapes, individual objects and so on. A CNN in the style of VGG-16, trained to recognize a large set of images, should have hidden layers that have learned many features from the training set. Such features are common to most images and can therefore be reused for different tasks.

Transfer learning lets us reuse an existing, already trained network. We can take the output of any layer of the existing network and feed it as input to a new neural network. By training the newly created network, over time we can teach it to recognize new, higher-level features and to correctly classify images from classes that the original model has never seen.

For our purposes, we take the MobileNet neural network from the @tensorflow-models/mobilenet package. MobileNet is as powerful as VGG-16 but much smaller, which speeds up forward propagation (that is, network activation) and reduces download time in the browser. MobileNet was trained on the ILSVRC-2012-CLS image classification data set.

When developing a model with transfer learning, we have two choices to make:

1. Which layer of the source model provides the output used as input for the target model.
2. How many layers of the target model we are going to train, if any.

The first choice is very significant. Depending on the selected layer, we get features at a lower or higher level of abstraction as input to our neural network.

We are not going to train any layers of MobileNet. We take the output of global_average_pooling2d_1 and pass it as input to our tiny model. Why did I choose this particular layer? Empirically. I ran some tests, and this layer works quite well.
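To make the idea of "frozen layers plus a small trainable model" concrete, here is a dependency-free sketch. It does not use MobileNet or TensorFlow.js: `frozenFeatures` is a made-up stand-in for a pre-trained layer, the toy data is invented, and the trainable part is a single logistic unit trained with plain gradient descent. Only the structure mirrors the approach described above.

```typescript
type Vec = number[];

// Stand-in for a pre-trained network layer (an assumption for this example).
// It is fixed: training below never changes it.
const frozenFeatures = (x: Vec): Vec => [x[0] + x[1], x[0] - x[1]];

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// The only trainable part: one logistic unit on top of the frozen features.
interface Head {
  weights: Vec;
  bias: number;
}

function predict(head: Head, features: Vec): number {
  let z = head.bias;
  for (let j = 0; j < features.length; j++) {
    z += head.weights[j] * features[j];
  }
  return sigmoid(z);
}

function trainHead(inputs: Vec[], labels: number[], epochs: number, learningRate: number): Head {
  // Features are computed once because the extractor is frozen.
  const features = inputs.map(frozenFeatures);
  const head: Head = { weights: new Array<number>(features[0].length).fill(0), bias: 0 };
  for (let epoch = 0; epoch < epochs; epoch++) {
    const gradW = new Array<number>(head.weights.length).fill(0);
    let gradB = 0;
    features.forEach((f, i) => {
      // Gradient of the cross-entropy loss with respect to the pre-activation
      const error = predict(head, f) - labels[i];
      f.forEach((value, j) => {
        gradW[j] += error * value;
      });
      gradB += error;
    });
    head.weights = head.weights.map((w, j) => w - (learningRate * gradW[j]) / features.length);
    head.bias -= (learningRate * gradB) / features.length;
  }
  return head;
}

// Toy data: two "negative" and two "positive" inputs.
const inputs: Vec[] = [[0, 0], [0.2, 0.1], [1, 1], [0.9, 0.8]];
const labels = [0, 0, 1, 1];
const head = trainHead(inputs, labels, 2000, 0.5);
const scores = inputs.map((x) => predict(head, frozenFeatures(x)));
console.log(scores.map((s) => s.toFixed(3)));
```

In the article's setup the role of `frozenFeatures` is played by MobileNet up to the chosen layer, and the role of the head by the small dense model defined in the next section.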

Defining the model

The initial task was to classify an image into three classes: punches, kicks and other movements. Let's first solve a smaller problem: determining whether there is a punch in the frame or not. This is a typical binary classification task. For this purpose, we can define the following model:

    import * as tf from '@tensorflow/tfjs';

    const model = tf.sequential();
    // Input: the 1024 features produced by the chosen MobileNet layer
    model.add(tf.layers.inputLayer({ inputShape: [1024] }));
    model.add(tf.layers.dense({ units: 1024, activation: 'relu' }));
    // Single output unit: the probability of a punch
    model.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));
    model.compile({
      optimizer: tf.train.adam(1e-6),
      loss: tf.losses.sigmoidCrossEntropy,
      metrics: ['accuracy']
    });

This code defines a simple model with one layer of 1024 units and ReLU activation, and one output unit that passes through a sigmoid activation function. The latter produces a number from 0 to 1, depending on the probability that the frame contains a punch.

Why did I choose 1024 units for the second layer and a learning rate of 1e-6? Well, I tried several different options and saw that these parameters work best. Trial and error may not seem like the best approach, but to a large extent this is how hyperparameter tuning in deep learning works: based on our understanding of the model, we use intuition to update orthogonal parameters and check empirically how the model performs.

The compile method compiles the layers together, preparing the model for training and evaluation. Here we declare that we want to use the adam optimization algorithm. We also declare that the loss is computed as cross entropy, and that we want to evaluate the accuracy of the model. TensorFlow.js then computes the accuracy with the formula:

    Accuracy = (True Positives + True Negatives) / (Positives + Negatives)

To transfer learning from the original MobileNet model, we first need to load it. Since it is impractical to train our model on more than 3000 images in the browser, we use Node.js and load the network from a file.

You can download MobileNet here. The directory contains the file model.json, which describes the model architecture: layers, activations, etc. The remaining files contain the model's parameters. You can load the model from a file using this code:
    import * as tf from '@tensorflow/tfjs';
    import * as mobilenet from '@tensorflow-models/mobilenet';

    export const loadModel = async () => {
      // MobileNet version 1 with alpha 1.0
      const mn = new mobilenet.MobileNet(1, 1);
      mn.path = `file://PATH/TO/model.json`;
      await mn.load();
      // Return a function that maps an image tensor to the 1024 features of the chosen layer
      return (input): tf.Tensor1D =>
        mn.infer(input, 'global_average_pooling2d_1')
          .reshape([1024]);
    };

Note that loadModel returns a function that takes an image tensor as input and returns the result of mn.infer(input, Layer), reshaped into a one-dimensional tensor. The infer method takes a tensor and a layer name as arguments. The layer name determines which hidden layer we want the output from. If you open model.json and search for global_average_pooling2d_1, you will find this name on one of the layers.

Now we need to create a data set for training the model. To do this, we pass all the images through MobileNet's infer method and assign them labels: 1 for images with punches and 0 for images without:
    // Punches and Others hold the paths of the two image directories
    const punches = require('fs')
      .readdirSync(Punches)
      .filter(f => f.endsWith('.jpg'))
      .map(f => `${Punches}/${f}`);

    const others = require('fs')
      .readdirSync(Others)
      .filter(f => f.endsWith('.jpg'))
      .map(f => `${Others}/${f}`);

    // Labels: 1 for every punch image, followed by 0 for every other image
    const ys = tf.tensor1d(
      new Array(punches.length).fill(1)
        .concat(new Array(others.length).fill(0)));

    // Features: the MobileNet output for every image, in the same order as the labels
    const xs: tf.Tensor2D = tf.stack(
      punches
        .map((path: string) => mobileNet(readInput(path)))
        .concat(others.map((path: string) => mobileNet(readInput(path))))
    ) as tf.Tensor2D;

In the code above, we first read the files in the directories with and without punches. Then we define a one-dimensional tensor containing the output labels. If we have n images with punches and m other images, the tensor will have n elements with the value 1 and m elements with the value 0.
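The label layout is easy to check in isolation. The sketch below uses plain arrays instead of tensors; `buildLabels` and `labelPaths` are hypothetical helper names introduced only for this illustration, and the file paths in the usage example are made up.

```typescript
// Builds the label vector: `positives` ones followed by `negatives` zeros.
function buildLabels(positives: number, negatives: number): number[] {
  return new Array<number>(positives).fill(1).concat(new Array<number>(negatives).fill(0));
}

// Pairs every file path with its label, keeping punches first and others second,
// which is the same order used for the features.
function labelPaths(punchPaths: string[], otherPaths: string[]): Array<[string, number]> {
  const labels = buildLabels(punchPaths.length, otherPaths.length);
  return punchPaths
    .concat(otherPaths)
    .map((path, i): [string, number] => [path, labels[i]]);
}

const pairs = labelPaths(["punches/1.jpg"], ["others/1.jpg", "others/2.jpg"]);
console.log(pairs);
```

What matters is that features and labels are concatenated in the same order, so that the label at position i always describes the image at position i.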
In xs we stack the results of calling infer on the individual images. Note that for each image we call the readInput method. Here is its implementation:
    const fs = require('fs');
    const jpeg = require('jpeg-js');

    // Number of color channels kept per pixel (RGB)
    const TotalChannels = 3;

    export const readInput = img => imageToInput(readImage(img), TotalChannels);

    const readImage = path => jpeg.decode(fs.readFileSync(path), true);

    const imageToInput = (image, numChannels) => {
      const values = serializeImage(image);
      return tf.tensor3d(values, [image.height, image.width, numChannels], 'int32');
    };

    const serializeImage = image => {
      const totalPixels = image.width * image.height;
      const result = new Int32Array(totalPixels * 3);
      // jpeg-js decodes to RGBA; copy the R, G and B values and drop alpha
      for (let i = 0; i < totalPixels; i++) {
        result[i * 3 + 0] = image.data[i * 4 + 0];
        result[i * 3 + 1] = image.data[i * 4 + 1];
        result[i * 3 + 2] = image.data[i * 4 + 2];
      }
      return result;
    };

readInput first calls readImage and then passes the result to imageToInput. The readImage function reads the image from disk and then decodes the jpg from the buffer using the jpeg-js package. In imageToInput we transform the image into a three-dimensional tensor.

As a result, for each i from 0 to TotalImages, ys[i] should equal 1 if xs[i] corresponds to an image with a punch, and 0 otherwise.

Training the model

Now the model is ready for training! Call the fit method:

    await model.fit(xs, ys, {
      epochs: Epochs,
      // BatchSize is a fraction of the whole training set
      batchSize: parseInt(((punches.length + others.length) * BatchSize).toFixed(0)),
      callbacks: {
        onBatchEnd: async (_, logs) => {
          console.log('Cost: %s, accuracy: %s', logs.loss.toFixed(5), logs.acc.toFixed(5));
          await tf.nextFrame();
        }
      }
    });

The code above calls fit with three arguments: xs, ys and a configuration object. In the configuration object, we set the number of epochs the model trains for, the batch size, and a callback that TensorFlow.js invokes after processing each batch.

The batch size determines how large a subset of xs and ys the model trains on at a time. For each batch, TensorFlow.js selects a subset of xs and the corresponding elements of ys, performs forward propagation, gets the output of the layer with sigmoid activation and then, based on the loss, performs optimization using the adam algorithm.

After launching the training script, you will see output similar to the following:

    Cost: ???, accuracy: ???
    eta=0.3 >---------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 =>--------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 ==>-------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 ===>------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 ====>------ acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 =====>----- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 ======>---- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 =======>--- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 ========>- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.0 ==========>
    293ms 14675us/step - acc=??? loss=???
    Epoch 3/50
    Cost: ???, accuracy: ???
    eta=0.3 >---------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.3 =>--------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.3 ==>-------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 ===>------- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 ====>------ acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.2 =====>----- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 ======>---- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 =======>--- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.1 ========>- acc=??? loss=???
    Cost: ???, accuracy: ???
    eta=0.0 ==========>
    304ms 15221us/step - acc=??? loss=???

Notice how the accuracy increases over time while the loss decreases.

On my data set, the trained model reached an accuracy of 92%. Keep in mind that the accuracy may not be very high because of the small training set.

Running the model in the browser

In the previous section, we trained a binary classification model. Now let's run it in the browser and connect it to the game MK.js!

    const video = document.getElementById('cam');
    const Layer = 'global_average_pooling2d_1';
    const mobilenetInfer = m => (p): tf.Tensor => m.infer(p, Layer);
    const canvas = document.getElementById('canvas');
    const scale = document.getElementById('crop');

    const ImageSize = {
      Width: 100,
      Height: 56
    };

    navigator.mediaDevices
      .getUserMedia({
        video: true,
        audio: false
      })
      .then(stream => {
        video.srcObject = stream;
      });

The code above contains several declarations:

• video holds a reference to the HTML5 video element on the page
• Layer holds the name of the MobileNet layer whose output we want to get and feed as input to our model
• mobilenetInfer is a function that accepts a MobileNet instance and returns another function. The returned function takes input data and returns the corresponding output from the specified MobileNet layer
• canvas points to the HTML5 canvas element that we use to extract frames from the video
• scale is another canvas, used to scale individual frames

After that, we get the video stream from the user's camera and set it as the source for the video element.

The next step is to implement a grayscale filter that takes a canvas and converts its contents:

    const grayscale = (canvas: HTMLCanvasElement) => {
      const imageData = canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height);
      const data = imageData.data;
      // Pixels are stored as RGBA, so step four bytes at a time.
      for (let i = 0; i < data.length; i += 4) {
        const avg = (data[i] + data[i + 1] + data[i + 2]) / 3;
        data[i] = avg;
        data[i + 1] = avg;
        data[i + 2] = avg;
      }
      canvas.getContext('2d').putImageData(imageData, 0, 0);
    };
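
To see what the filter does to the pixel buffer without needing a browser, here is a minimal sketch of the same averaging applied to a plain RGBA array. The function name grayscaleInPlace and the sample pixels are assumptions made for illustration; they are not part of the original project.

```typescript
// Average the R, G and B channels of every pixel in an RGBA buffer, in place.
// The layout (4 bytes per pixel, alpha last) matches ImageData.data.
// grayscaleInPlace is a hypothetical name used only for this sketch.
function grayscaleInPlace(data: Uint8ClampedArray): Uint8ClampedArray {
  for (let i = 0; i + 3 < data.length; i += 4) {
    const avg = (data[i] + data[i + 1] + data[i + 2]) / 3;
    data[i] = avg;
    data[i + 1] = avg;
    data[i + 2] = avg;
    // data[i + 3] is the alpha channel and is left untouched.
  }
  return data;
}

// Two made-up pixels: pure red (opaque) and a bluish tone (half transparent).
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 30, 60, 90, 128]);
grayscaleInPlace(pixels);
console.log(pixels.join(", "));
// prints: 85, 85, 85, 255, 60, 60, 60, 128
```

Each pixel ends up with identical red, green and blue values, while alpha is preserved. A plain average is the simplest choice; a luminance-weighted average would be a possible alternative.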

As a next step, connect the model to MK.js:

    let mobilenet: (p: any) => tf.Tensor;
    tf.loadModel('http://localhost:5000/model.json').then(model => {
      mobileNet
        .load()
        .then((mn: any) => (mobilenet = mobilenetInfer(mn)))
        // Call startInterval only after mobilenet has been assigned.
        .then(() => startInterval(mobilenet, model)());
    });

In the code above, we first load the model we trained earlier and then load MobileNet. We pass MobileNet to mobilenetInfer to get a function that computes the output of the hidden network layer. After that, we call startInterval with the two networks as arguments.

    const startInterval = (mobilenet, model) => () => {
      setInterval(() => {
        // Draw the current video frame onto the canvas.
        canvas.getContext('2d').drawImage(video, 0, 0);

        // Scale the frame down to ImageSize, then convert it to grayscale.
        scale
          .getContext('2d')
          .drawImage(
            canvas, 0, 0, canvas.width,
            canvas.width / (ImageSize.Width / ImageSize.Height),
            0, 0, ImageSize.Width, ImageSize.Height
          );
        grayscale(scale);

        const [punching] = Array.from((
          model.predict(mobilenet(tf.fromPixels(scale))) as tf.Tensor1D)
          .dataSync() as Float32Array);

        const detect = (window as any).Detect;

        if (punching >= 0.4) detect && detect.onPunch();
      }, 100);
    };

The most interesting part begins in startInterval! First, we start an interval that calls an anonymous function every 100 ms. The function first draws the current video frame onto canvas. Then we scale the frame down to 100x56 and apply the grayscale filter to it.

The next step is to pass the frame to MobileNet, take the output of the desired hidden layer, and pass it as input to the predict method of our model. That returns a tensor with one element. Using dataSync, we get the value from the tensor and assign it to the constant punching.

Finally, we check: if the probability of a punch exceeds 0.4, we call the onPunch method of the global object Detect. MK.js exposes a global object with three methods, onKick, onPunch, and onStand, which we can use to control one of the characters.

Done! Here is the result!

Recognition of punches and kicks with N-ary classification

In the next section, we'll build a smarter model: a neural network that recognizes punches, kicks, and other images. This time, let's start with preparing the training set:

    const punches = require('fs')
      .readdirSync(Punches)
      .filter(f => f.endsWith('.jpg'))
      .map(f => `${Punches}/${f}`);

    const kicks = require('fs')
      .readdirSync(Kicks)
      .filter(f => f.endsWith('.jpg'))
      .map(f => `${Kicks}/${f}`);

    const others = require('fs')
      .readdirSync(Others)
      .filter(f => f.endsWith('.jpg'))
      .map(f => `${Others}/${f}`);

    // One label row per image: punches first, then kicks, then others.
    const ys = tf.tensor2d(
      new Array(punches.length)
        .fill([1, 0, 0])
        .concat(new Array(kicks.length).fill([0, 1, 0]))
        .concat(new Array(others.length).fill([0, 0, 1])),
      [punches.length + kicks.length + others.length, 3]
    );

    // MobileNet output for every image, in the same order as the labels.
    const xs: tf.Tensor2D = tf.stack(
      punches
        .map((path: string) => mobileNet(readInput(path)))
        .concat(kicks.map((path: string) => mobileNet(readInput(path))))
        .concat(others.map((path: string) => mobileNet(readInput(path))))
    ) as tf.Tensor2D;

As before, we first read the directories with images of punches, kicks, and other images. After that, unlike last time, we form the expected output as a two-dimensional tensor rather than a one-dimensional one. If we have n punch images, m kick images, and k other images, then the tensor ys will contain n elements with the value [1, 0, 0], m elements with the value [0, 1, 0], and k elements with the value [0, 0, 1].

A vector of n elements in which n - 1 elements have the value 0 and one element has the value 1 is called a one-hot vector.

After that, we form the input tensor xs by stacking the MobileNet output for each image.

The model definition also has to be updated:

    const model = tf.sequential();
    model.add(tf.layers.inputLayer({ inputShape: [1024] }));
    model.add(tf.layers.dense({ units: 1024, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 3, activation: 'softmax' }));
    await model.compile({
      optimizer: tf.train.adam(1e-6),
      loss: tf.losses.sigmoidCrossEntropy,
      metrics: ['accuracy']
    });

The only two differences from the previous model are:

  • The number of units in the output layer
  • The activation in the output layer

There are three units in the output layer because we have three different categories of images:

  • Punch
  • Kick
  • Other

The softmax activation is applied to these three units and turns their values into a tensor with three values. Why three units in the output layer? Each of the three classes could be represented with two bits: 00, 01, 10. But the values of the tensor produced by softmax sum to 1, which means we would never get 00, so we would never be able to classify images of one of the classes.

After training the model for 500 epochs, I achieved an accuracy of about 92%! That's not bad, but don't forget that the training was done on a small data set.

The next step is to run the model in the browser! Since the logic is very similar to running the binary classification model, let's look only at the last step, where an action is chosen based on the model output:

    const [punch, kick, nothing] = Array.from((model.predict(
      mobilenet(tf.fromPixels(scaled))
    ) as tf.Tensor1D).dataSync() as Float32Array);

    const detect = (window as any).Detect;
    if (nothing >= 0.4) return;

    if (kick > punch && kick >= ???) {
      detect.onKick();
      return;
    }
    if (punch > kick && punch >= ???) detect.onPunch();

First we call MobileNet with the scaled-down grayscale frame, then we pass the result to our trained model. The model returns a one-dimensional tensor, which we convert to a Float32Array with dataSync. In the next step, we use Array.from to convert the typed array to a JavaScript array. Then we extract the probabilities that the frame shows a punch, a kick, or nothing.

If the probability of the third outcome is 0.4 or higher, we return. Otherwise, if the probability of a kick is above ??? and higher than the probability of a punch, we send a kick command to MK.js. If the probability of a punch is above ??? and higher than the probability of a kick, we send a punch command.

In general, that's all! The result is shown below:

Action Recognition

If you collect a large and diverse data set of people punching and kicking, you can build a model that works well on individual frames. But is that enough? What if we want to go even further and distinguish two different kinds of kicks: a roundhouse kick and a back kick?

As can be seen in the frames below, at a certain moment and from a certain angle, both kicks look the same:

But if you look at how they are performed, the movements are completely different:

How can we teach a neural network to analyze a sequence of frames rather than a single frame?

For this purpose, we can explore another class of neural networks called recurrent neural networks (RNNs). For example, RNNs are great for working with time series:

  • Natural language processing (NLP), where each word depends on the preceding and following ones
  • Predicting the next page based on the browsing history
  • Recognizing an action in a sequence of frames

Implementing such a model is beyond the scope of this article, but let's look at an example architecture to get an idea of how all this would work together.
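
Whatever sequence model is used, it has to be fed the most recent frames rather than a single one. The sketch below shows only that bookkeeping: a fixed-size window over per-frame feature vectors. It is an illustration under assumptions, not code from the project. FrameWindow and its methods are hypothetical names, and plain number arrays stand in for the features a CNN would produce.

```typescript
// Hypothetical helper: keeps the feature vectors of the last `size` frames.
class FrameWindow {
  private frames: number[][] = [];

  constructor(private readonly size: number) {}

  // Add the newest frame's features, dropping the oldest once the window is full.
  push(features: number[]): void {
    this.frames.push(features);
    if (this.frames.length > this.size) {
      this.frames.shift();
    }
  }

  // True once enough frames have been collected to form a full sequence.
  isFull(): boolean {
    return this.frames.length === this.size;
  }

  // Oldest-to-newest copy of the window, shaped [size, featureLength].
  toSequence(): number[][] {
    return this.frames.map(f => f.slice());
  }
}

// Made-up one-element "feature vectors" for four consecutive frames.
const frameWindow = new FrameWindow(3);
[[1], [2], [3], [4]].forEach(f => frameWindow.push(f));
console.log(frameWindow.isFull(), JSON.stringify(frameWindow.toSequence()));
// prints: true [[2],[3],[4]]
```

Once the window is full, its contents form one input sequence, and every new frame shifts the window forward by one step.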

The power of the RNN

The diagram below shows the action recognition model:

We take the last n frames of the video and pass them to a CNN. The CNN output for each frame is passed as input to the RNN. The recurrent neural network determines the dependencies between the individual frames and recognizes which action they correspond to.

Conclusion

In this article, we developed an image classification model. To do that, we collected a data set: we extracted video frames and manually divided them into three categories. Then we augmented the data by generating additional images with imgaug.

After that, we explained what transfer learning is and used the pretrained MobileNet model from the @tensorflow-models/mobilenet package for our own purposes. We loaded MobileNet from a file in a Node.js process and trained an additional dense layer fed with the output of a hidden MobileNet layer. After training, we reached an accuracy of over 90%!

To use the model in the browser, we loaded it together with MobileNet and classified frames from the user's webcam every 100 ms. We connected the model to the game MK.js and used its output to control one of the characters.

Finally, we looked at how to improve the model by combining it with a recurrent neural network for action recognition.

I hope you enjoyed this tiny project as much as I did!

weber

Author

28-10-2018, 22:37

Publication Date

Game development / Image processing / Machine learning

Category