Pixel 3 learns to estimate depth in photos

Portrait mode on Pixel smartphones lets you take professional-looking photos that draw attention to the subject by blurring the background. Last year, we described how we compute depth with a single camera and its Phase-Detection Autofocus (PDAF) pixels, also known as dual-pixel autofocus. That process used a traditional stereo algorithm, without machine learning. This year, on the Pixel 3, we turned to machine learning to improve depth estimation and produce even better portrait mode results.

Left: the original image, captured in HDR+. Right: a comparison of portrait mode results using depth from traditional stereo and from machine learning. The learned depth produces fewer errors. In the traditional stereo result, many of the horizontal lines behind the man are incorrectly assigned the same depth as the man himself, so they stay sharp.

Last year we described how portrait mode uses a neural network to separate the pixels that belong to people from those that belong to the background, and augments this two-level mask with depth information derived from the PDAF pixels. The goal was a depth-dependent blur close to what a professional camera produces.

PDAF captures two slightly different views of the scene. Flipping between the two views, you can see that the person stays put while the background shifts horizontally; this effect is called parallax. Because parallax is a function of a point's distance from the camera and of the distance between the two viewpoints, we can estimate depth by matching each point in one view with its corresponding point in the other.

The PDAF images on the left and in the middle look alike, but the parallax is visible in the enlarged crop on the right. It is easiest to spot on the circular structure in the center of the crop.

However, finding these correspondences in PDAF images (a method called depth from stereo) is extremely difficult, because points shift very little between the two views. Moreover, all stereo techniques suffer from the aperture problem: if you look at the scene through a small aperture, it is impossible to find correspondences for lines parallel to the stereo baseline, that is, the line connecting the two cameras. In other words, for the horizontal lines in the photo above (or vertical lines in portrait-orientation shots), every candidate shift of one view relative to the other looks about the same. In last year's portrait mode, all of these factors could cause depth errors and unpleasant artifacts.

Improving depth estimation

With portrait mode on the Pixel 3, we fix these errors by exploiting the fact that stereo parallax is only one of many depth cues present in an image. For example, points far from the in-focus plane appear less sharp than points close to it, which gives a defocus depth cue. In addition, even when viewing an image on a flat screen, we can easily judge how far away things are because we know the approximate size of everyday objects (for example, the number of pixels covering a person's face indicates how far away that person is). This is a semantic cue.

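The parallax and semantic cues described above both follow from simple pinhole-camera geometry. The Python sketch below illustrates that geometry for an idealized, rectified stereo pair. It is not the Pixel pipeline, real PDAF optics behave differently, and the focal length, baselines, and face width are assumed values chosen only for illustration.

```python
"""Pinhole-camera geometry behind the parallax and semantic depth cues.

All numbers are illustrative assumptions, not Pixel 3 specifications.
"""


def disparity_px(depth_m, baseline_m, focal_px):
    """Parallax in pixels of a point at depth_m seen from two viewpoints."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def depth_from_disparity(disparity, baseline_m, focal_px):
    """Invert the relation above: a larger shift means a closer point."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity


def distance_from_known_size(real_size_m, size_px, focal_px):
    """Semantic cue: an object of known size looks smaller when farther away."""
    if size_px <= 0:
        raise ValueError("size in pixels must be positive")
    return focal_px * real_size_m / size_px


FOCAL_PX = 3000.0        # assumed focal length, expressed in pixels
TINY_BASELINE_M = 0.001  # assumed millimetre-scale baseline
WIDE_BASELINE_M = 0.05   # assumed centimetre-scale baseline

# How much does the parallax differ between a point at 1 m and one at 2 m?
tiny_gap = (disparity_px(1.0, TINY_BASELINE_M, FOCAL_PX)
            - disparity_px(2.0, TINY_BASELINE_M, FOCAL_PX))
wide_gap = (disparity_px(1.0, WIDE_BASELINE_M, FOCAL_PX)
            - disparity_px(2.0, WIDE_BASELINE_M, FOCAL_PX))

# A face assumed to be 0.15 m wide that covers 450 pixels in the image.
face_distance_m = distance_from_known_size(0.15, 450.0, FOCAL_PX)

print(f"parallax gap, tiny baseline: {tiny_gap:.2f} px")
print(f"parallax gap, wide baseline: {wide_gap:.2f} px")
print(f"estimated face distance: {face_distance_m:.2f} m")
```

With the tiny baseline, the two depths differ by only a pixel or so of parallax, which is why matching has to be so precise; the parallax gap grows in proportion to the baseline.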
Hand-designing an algorithm that combines these cues is extremely difficult, but with machine learning we can do so while also making better use of the PDAF parallax cue. Specifically, we train a convolutional neural network, written in TensorFlow, that takes the PDAF pixels as input and learns to predict depth. This new, improved ML-based depth estimation method is what powers portrait mode on the Pixel 3.

Our convolutional neural network takes PDAF images as input and outputs a depth map. The network uses an encoder-decoder style architecture with skip connections and residual blocks.

Training the neural network

To train the network, we need lots of PDAF images and corresponding high-quality depth maps. And since we want the predicted depth to be useful for portrait mode, the training data must also resemble the photos that users take with their smartphones.

To get this data, we built a custom rig, the "Frankenphone", that combines five Pixel 3 phones with a Wi-Fi link between them, which let us capture photos from all the phones simultaneously (no more than 2 ms apart). With this rig, we computed high-quality depth maps from the photos using structure from motion and multi-view stereo.

Left: the rig used to collect training data. Middle: an example of flipping between the five photos. Camera synchronization makes it possible to compute depth in dynamic scenes. Right: the resulting depth map. Low-confidence points, where pixel matches across the photos were unreliable because of weak texture, are colored black and are not used in training.

The data captured by this rig is ideal for training the network for the following reasons:

- Five viewpoints guarantee parallax in multiple directions, which removes the aperture problem.
- The camera arrangement ensures that any point in an image is visible in at least two photos, which reduces the number of points that cannot be matched.
- The baseline, that is, the distance between the cameras, is larger than the PDAF baseline, which yields a more accurate depth estimate.
- Camera synchronization makes it possible to compute depth in dynamic scenes.
- The rig is portable, so we can take photos out in the wild, simulating the photos that users take with their smartphones.

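The first point in the list can be illustrated with a toy block-matching experiment. The Python sketch below builds a synthetic image of horizontal stripes and scores candidate shifts with a sum of absolute differences. The image, patch size, and search range are assumptions made for illustration, and this is not the matching method used with the rig.

```python
"""Toy demonstration of the aperture problem with horizontal stripes."""

# Each row has one constant brightness, so the image has no horizontal texture.
ROW_VALUES = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
WIDTH = 8
stripes = [[value] * WIDTH for value in ROW_VALUES]


def shift_image(img, dy, dx):
    """Translate img by (dy, dx) pixels, wrapping around at the borders."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def sad_costs(ref, other, y, x, half, axis, max_shift):
    """Sum-of-absolute-differences cost for each candidate shift along axis."""
    h, w = len(ref), len(ref[0])
    costs = []
    for s in range(max_shift + 1):
        dy, dx = (s, 0) if axis == "vertical" else (0, s)
        cost = 0
        for j in range(-half, half + 1):
            for i in range(-half, half + 1):
                a = ref[(y + j) % h][(x + i) % w]
                b = other[(y + j + dy) % h][(x + i + dx) % w]
                cost += abs(a - b)
        costs.append(cost)
    return costs


TRUE_SHIFT = 2
# Second view from a horizontal baseline: the scene moves sideways.
horizontal_view = shift_image(stripes, 0, TRUE_SHIFT)
# Second view from a vertical baseline: the scene moves up or down.
vertical_view = shift_image(stripes, TRUE_SHIFT, 0)

horizontal_costs = sad_costs(stripes, horizontal_view, 5, 4, 1, "horizontal", 4)
vertical_costs = sad_costs(stripes, vertical_view, 5, 4, 1, "vertical", 4)

print("horizontal baseline costs:", horizontal_costs)
print("vertical baseline costs:  ", vertical_costs)
```

Along the horizontal baseline every candidate shift matches equally well, so the true shift cannot be recovered, whereas the vertical baseline has a single best match at the true shift. Having viewpoints offset in more than one direction avoids this ambiguity.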
However, even with data this good, predicting the absolute depth of objects in a scene remains extremely difficult: any given PDAF pair can correspond to a range of different depth maps (depending on lens characteristics, focal length, and so on). To account for this, we predict the relative depths of the objects in the scene instead, which is sufficient to produce pleasing portrait mode results.

Putting it all together

ML-based depth estimation on the Pixel 3 has to run fast so that users don't wait long for their portrait mode shots. However, to get good depth estimates from subtle defocus and parallax cues, we have to feed full-resolution photos into the neural network. To keep results fast, we use TensorFlow Lite, a cross-platform solution for running ML models on mobile and embedded devices, together with the Pixel 3's powerful GPU, which computes depth quickly even for these unusually large inputs. We then combine the resulting depth estimates with the masks from our person segmentation neural network to produce beautiful portrait mode results.

Try it yourself

In Google Camera app version 6.1 and later, our depth maps are embedded in portrait mode images. This means you can use the Google Photos depth editor to change the amount of blur and the focus point after the shot has been taken. You can also use third-party programs to extract the depth maps from the JPEG files and examine them yourself. An accompanying album shows relative depth maps and the corresponding portrait mode images, for comparing the traditional stereo and ML approaches.
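To give a feel for what changing the amount of blur and the focus point from a depth map involves, here is a minimal Python sketch based on the textbook thin-lens circle-of-confusion formula. It assumes metric depth values and an invented lens (the focal length and f-number are made up), whereas the embedded depth maps are relative. It is not how Google Photos renders blur.

```python
"""Depth-dependent blur from the thin-lens circle-of-confusion formula."""


def blur_diameter_mm(depth_m, focus_m, focal_length_mm=50.0, f_number=2.0):
    """Blur-circle diameter on the sensor for a point at depth_m."""
    f = focal_length_mm / 1000.0  # focal length in metres
    if depth_m <= f or focus_m <= f:
        raise ValueError("depths must lie beyond the focal length")
    aperture = f / f_number  # aperture diameter in metres
    coc_m = aperture * abs(depth_m - focus_m) / depth_m * f / (focus_m - f)
    return coc_m * 1000.0


def blur_map(depth_map, focus_m, **lens):
    """Apply blur_diameter_mm to every pixel of a depth map (list of rows)."""
    return [[blur_diameter_mm(d, focus_m, **lens) for d in row]
            for row in depth_map]


# Hypothetical 2x3 depth map in metres: subject at 1 m, background at 4 m.
DEPTH_MAP_M = [
    [1.0, 1.0, 4.0],
    [1.0, 2.0, 4.0],
]

focus_on_subject = blur_map(DEPTH_MAP_M, focus_m=1.0)
focus_on_background = blur_map(DEPTH_MAP_M, focus_m=4.0)
stronger_blur = blur_map(DEPTH_MAP_M, focus_m=1.0, f_number=1.4)

for name, result in [("focus on subject", focus_on_subject),
                     ("focus on background", focus_on_background),
                     ("wider aperture", stronger_blur)]:
    print(name, [[round(v, 3) for v in row] for row in result])
```

Moving `focus_m` changes which depth stays sharp, and lowering `f_number` increases the blur everywhere else, which mirrors the two controls a depth editor exposes.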
