Machine Learning Behind Your Favorite Google Meet Backgrounds – Analytics India Magazine – AINewZine.com – Tech News on AI and ML (Artificial Intelligence News)

The pandemic forced the corporate world to abandon its offices and work from home, and in-person meetings were replaced by virtual ones. Video conferencing companies like Zoom benefited tremendously from this shift. But the sudden rush to embrace virtual meetings brought challenges of its own: users were annoyed by interruptions, noise, monotonous walls and more. In response, companies that offer video call services let users change their backgrounds and added audio noise reduction.

These tweaks are usually the result of machine learning models running in the background. Loading these models and running them for inference can be slow, so the models, along with the imagery, need to be small, all the more so if the service runs in a browser. Google Workspace (formerly G Suite) has done well in this regard by making voice calls clearer and backgrounds more aesthetic for its users. In a recent blog post, Google discussed how it achieved this quality for its video services.

The engineers at Google diligently crafted a pipeline that leverages many ML innovations that Google has developed over the years. One of them is MediaPipe, an open-source, cross-platform framework for building pipelines to process perceptual data of different modalities. More about it in the next section.

About MediaPipe

Most object detection work addresses two-dimensional (2D) prediction: the bounding boxes are rectangles drawn on the image, never cuboids in space. By extending prediction to 3D, one can capture an object’s size, position and orientation in the world, leading to a variety of applications in robotics, self-driving vehicles, image retrieval, and augmented reality.

Google AI released MediaPipe Objectron, a mobile real-time 3D object detection pipeline for everyday objects. This pipeline detects objects in 2D images, and estimates their poses and sizes through a machine learning (ML) model, trained on a newly created 3D dataset. Objectron computes oriented 3D bounding boxes of objects in real-time on mobile devices.
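To make the idea of an oriented 3D bounding box concrete, here is a minimal Python sketch of how a box's position, size and orientation determine its eight corners. It is only an illustration of the geometry, not Objectron's API or output format: the function name box_corners, the (width, depth, height) ordering, and the restriction to a single rotation about the vertical axis are all assumptions made for this example.

```python
import math

def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box.

    center: (x, y, z) position of the box centre
    size:   (width, depth, height) along the box's own x, y, z axes
    yaw:    rotation in radians about the vertical (z) axis
    All names and conventions here are hypothetical, chosen for illustration.
    """
    cx, cy, cz = center
    w, d, h = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner in the box's local frame
                x, y, z = sx * w, sy * d, sz * h
                # Rotate about z, then translate to the box centre
                corners.append((cx + c * x - s * y,
                                cy + s * x + c * y,
                                cz + z))
    return corners

# Usage: a 2 x 4 x 6 box at the origin, turned a quarter circle
for corner in box_corners((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), math.pi / 2):
    print(tuple(round(v, 3) for v in corner))
```

A 2D detector only has to report four numbers for a rectangle. A 3D detector has to recover all of the inputs above (and, in general, a full 3D rotation rather than a single angle) from a flat image, which is why it needs a model trained on 3D data.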

Using MediaPipe, Google introduced a new in-browser ML solution for background blurring and replacement in Google Meet. With the help of MediaPipe, ML models and OpenGL shaders run efficiently in the browser. Google claims that it has achieved real-time performance with low power consumption, even on low-power devices.
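Background replacement ultimately comes down to blending two images according to a per-pixel mask that says how much of each pixel belongs to the person. The sketch below shows only that blending arithmetic in plain Python; it is not Google's implementation, which runs in shaders in the browser. The function name composite, the nested-list image layout and the 0.0 to 1.0 mask convention are assumptions made for illustration.

```python
def composite(frame, background, mask):
    """Blend a video frame over a replacement background.

    frame, background: images as rows of (r, g, b) tuples, same dimensions
    mask: rows of floats; 1.0 means "person" (keep the frame pixel),
          0.0 means "background" (show the replacement pixel)
    This layout is a hypothetical choice for the example.
    """
    out = []
    for f_row, b_row, m_row in zip(frame, background, mask):
        out.append([
            # Weighted average of the two pixels, channel by channel
            tuple(round(m * f + (1.0 - m) * b) for f, b in zip(fp, bp))
            for fp, bp, m in zip(f_row, b_row, m_row)
        ])
    return out

# Usage: a 1 x 3 image with a person pixel, an edge pixel and a wall pixel
frame = [[(200, 100, 0), (200, 100, 0), (200, 100, 0)]]
beach = [[(0, 100, 200), (0, 100, 200), (0, 100, 200)]]
mask = [[1.0, 0.5, 0.0]]
print(composite(frame, beach, mask))
```

Fractional mask values matter at the boundary between person and background: a soft edge blends the two images instead of producing a hard, jagged outline. Blurring follows the same pattern, with a blurred copy of the frame standing in for the replacement image.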

How Did Google ‘Meet’ The Challenge

Source: Google AI

“…other solutions require installing additional software, Meet’s features are powered by cutting-edge web ML technologies built with MediaPipe that work directly in your browser — no extra steps necessary.”

To provide real-time, in-browser performance, Google combined efficient on-device ML models, WebGL-based rendering, and web-based ML inference via XNNPACK and TensorFlow Lite.

MediaPipe leverages WebAssembly, a low-level binary code format designed specifically for web browsers. This helps improve speed for compute-heavy tasks. During a video call, the browser converts WebAssembly instructions into native machine code that executes much faster than traditional JavaScript code.

The procedure can be summarised as follows:


Lauren