
Neural Head Avatar – Its Applications In Megapixel Resolution


A team of researchers led by Nikita Drobyshev at the Samsung AI Center in Moscow, Russia, has extended neural head avatar technology to megapixel resolution.

They proposed a set of new neural architectures and training methods that achieve the required levels of rendered image quality and generalization to novel views and motion, using both medium-resolution video data and high-resolution image data.

They demonstrated how a trained high-resolution neural avatar model can be distilled into a lightweight student model that operates in real-time and locks neural avatar identities to hundreds of predefined source images.

Many practical applications of neural head avatar systems require real-time operation and an identity lock.

The Samsung AI Center team presents MegaPortraits (short for megapixel portraits), a system for the one-shot creation of high-resolution human avatars.

COPYRIGHT_WI: Published on https://washingtonindependent.com/neural-head-avatar/ by Kaleem Kirkpatrick on 2022-08-03T06:21:37.839Z

To generate an output image, the model superimposes the motion of the driving frame (i.e., the head pose and facial expression) onto the appearance of the source frame.

The primary learning signal is derived from training episodes in which the source and driver frames are taken from the same video, so the model's prediction is trained to match the driver frame.
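This episodic sampling scheme can be sketched in a few lines of Python. This is only an illustration of the idea; the frame representation and `sample_training_pair` helper are placeholders, not the paper's actual data pipeline:

```python
import random

def sample_training_pair(video_frames, rng=random):
    """Pick a (source, driver) pair from the same video, as in the
    self-supervised episodes described above: the model's output is
    compared against the driver frame, which supplies the ground-truth
    head pose and facial expression."""
    i, j = rng.sample(range(len(video_frames)), 2)
    return video_frames[i], video_frames[j]

frames = [f"frame_{k:03d}" for k in range(8)]
source, driver = sample_training_pair(frames)
```

Because both frames come from one video, they share identity and appearance, so any difference between them is motion, which is exactly what the model must learn to transfer.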

What Are Neural Head Avatars?

Neural head avatars are an intriguing new technique for creating virtual head models. They avoid the complexities of accurate physics-based human avatar modeling by learning shape and appearance directly from videos of talking people.

Recently, methods for creating lifelike avatars using a single image (one-shot) have been developed.

They rely on extensive pretraining on large datasets of videos of diverse people, producing avatars in a one-shot manner using general knowledge about human appearance.

Despite the outstanding results produced by this class of algorithms, the resolution of the training datasets significantly limits their quality.

This constraint cannot be readily overcome by gathering a higher-resolution dataset, since such a dataset must be both large-scale and varied: thousands of people, many frames per person, and diverse demographics, lighting, backgrounds, facial expressions, and head poses.

The resolution of datasets that match these requirements is restricted. Consequently, even the most modern one-shot avatar systems can learn avatars at resolutions of at most 512 x 512.

Avatar of a girl using a cell phone while studying

Employing The Base Model

In the first stage, the system was trained by sampling two frames, a source and a driver, from a random training video.

The source frame is processed by an appearance encoder (Eapp), which generates volumetric features and a global descriptor. G3D, a 3D convolutional network, processes these features and combines them with motion data to produce a 4D warping field.

A latent descriptor was used instead of keypoints to represent expression. A 2D convolutional network then decodes the resulting 2D feature map into the output image.
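The data flow through the base model can be sketched as follows. Every shape and module here is an invented stand-in chosen for illustration (the paper's real networks and feature dimensions differ), and the warping step is replaced by an identity placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the learned networks; all dimensions are illustrative.
def appearance_encoder(source_img):               # E_app
    volume = rng.standard_normal((8, 16, 32, 32))  # C x D x H x W volumetric features
    global_desc = rng.standard_normal(128)         # global appearance descriptor
    return volume, global_desc

def motion_encoder(driver_img):
    # Latent expression descriptor: no explicit keypoints are used.
    return rng.standard_normal(128)

def base_model(source_img, driver_img):
    volume, _ = appearance_encoder(source_img)
    motion = motion_encoder(driver_img)
    # G_3D would predict a 4D warping field from the volume plus motion
    # and resample the volume; an identity "warp" stands in for it here.
    warped = volume
    feat_2d = warped.mean(axis=1)   # collapse depth into a 2D feature map
    return feat_2d                  # a 2D CNN would decode this to an RGB image

out = base_model(np.zeros((256, 256, 3)), np.zeros((256, 256, 3)))
```

The key structural point is the split: appearance lives in a 3D feature volume from the source, motion is a latent vector from the driver, and the two only meet in the warping step.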

Various loss functions were used for training; they can be divided into two groups.

High-Resolution Mode

For the second training stage, the main neural head avatar model Gbase was frozen, and only an image-to-image translation network Genh was trained, mapping the 512x512 input to an enhanced version at resolution 1024x1024.
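To see what the enhancer must add, it helps to contrast it with trivial upsampling. The sketch below is the naive nearest-neighbour baseline, not the learned Genh network; it reaches 1024x1024 without recovering any new detail:

```python
import numpy as np

def nearest_neighbour_2x(img):
    """Trivial 512x512 -> 1024x1024 upscaling: every pixel is repeated
    in a 2x2 block. A learned enhancer like G_enh must instead
    hallucinate plausible high-frequency detail that this operation
    cannot recover."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

low = np.zeros((512, 512, 3), dtype=np.uint8)
high = nearest_neighbour_2x(low)
```

The interesting part of the second stage is therefore not the resolution change itself but the detail synthesis, which is why it needs a dedicated high-resolution photo dataset.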

A high-resolution dataset of photographs was used to train the model, with each image assumed to have a unique identity.

This means that source-driver pairs differing in motion, as in the first training stage, cannot be formed.

Two sets of loss functions were used to train the high-resolution model. The first group includes the main super-resolution objectives.

The second group of objectives is unsupervised and was used to ensure that the model performed well on images produced in a cross-driving scenario.

Student Model

The one-shot model was distilled into a small conditional image-to-image translation network, referred to as the student.

The student was trained to mimic the full (teacher) model, which consists of the base model plus the enhancer.

The student is trained only in cross-driving mode, with pseudo-ground truth generated by the teacher model.

Because the student network is trained for a fixed set of avatars, it is conditioned on an index that selects one appearance from that set.
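The distillation setup can be sketched as below. The one-hot conditioning, the L1 objective, and the toy student are illustrative assumptions; only the overall structure (teacher output as pseudo-ground truth, index-conditioned student) comes from the text above:

```python
import numpy as np

def one_hot(index, num_avatars=100):
    """Conditioning vector that selects one of the pre-defined avatars."""
    v = np.zeros(num_avatars, dtype=np.float32)
    v[index] = 1.0
    return v

def distillation_loss(student, driver, avatar_index, teacher_output):
    """The teacher's cross-driven output acts as pseudo-ground truth;
    the student sees only the driver frame plus the avatar index."""
    pred = student(driver, one_hot(avatar_index))
    return float(np.abs(pred - teacher_output).mean())  # L1 stand-in

# Toy student that ignores its inputs, just to show the interface.
toy_student = lambda driver, cond: np.zeros((64, 64, 3))
loss = distillation_loss(toy_student, np.zeros((64, 64, 3)), 7,
                         np.ones((64, 64, 3)))
```

Conditioning on an index rather than a source image is what makes the student so small and fast: appearance is baked into its weights instead of being encoded at inference time.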

Baseline Methods Adopted

Face-Vid-to-Vid is a state-of-the-art self-reenactment method, in which the source and driving images share the same appearance and identity.

Its primary characteristics are a volumetric encoding of the avatar's appearance and an explicit representation of head motion using 3D keypoints learned in an unsupervised manner.

A similar volumetric encoding of appearance was used in this base model, but facial motion was encoded implicitly, which improved cross-reenactment performance.

The First Order Motion Model represents motion with 2D keypoints and is another strong baseline for the task of self-reenactment.

As in Face-V2V, these keypoints are learned without supervision. However, this approach fails to produce realistic images in the cross-reenactment setting.

A comparison was also made with the HeadGAN system, in which the expression coefficients of a 3D morphable model serve as the motion representation.

These coefficients are computed using a pre-trained dense 3D keypoint regressor. This method effectively disentangles motion from appearance, but it restricts the space of possible movements.

Using The Model For Self-Reenactment Experiments

Pre-trained models were used for both self-reenactment and cross-reenactment trials, and they were evaluated on samples from the VoxCeleb2 and VoxCeleb2HQ evaluation sets.

To validate this objectively, an evaluation was carried out using masked data.

The masks cover the face, ears, and hair regions, and they are applied to both the target and predicted images before the metrics are calculated.
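A masked metric of this kind is straightforward to compute. The sketch below shows masked PSNR with NumPy as an example; the paper's exact metric implementations are not given, so treat this as a generic illustration of the masking idea:

```python
import numpy as np

def masked_psnr(pred, target, mask, max_val=255.0):
    """PSNR computed only over the masked (face/ears/hair) region, so
    unmodeled areas such as the shoulders and background do not
    dominate the error."""
    diff = pred.astype(np.float64)[mask] - target.astype(np.float64)[mask]
    mse = (diff ** 2).mean()
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.zeros((4, 4), dtype=np.uint8)
pred = np.full((4, 4), 16, dtype=np.uint8)   # uniform error of 16 levels
mask = np.ones((4, 4), dtype=bool)
psnr = masked_psnr(pred, target, mask)
```

Restricting the comparison this way isolates the regions the model actually animates, which is exactly why the masked scores in the paper are competitive while the raw-image scores are not.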

In this setting, performance was similar to the baseline approaches; however, when unmasked (raw) images were used, performance was lower.

This disparity may be caused, among other things, by the method's lack of shoulder motion modeling.

Consequently, the predictions and ground truth are misaligned in those regions.

Evaluation Of High-Resolution Synthesis

Because data for the self-reenactment scenario was unavailable at high resolution, high-resolution synthesis was evaluated only in cross-reenactment mode. Subsets of a filtered dataset were used for training and assessment.

The baseline super-resolution techniques were trained by sampling two randomly augmented copies of the training image as source and driver, using the output of the pre-trained base model Gbase as input.

Only random crops and rotations were used, because other augmentations might alter person-specific traits (e.g., head width).
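The constraint is that augmentations may change framing but not identity. A random crop, sketched below, satisfies this; the crop size and sampling details are assumptions, and the paper's rotations are omitted here since they require interpolation:

```python
import numpy as np

def random_crop(img, crop_size, rng):
    """Identity-preserving augmentation: a crop changes framing but not
    person-specific traits such as head width, which e.g. anisotropic
    scaling would distort."""
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - crop_size + 1))
    x = int(rng.integers(0, w - crop_size + 1))
    return img[y:y + crop_size, x:x + crop_size]

rng = np.random.default_rng(42)
img = np.zeros((1024, 1024, 3), dtype=np.uint8)
src, drv = random_crop(img, 896, rng), random_crop(img, 896, rng)
```

Two independent crops of the same photo give the source-driver pair that the still-image dataset otherwise cannot provide.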

The resulting picture quality was measured using an additional image quality evaluation metric in the quantitative comparison.

By producing more high-frequency detail while preserving the identity of the source image, the approach outperformed its rivals both qualitatively and quantitatively.

On an NVIDIA RTX 3090 graphics card in FP16 mode, the distilled architecture delivers 130 frames per second.
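Throughput figures like this are typically measured as wall-clock frames per second. A minimal harness is sketched below with a dummy workload standing in for the model; it illustrates the measurement, not the actual student network:

```python
import time

def measure_fps(render_frame, n_frames=200):
    """Wall-clock throughput estimate: run the per-frame callable
    n_frames times and divide by the elapsed time. The paper reports
    about 130 FPS for the distilled student on an RTX 3090 in FP16."""
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    return n_frames / (time.perf_counter() - start)

# Dummy per-frame workload in place of a real model forward pass.
fps = measure_fps(lambda: sum(range(10_000)))
```

In practice, GPU benchmarks also need warm-up iterations and device synchronization before reading the clock, or the reported FPS will be optimistic.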

The full student model with 100 avatars is 800 MB in size, and it performs on par with the teacher model.

It obtains an average Peak Signal-to-Noise Ratio of 23.14 and a Learned Perceptual Image Patch Similarity of 0.208 (measured relative to the teacher model) across all avatars.


Research On 3D Avatar

The recent success of neural implicit scene representations for the task of 3D reconstruction has inspired several efforts on 4D head avatars.

Direct video production using convolutional neural networks with appearance and motion descriptors is an alternative to talking-head synthesis.

The method can impose motion from an arbitrary video sequence onto an appearance captured from a single image, while maintaining megapixel resolution.

Most of these works use explicit motion representations, such as keypoints or blend shapes, but some employ a latent motion parameterization.

The latter offers greater expressiveness of motion, provided that disentanglement from appearance is established during training.

The resolution of talking head models is presently limited by the existing video datasets, which contain videos with a maximum resolution of 512x512.

This issue further limits the ability to improve output quality on current datasets using traditional high-quality image and video synthesis approaches.

This issue might also be approached as a single-image super-resolution problem.

However, the quality of the outputs of a one-shot talking head model varies substantially with the imposed motion, resulting in poor performance of typical single-image super-resolution approaches.

These traditional techniques depend on supervised training with a priori known ground truth, which is unavailable for novel motion because each individual has only one image.

People Also Ask

How Do You Make A 3D Avatar Of Yourself?


You can create a full-body 3D avatar from a picture in three steps.

1. Select a full-body avatar maker. Visit readyplayer.me on your computer or mobile device.

2. Snap a selfie. You have the choice of taking a picture or uploading one.

3. Create your full-body 3D avatar. You've made a photo-based full-body avatar!

Where Can I Make A 3D Avatar?

  • To make a 3D Instagram Avatar, go to your Profile on your app.
  • Tap the top-right menu icon, then Settings in the window.
  • Tap Account, then Avatars.

Can You Use Tafi On Mobile?

Tafi Avatars are dynamic and morphable and can be used on mobile, Quest, and desktop.

How Does An Avatar Work?

In the film Avatar, an avatar driver employs a whole-body remote neural interface to operate and manipulate the avatar body.

These connection units are housed in a specialized facility, such as the one established in Pandora's Hell's Gate outpost.

From the outside, the link beds resemble MRI scanners, with the operator lying within an enclosed capsule.

Final Outlook

This is a novel method for creating high-resolution neural avatars. The system suffers from two significant drawbacks.

First, the datasets contain mostly near-frontal views, reducing rendering quality for strongly non-frontal head poses.

Second, since only static views are accessible at high resolution, there is some temporal flicker.

Ideally, this should be addressed with unique losses or architectural decisions. Finally, this system lacks shoulder motion modeling.


