Meta shows off metaverse facial scans

Colin Campbell, Tuesday, October 10th, 2023 7:24 am

Meta and Mark Zuckerberg’s big gamble on the metaverse incorporates detailed scanned versions of our actual selves that we can take into virtual environments to interact with others.

This week – and just a few days after the unveiling of Meta’s new Quest 3 headset – Zuckerberg demonstrated how far the company has come with “codec scans”, which deliver detailed facial animations in real time for the purposes of conversation.

He appeared via VR in an hour-long video podcast with Lex Fridman, serving as both a demonstration of the technology and a discussion about it.

As Zuckerberg explains, the avatars we currently use in games and in virtual reality are often cartoony, demonstrating a limited range of non-personal, pre-rendered emotional responses. Codec scans are far more detailed, rendering a version of the self that, while clearly computer-generated, is satisfyingly realistic.

For Zuckerberg, codec scanning offers a low-latency alternative to video conferencing. Although the scanning process is currently laborious, once it’s done, the amount of processing required is minimal. And although the technology is far from the finished article – it currently renders only faces and limited bodies, and it still feels a little plasticky – the potential for use in games, work, education, and social activities is obvious.

Thinking and ambition

Zuckerberg’s interview is available on YouTube – a 2D video can only do so much to render a realistic-feeling 3D experience. But it does a good job of demonstrating Meta’s thinking and ambition on this issue.

“The idea is that – instead of our avatars being cartoony, and instead of actually transmitting a video – we’ve scanned ourselves and a lot of different expressions,” Zuckerberg told Fridman. “We’ve built a computer model of each of our faces and bodies and the different expressions that we make.

“We collapse that into a codec. When you have the headset on your head, it sees your face and sees your expression. And it can basically send an encoded version of what you’re supposed to look like over the wire. So in addition to being photorealistic, it’s also more bandwidth efficient than transmitting a full video or especially a 3D immersive video of a whole scene like this.”
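The bandwidth claim can be made concrete with some rough arithmetic. The sketch below is purely illustrative – the coefficient count, bitrate, and frame rate are assumptions for the sake of comparison, not figures Meta has published or details of its actual codec.

```python
# Back-of-the-envelope comparison of per-frame payloads.
# All numbers here are illustrative assumptions, not Meta's figures.

# A codec avatar only needs to transmit the current expression state:
# say, a few hundred model coefficients describing the face pose.
num_coefficients = 256          # assumed size of the expression vector
bytes_per_coeff = 4             # 32-bit float per coefficient
avatar_payload = num_coefficients * bytes_per_coeff  # bytes per frame

# A compressed video stream is far larger. Assume a 5 Mbps call
# at 30 frames per second, typical of HD video conferencing.
video_bitrate_bps = 5_000_000
fps = 30
video_payload = video_bitrate_bps / 8 / fps  # bytes per frame

print(f"avatar:  {avatar_payload:,} bytes/frame")
print(f"video: ~{video_payload:,.0f} bytes/frame")
print(f"ratio: ~{video_payload / avatar_payload:.0f}x smaller")
```

Even with these generous assumptions for video compression, the encoded expression vector is an order of magnitude smaller per frame – and a full 3D immersive scene, as Zuckerberg notes, would be larger still.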

Emotional gravity

The challenge is to capture a human being’s emotional gravity, generally through facial expressions and most especially the eyes. “There’s obviously a certain realism that comes with delivering this photorealistic experience,” he said. “It’s really magical and it gets to the core of the vision around virtual and augmented reality – delivering a sense of presence as if you’re there together no matter where you actually are in the world.”

Before we can all get our hands on this technology, various challenges need to be overcome, most especially streamlining the actual scanning process. Zuckerberg and Fridman each visited a scanning facility ahead of their interview.

“We haven’t figured out how much we can reduce that down to a really streamlined process,” he said. “But the goal – and we have a project that’s working on this already – is just to do a very quick scan with your cell phone.”

He wants to get to the point “where you just take your phone, wave it in front of your face for a couple of minutes, say a few sentences, make a bunch of expressions … so the whole process is just two to three minutes. That then produces something of the quality we have right now.”

Uncanny valley

Fridman marveled that the codec scans managed to avoid the dreaded uncanny valley, but Zuckerberg acknowledged the difficulty in creating scans that work seamlessly across millions of different human beings.

“There’s still a bunch of tuning,” he said. “Different people emote to different extents. When you smile, how wide is your smile? How wide do you want your smile to be? Getting that tuned on a per-person basis is going to be one of the things that we need to figure out.”

Many people are going to want a personalized version of themselves – one that improves upon reality or that’s better attuned to virtual environments.

Zuckerberg commented: “I always get a lot of critique and shit for having a relatively stiff expression. But you know, I might feel happy, but I just make a small smile. Maybe, for me, I’d want to have my avatar really be able to better express how I’m feeling than what I actually do physically.”

The podcast takes place in a dark room that highlights the avatars, but in real applications, surroundings and activities are going to be key. Zuckerberg says it’s fine for people who want to use scans for video calls, but he has other ambitions.

“The things that you can do in the metaverse are different from what you can do on a phone – doing stuff where you [feel like] you’re physically there together and participating in things together. We could play games like this,” he said, although he acknowledged that many games have aesthetic parameters that would not be a good fit for codec scans.

“Once you get mixed reality and augmented reality, we could have codec avatars and go into a meeting and have some people physically there and have some people show up in this photorealistic form superimposed on the physical environment. Stuff like that is going to be super powerful.”

In terms of expansion, Meta will “probably roll this out progressively over time”, so that eventually “everyone has a codec avatar,” he said. “We want to get more people scanned into the system. And then we want to start integrating it into each one of our apps. I think that this is going to make a ton of sense.

“We get a lot of feedback where people are pretty blown away by the experience. Something like this could make a big difference for those remote meetings… It’s not ready to be like a kind of mainstream product yet, but we’ll keep tuning it, keep getting more scans in there, and rolling it out into more features. Definitely in the next few years we’ll be seeing a bunch more experiences like this.”

Colin Campbell

Colin Campbell has been reporting on the gaming industry for more than three decades, including for Polygon, IGN, The Guardian, Next Generation, and The Economist.