To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

ACM International Conference on Multimodal Interaction (ICMI)


Non verbal behaviours such as gestures, facial expressions, body posture, and para-linguistic cues have been shown to complement or clarify verbal messages. Hence to improve telepresence, in form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars not only requires to model intrapersonal dynamics between a avatar’s speech and their body pose, but it also needs to model interpersonal dynamics with the interlocutor present in the conversation. In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on audio and body pose of the interlocutor and audio of the human operating the avatar. We evaluate our proposed model on dyadic conversational data consisting of pose and audio of both participants, confirming the importance of adaptive attention between monadic and dyadic dynamics when predicting avatar pose. We also conduct a user study to analyze judgments of human observers. Our results confirm that the generated body pose is more natural, models intrapersonal dynamics and interpersonal dynamics better than non-adaptive monadic/dyadic models.


The video has 2 characters, the one on the right is the avatar whose pose is predicted by our proposed model DRAM (Dyadic Residual Attention Model). The character on the left is the interlocutor.

Related Publications

All Publications

CVPR - June 18, 2021

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

CVPR - June 18, 2021

Discovering Relationships between Object Categories via Universal Canonical Maps

Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, Andrea Vedaldi

CVPR - June 17, 2021

Connecting What to Say With Where to Look by Modeling Human Attention Traces

Zihang Meng, Licheng Yu, Ning Zhang, Tamara Berg, Babak Damavandi, Vikas Singh, Amy Bearman

DSN - June 21, 2021

Near-Realtime Server Reboot Monitoring and Root Cause Analysis in a Large-Scale System

Fred Lin, Bhargav Bolla, Eric Pinkham, Neil Kodner, Daniel Moore, Amol Desai, Sriram Sankar

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy