Visual captions: Using large language models to augment video conferences with dynamic visuals
Google Research AI blog
JUNE 6, 2023
In “Visual Captions: Augmenting Verbal Communication With On-the-fly Visuals”, presented at ACM CHI 2023, we introduce a system that uses verbal cues to augment synchronous video communication with real-time visuals. The system is also robust to the mistakes that commonly appear in real-time speech-to-text transcription.