Towards Music-Aware Virtual Assistants

Alexander Wang, David Lindlbauer, Chris Donahue.

Published at ACM UIST 2024

Abstract

We propose a system for broadcasting speech notifications to music listeners in a manner that listeners find less intrusive and more delightful than existing conventions. Speech notifications provide convenient access to rich information without the need for a screen, playing a key role in virtual assistants that support users by, e.g., dictating text notifications or providing directions for navigation. Virtual assistants see prevalent use in hands-free settings such as driving or exercising, activities where users typically also enjoy listening to music. When broadcasting speech notifications to users, virtual assistants will typically mute a user's music to improve intelligibility. However, users may perceive these interruptions as intrusive, negatively impacting their music-listening experience. To address this challenge, we propose the concept of music-aware virtual assistants, where speech notifications are conveyed via singing them in harmony with a song that a user is listening to. We contribute a system that processes user music and notification text to produce a blended mix, replacing original song lyrics with the notification content. In a user study comparing musical assistants to standard virtual assistants, participants expressed a preference for our musical approach due to its better alignment with music, reduced intrusiveness, and more delightful user experience.

More information

Audio samples are available at https://drive.google.com/drive/folders/1xIwKG3nBfc5bRkfJ7YgcWn4r_NkgqB4f?usp=sharing

Materials

BibTeX

@inproceedings{Wang2024SingingAssistants,
  author = {Wang, Alexander and Lindlbauer, David and Donahue, Chris},
  title = {Towards Music-Aware Virtual Assistants},
  year = {2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  doi = {10.1145/3654777.3676416},
  keywords = {Audio, Music, Virtual Assistants, Notification, Interruptions, Speech, Machine Learning},
  location = {Pittsburgh, PA, USA},
  series = {UIST '24}
}