Crafting Multi-Modal Interactions on Voice Assistants

Authors

  • Ashlesha Vishnu Kadam, Amazon.com, LLC, Amazon Music, Seattle, WA, USA

DOI:

https://doi.org/10.47363/JEAST/2022(4)188

Keywords:

AI, ASR, Human Computer Interaction, NLU, Multi Modal, Voice Assistant

Abstract

Voice assistants are gaining adoption because they are easy and intuitive to use. While voice has been the dominant mode of human interaction with voice assistants, some embodiments also provide alternative modalities, most commonly visual or touch interfaces. This paper describes the end-to-end working of a multimodal voice assistant and then examines in depth the challenges specific to multimodal voice assistants. It then presents mitigation strategies for the challenges that arise in multimodal interaction scenarios, followed by architectural and design guidelines for delivering a seamless user experience. Finally, it identifies areas for future research.

Author Biography

  • Ashlesha Vishnu Kadam, Amazon.com, LLC, Amazon Music, Seattle, WA, USA



Published

2022-12-20